Self-Information Loss Compensation Learning for Machine-Generated Text Detection

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Weikuan Wang ◽  
Ao Feng

Automatic text generation by machine has long been an important task in natural language processing, but low-quality machine-generated text seriously harms the user experience through poor readability and vague information content. Machine-generated text detection methods based on traditional machine learning rely on large numbers of hand-crafted features and detection rules. General deep-learning text classifiers tend to focus on topical content, so the logical information between text sequences is not well utilized. To address this problem, we propose an end-to-end model that uses the self-information of text sequences to compensate for the information loss in the modeling process and to learn the logical information between text sequences for machine-generated text detection, framed as a text classification task. We experiment on a Chinese question-and-answer dataset collected from biomedical social media, which includes both human-written and machine-generated text. The results show that our method is effective and exceeds most baseline models.
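The abstract does not specify how the self-information signal is computed; as a rough illustration of the underlying quantity, a token's self-information (surprisal), -log2 p(token), can be estimated from unigram counts. The corpus, tokenization, and smoothing below are invented for illustration only, not the paper's method:

```python
import math
from collections import Counter

def surprisal_features(corpus_tokens, text_tokens):
    """Per-token self-information (surprisal), -log2 p(token),
    estimated from unigram counts with add-one smoothing."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    def p(tok):
        return (counts[tok] + 1) / (total + vocab)
    return [-math.log2(p(tok)) for tok in text_tokens]

corpus = "the cat sat on the mat the dog sat".split()
features = surprisal_features(corpus, "the cat flew".split())
# Rare or unseen tokens ("flew") carry more self-information
# than frequent ones ("the").
```

Sequences of such per-token surprisals are one plausible side input a detection model could use to compensate for information lost during encoding.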

2021 ◽  
Author(s):  
Sakdipat Ontoum ◽  
Jonathan H. Chan

By identifying and extracting relevant information from articles, automatic text summarization helps the scientific and medical sectors. Automatic text summarization is a way of compressing text documents so that users can find important information in the original text in less time. We first review some recent works in the field of summarization that use deep learning approaches, and then survey summarization research papers on COVID-19. Readability refers to the ease with which a reader can grasp written text; in natural language processing, the substance of a text determines its readability. We constructed word clouds from the most frequently used words in the abstracts. We evaluate the models by the mean of their ROUGE-1, ROUGE-2, and ROUGE-L scores. As a consequence, Distilbart-mnli-12-6 and GPT2-large outperform the other models.
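The evaluation code is not given in the abstract; real evaluations typically use a package such as rouge-score, but the core ROUGE-N F1 computation can be sketched in plain Python (naive whitespace tokenization, invented example sentences):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N F1: n-gram overlap between a reference summary and a candidate."""
    ref, cand = ngrams(reference.split(), n), ngrams(candidate.split(), n)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return 2 * precision * recall / (precision + recall)

score = rouge_n_f1("the cat sat on the mat", "the cat lay on the mat")
```

The mean reported in the abstract would then be the average of ROUGE-1, ROUGE-2, and ROUGE-L values computed per summary; ROUGE-L additionally requires a longest-common-subsequence match rather than fixed n-grams.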


2020 ◽  
Vol 46 (1) ◽  
pp. 1-10
Author(s):  
Dhafar Hamed Abd ◽  
Ahmed T. Sadiq ◽  
Ayad R. Abbas

Text classification and sentiment analysis are now considered among the most popular Natural Language Processing (NLP) tasks. These techniques play a significant role in human activities and have an impact on daily behaviour. Articles in fields such as politics and business express different opinions according to the writer's tendency, and a huge amount of data can be acquired from that differentiation, making it desirable to determine the political orientation of an online article automatically. However, no corpus for Arabic political categorization was available for this task, due to the lack of rich representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD), textual data collected from newspapers, social networks, general forums and an ideology website. The dataset consists of 206 articles distributed into three categories (Reform, Conservative and Revolutionary), which we offer to the research community on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic, particularly political text classification. We present the data in raw form and as Excel files of four types: V1 raw data, V2 preprocessed, V3 root-stemmed and V4 light-stemmed.



Author(s):  
Horacio Saggion

Over the past decades, information has been made available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can pose difficulties for a number of people, including those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier to process by natural-language-processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent that is easier to read and to understand for a target user. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of its sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods to address these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.
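As a toy illustration of lexical simplification, the sketch below substitutes "complex" words with simpler synonyms from a lookup table. The table is invented for illustration; real systems rank candidate substitutions by word frequency, context fit, or embedding similarity:

```python
# Toy lexical simplifier. The substitution table is invented for
# illustration; real systems select and rank candidates with
# frequency lexicons, context models, or embeddings.
SIMPLER = {
    "utilize": "use",
    "commence": "start",
    "terminate": "end",
}

def simplify(sentence):
    out = []
    for word in sentence.split():
        # Keep trailing punctuation while looking up the lowercase form.
        core = word.strip(".,;")
        suffix = word[len(core):]
        out.append(SIMPLER.get(core.lower(), core) + suffix)
    return " ".join(out)

result = simplify("We commence the experiment, then utilize the parser.")
# "We start the experiment, then use the parser."
```

Syntactic simplification, by contrast, rewrites sentence structure (e.g. splitting a relative clause into a separate sentence) and typically requires a parser rather than a word-level table.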


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Sunil Kumar Prabhakar ◽  
Dong-Ok Won

To unlock the information present in clinical descriptions, automatic medical text classification is highly useful in the arena of natural language processing (NLP). For medical text classification tasks, machine learning techniques seem quite effective; however, they require extensive human effort to create the labeled training data. For clinical and translational research, a huge quantity of detailed patient information, such as disease status, lab tests, medication history, side effects, and treatment outcomes, has been collected in electronic format, and it serves as a valuable data source for further analysis. Processing this volume of medical text efficiently is therefore a considerable challenge. In this work, a medical text classification paradigm using two novel deep learning architectures is proposed to mitigate the human effort. In the first approach, a quad-channel hybrid long short-term memory (QC-LSTM) deep learning model is implemented using four channels; in the second, a hybrid bidirectional gated recurrent unit (BiGRU) deep learning model with multihead attention is developed and implemented. The proposed methodology is validated on two medical text datasets, and a comprehensive analysis is conducted. The best classification accuracy, 96.72%, is obtained with the proposed QC-LSTM deep learning model, and the proposed hybrid BiGRU deep learning model achieves 95.76%.
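The exact QC-LSTM and BiGRU architectures are not specified in this abstract; as background, the gating mechanism at the heart of a GRU (the recurrent unit inside the BiGRU model) can be sketched in NumPy for a single time step. All weights and dimensions below are random and purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step. W, U, b each stack the update (z),
    reset (r), and candidate gate parameters along axis 0."""
    (Wz, Wr, Wh), (Uz, Ur, Uh), (bz, br, bh) = W, U, b
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1 - z) * h + z * h_tilde               # blend old and new state

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3
W = rng.normal(size=(3, d_in, d_hid))
U = rng.normal(size=(3, d_hid, d_hid))
b = np.zeros((3, d_hid))
h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):  # run over a 5-step toy sequence
    h = gru_step(x, h, W, U, b)
```

A bidirectional GRU runs one such recurrence left-to-right and another right-to-left, concatenating the two hidden states per position; multihead attention would then weight those states when forming the document representation.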


2021 ◽  
Vol 3 (4) ◽  
pp. 922-945
Author(s):  
Shaw-Hwa Lo ◽  
Yiqiao Yin

Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models are capable of making good predictions, yet there is a lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm called the Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the “dagger technique”. First, the paper proposes to use the I-score to detect and search for the important language semantics in text documents that are useful for making good predictions in text classification tasks. Next, the Backward Dropping Algorithm, a greedy search algorithm, is proposed to handle long-term dependencies in the dataset. Moreover, the paper proposes the “dagger technique”, a novel feature engineering technique that fully preserves the relationship between the explanatory variable and the response variable. The proposed techniques can be further generalized to any feed-forward Artificial Neural Network (ANN) or Convolutional Neural Network (CNN). In a real-world application on the Internet Movie Database (IMDB), the proposed methods improve prediction performance with an 81% error reduction compared to popular peer models that do not implement the I-score and the “dagger technique”.
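One common form of the influence score partitions the samples by the joint values of a selected subset of discrete features and measures how far each cell's mean response deviates from the global mean. The sketch below uses one such form; normalization conventions vary across formulations, and the toy data are invented:

```python
from collections import defaultdict

def i_score(X, y, subset):
    """Influence score for a subset of discrete feature columns:
    sum over partition cells j of n_j**2 * (ybar_j - ybar)**2, here
    divided by n (normalization conventions vary across papers)."""
    n = len(y)
    ybar = sum(y) / n
    cells = defaultdict(list)
    for row, target in zip(X, y):
        cells[tuple(row[i] for i in subset)].append(target)
    return sum(len(v) ** 2 * (sum(v) / len(v) - ybar) ** 2
               for v in cells.values()) / n

# Toy data: feature 0 determines y exactly; feature 1 is noise.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
y = [0, 0, 1, 1, 0, 1]
strong = i_score(X, y, [0])  # informative feature scores high
weak = i_score(X, y, [1])    # noise feature scores low
```

The Backward Dropping Algorithm would start from a larger candidate subset and greedily drop, at each round, the variable whose removal increases the I-score the most, keeping the highest-scoring subset seen.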

