A Sentence-Level Joint Relation Classification Model Based on Reinforcement Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zhen Liu ◽  
XiaoQiang Di ◽  
Wei Song ◽  
WeiWu Ren

Relation classification is an important semantic processing task in the field of natural language processing (NLP). Training data are generally generated automatically at scale through distant supervision strategies, which inevitably introduces label noise. A further challenge is that important information may appear anywhere in a sentence. This paper presents a sentence-level joint relation classification model consisting of two modules: a reinforcement learning (RL) agent and a joint network model. In particular, we combine a bidirectional long short-term memory (Bi-LSTM) network with an attention mechanism as a joint model that processes the textual features of a sentence and classifies the relation between two entities; the attention mechanism also uncovers hidden information in the sentence. Joint training of the two modules addresses the noise problem in relation extraction, sentence-level information extraction, and relation classification. Experimental results demonstrate that the model deals effectively with data noise and achieves better sentence-level relation classification performance.
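
As an illustration of the joint model's classification half, the following is a minimal PyTorch sketch of a Bi-LSTM whose hidden states are attention-pooled into a sentence vector before relation classification. All dimensions, the class name, and the number of relation types are assumptions, and the paper's RL instance-selection agent is omitted entirely.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionRC(nn.Module):
    """Sketch of a Bi-LSTM + attention relation classifier (all sizes assumed)."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_relations=19):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # scores each time step
        self.classifier = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, token_ids):                        # (batch, seq_len)
        h, _ = self.bilstm(self.embed(token_ids))        # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)     # per-token attention
        sentence = (weights * h).sum(dim=1)              # attention-pooled sentence vector
        return self.classifier(sentence)                 # relation logits
```

In the paper's setup, the RL agent would sit on top of a classifier like this, deciding per sentence whether a distantly supervised instance is kept or discarded; that selection policy is not sketched here.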

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1729 ◽  
Author(s):  
Yanliang Jin ◽  
Dijia Wu ◽  
Weisi Guo

Relation classification is an important research area in the field of natural language processing (NLP), which aims to recognize the relationship between two tagged entities in a sentence. The noise caused by irrelevant words and the word distance between the tagged entities may affect relation classification accuracy. In this paper, we present a novel model, a multi-head attention long short-term memory (LSTM) network with a filter mechanism (MALNet), to extract text features and classify the relation between two entities in a sentence. In particular, we combine an LSTM with an attention mechanism to obtain shallow local information and introduce a filter layer based on the attention mechanism to strengthen the available information. In addition, we design a semantic rule for marking the keywords between the target words and construct a keyword layer to extract their semantic information. We evaluated the performance of our model on the SemEval-2010 Task 8 dataset and the KBP-37 dataset, achieving F1-scores of 86.3% and 61.4%, respectively, which shows that our method is superior to previous state-of-the-art methods.
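
The paper does not publish code, so the following is a speculative PyTorch sketch of what an attention-based filter layer could look like: tokens whose attention scores fall below a fraction of the maximum score are masked out, strengthening the signal from the remaining tokens. The thresholding rule and all names are assumptions, not MALNet's actual formulation.

```python
import torch
import torch.nn as nn

class AttentionFilter(nn.Module):
    """Hypothetical filter layer: suppresses tokens whose attention score
    falls below a fixed fraction of the maximum score in the sentence."""
    def __init__(self, hidden_dim, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)
        self.keep_ratio = keep_ratio                     # assumed hyperparameter

    def forward(self, h):                                # (batch, seq_len, hidden)
        a = torch.softmax(self.score(h), dim=1)          # per-token attention
        threshold = self.keep_ratio * a.max(dim=1, keepdim=True).values
        mask = (a >= threshold).float()                  # 1 = keep, 0 = filter out
        return h * mask                                  # zero out low-attention tokens
```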


2016 ◽  
Vol 42 (3) ◽  
pp. 391-419 ◽  
Author(s):  
Weiwei Sun ◽  
Xiaojun Wan

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite their effectiveness in boosting accuracy, computationally expensive parsers make hybrid systems inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are used to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high per-token classification accuracy but also serve well as a front-end to a parser.
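
The re-compilation step can be sketched as a simple self-training loop; every interface below is hypothetical, since the article describes the idea rather than an API: the expensive hybrid system pseudo-labels unlabeled sentences, and the cheap sequence model is retrained on the union of gold and pseudo data.

```python
def distill_tagger(hybrid_tagger, fast_tagger, labeled, unlabeled):
    """Sketch of the re-compilation step (all names hypothetical):
    the slow hybrid system POS-tags raw sentences, and a cheap
    sequence model is retrained on gold + pseudo training data."""
    pseudo = [(sent, hybrid_tagger(sent)) for sent in unlabeled]  # pseudo-label raw text
    fast_tagger.train(labeled + pseudo)                           # retrain the cheap model
    return fast_tagger
```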


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resourced languages to improve machine translation (MT) performance remains a challenge. This research contributes to the domain through low-resource English-Twi translation based on filtered synthetic-parallel corpora. It is often difficult to judge what a good-quality corpus looks like under low-resource conditions, especially when the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose expanding the training data by injecting a synthetic-parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we perform unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we make extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results across varying amounts of available parallel corpus demonstrate that injecting a pseudoparallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same framework under the same structure, our approach yields substantial gains in BLEU and TER scores.
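
The squared-Mahalanobis filtering step can be sketched as follows, assuming each sentence pair has already been reduced to a small feature vector (how those features are built, e.g. length ratio or cross-lingual embedding similarity, is an assumption here, and the percentile cutoff is illustrative):

```python
import numpy as np

def filter_parallel_pairs(features, keep_percentile=90):
    """Sketch of unsupervised pair filtering: each row of `features`
    describes one source/target sentence pair; pairs whose squared
    Mahalanobis distance from the mean lies in the far tail are
    treated as non-parallel and dropped."""
    mu = features.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(features, rowvar=False))
    diff = features - mu
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared Mahalanobis distance
    cutoff = np.percentile(d2, keep_percentile)
    return d2 <= cutoff                                  # boolean mask of pairs to keep
```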


2018 ◽  
Vol 10 (11) ◽  
pp. 113 ◽  
Author(s):  
Yue Li ◽  
Xutao Wang ◽  
Pengjian Xu

Text classification is important in natural language processing, as the massive amount of text, which contains highly valuable information, needs to be classified into different categories for further use. To better classify text, our paper builds a deep learning model that achieves better classification results on Chinese text than other researchers' models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as the deep learning methods for classifying Chinese text. LSTM is a special kind of recurrent neural network (RNN), capable of processing serialized information through its recurrent structure. By contrast, CNN has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model: the BLSTM-C model (BLSTM stands for bi-directional long short-term memory, while C stands for CNN). The LSTM layers were responsible for obtaining a sequence output based on past and future contexts, which was then input to the convolutional layer for feature extraction. In our experiments, the proposed BLSTM-C model was evaluated in several ways. In the results, the model exhibited remarkable performance in text classification, especially on Chinese texts.
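
A minimal PyTorch sketch of the BLSTM-C idea, with all dimensions assumed: two Bi-LSTM layers produce a context-aware sequence, which a 1-D convolution then scans for local features before max-pooling and classification.

```python
import torch
import torch.nn as nn

class BLSTMC(nn.Module):
    """Sketch of BLSTM-C (dimensions are assumptions): stacked Bi-LSTM
    layers feed a convolutional feature extractor."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 channels=100, kernel=3, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden_dim, channels, kernel)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        h, _ = self.bilstm(self.embed(token_ids))     # (batch, seq_len, 2*hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))  # (batch, channels, seq_len-k+1)
        pooled = c.max(dim=2).values                  # global max-pool over time
        return self.fc(pooled)                        # class logits
```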


2020 ◽  
Vol 38 (1) ◽  
pp. 49-64 ◽  
Author(s):  
Hiroshi Yamakawa

Abstract Recently, attention mechanisms have significantly boosted the performance of natural language processing using deep learning. An attention mechanism can select the information to be used, such as by conducting a dictionary lookup; this information is then used, for example, to select the next word of an utterance in a sentence. In neuroscience, the basis of the function of sequentially selecting words is considered to be the cortico-basal ganglia-thalamocortical loop. Here, we first show that the attention mechanism used in deep learning corresponds to the mechanism by which the basal ganglia suppress thalamic relay cells in the brain. Next, we demonstrate that, in neuroscience, the output of the basal ganglia is associated with the action output in the actor of reinforcement learning. Based on these findings, we show that the aforementioned loop can be generalized as reinforcement learning that controls the transmission of the prediction signal so as to maximize the prediction reward. We call this attentional reinforcement learning (ARL). In ARL, the actor selects the information transmission route according to the attention, and the prediction signal changes according to the context detected by the information source of the route. Hence, ARL enables flexible action selection that depends on the situation, unlike traditional reinforcement learning, wherein the actor must directly select an action.
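
The correspondence is easiest to see in the standard scaled dot-product formulation of attention, where a query softly "looks up" values by their similarity to keys; in the paper's analogy, the softmax weights play the role of the basal ganglia's gating of thalamic relay cells. A minimal sketch (a textbook formulation, not code from the paper):

```python
import torch

def scaled_dot_product_attention(query, keys, values):
    """Attention as a soft dictionary lookup: the query is compared with
    every key, and the values are mixed in proportion to the match."""
    scores = query @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5
    weights = torch.softmax(scores, dim=-1)   # the 'gate' on each value
    return weights @ values                   # weighted mixture of values
```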


2019 ◽  
Vol 26 (5) ◽  
pp. 438-446 ◽  
Author(s):  
Ahmad Pesaranghader ◽  
Stan Matwin ◽  
Marina Sokolova ◽  
Ali Pesaranghader

Abstract Objective In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train 1 separate classifier for each ambiguous term, necessitating a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods Built on recent advances in deep learning, our deepBioWSD model leverages 1 single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed using their text definitions; then, after the network is initialized with these embeddings, it is trained on all (available) training data collectively. This method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner. Results We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving state-of-the-art performance of 96.82% macro accuracy. Conclusions Apart from the disambiguation improvement and unsupervised training, deepBioWSD requires considerably fewer expert-labeled data, as it learns the target and context terms jointly. These merits make deepBioWSD convenient to deploy in real-time biomedical applications.
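
A speculative PyTorch sketch of the single-network idea (sizes, mean-pooling, and the similarity-based readout are assumptions, not the authors' released code): one shared Bi-LSTM encodes the context of any ambiguous term, and the sense whose definition embedding best matches that encoding is predicted.

```python
import torch
import torch.nn as nn

class SharedWSD(nn.Module):
    """Sketch of a single shared WSD network: the predicted sense is the
    one whose definition embedding is most similar to the context encoding."""
    def __init__(self, embed_dim=200, hidden_dim=200):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.project = nn.Linear(2 * hidden_dim, embed_dim)

    def forward(self, context_vectors, sense_embeddings):
        # context_vectors: (batch, seq_len, embed_dim) word vectors of the context
        # sense_embeddings: (num_senses, embed_dim) built from sense definitions
        h, _ = self.bilstm(context_vectors)
        ctx = self.project(h.mean(dim=1))        # (batch, embed_dim) context encoding
        return ctx @ sense_embeddings.t()        # similarity score per candidate sense
```

Because the network and the sense embeddings are shared across all ambiguous terms, one model covers the whole vocabulary, which is what lets it scale without per-term classifiers.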


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 45 ◽  
Author(s):  
Shardrom Johnson ◽  
Sherlock Shen ◽  
Yuanchen Liu

Named Entity Recognition (NER), which usually takes Part-Of-Speech (POS) tags as linguistic features, is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding that considers three aspects, namely character-embedding, word-embedding, and pos-embedding, stitched in the order given, thereby capturing their dependencies; based on this, we propose a new Character-Word-Position Combined BiLSTM-Attention (CWPC_BiAtt) model for the Chinese NER task. Passing the comprehensive-embedding through a Bidirectional Long Short-Term Memory (BiLSTM) layer captures the connection between historical and future information, and an attention mechanism then captures the connection between the content of the sentence at the current position and that at any other location. Finally, we utilize a Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that the proposed CWPC_BiAtt model is well qualified for the NER task on the Microsoft Research Asia (MSRA) dataset and the Weibo NER corpus. High precision and recall were obtained, which verified the stability of the model. Position-embedding in the comprehensive-embedding compensates for the attention mechanism by providing position information for the otherwise order-insensitive sequence, which shows that the comprehensive-embedding is complete. Looking at the entire model, our proposed CWPC_BiAtt has three distinct characteristics: completeness, simplicity, and stability. Our proposed CWPC_BiAtt model achieved the highest F-score, attaining state-of-the-art performance on the MSRA dataset and the Weibo NER corpus.
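
A minimal sketch of the comprehensive-embedding itself (all dimensions are assumptions): the three embeddings are stitched in the stated order, character, then word, then POS, before entering the BiLSTM-attention-CRF stack, which is omitted here.

```python
import torch
import torch.nn as nn

class ComprehensiveEmbedding(nn.Module):
    """Sketch of the comprehensive-embedding: char-, word-, and
    pos-embeddings concatenated per token, in that fixed order."""
    def __init__(self, char_vocab, word_vocab, pos_vocab,
                 char_dim=50, word_dim=100, pos_dim=25):
        super().__init__()
        self.char_embed = nn.Embedding(char_vocab, char_dim)
        self.word_embed = nn.Embedding(word_vocab, word_dim)
        self.pos_embed = nn.Embedding(pos_vocab, pos_dim)

    def forward(self, char_ids, word_ids, pos_ids):   # each (batch, seq_len)
        return torch.cat([self.char_embed(char_ids),  # order matters: char,
                          self.word_embed(word_ids),  # then word,
                          self.pos_embed(pos_ids)],   # then POS
                         dim=-1)                      # (batch, seq_len, 175)
```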


GEOMATICA ◽  
2020 ◽  
Author(s):  
Qinjun Qiu ◽  
Zhong Xie ◽  
Liang Wu

Unlike English and other Western languages, Chinese does not delimit words using white space, so Chinese Word Segmentation (CWS) is the crucial first step towards natural language processing. However, for the geoscience domain, the CWS problem remains unresolved, with many challenges. Although traditional methods can be used to process geoscience documents, they lack the domain knowledge needed for massive geoscience documents. These challenges motivated us to build a segmenter specifically for the geoscience domain. Currently, most state-of-the-art methods for Chinese word segmentation are based on supervised learning, with features mostly extracted from a local context. In this paper, we propose a framework for sequence learning that incorporates cyclic self-learning corpus training. Following this framework, we build GeoSegmenter on a Bi-directional Long Short-Term Memory (Bi-LSTM) network model to perform Chinese word segmentation; the model gains a substantial advantage through iterative re-training on its own outputs. Empirical results on geoscience documents and benchmark datasets showed that GeoSegmenter can identify geological documents and also recognize generic documents.
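
A sketch of a cyclic self-learning loop of the kind described, with every interface hypothetical: each round, the current segmenter labels raw geoscience text, its most confident segmentations join the training corpus, and the Bi-LSTM segmenter is retrained on the enlarged set. The confidence threshold and round count are illustrative.

```python
def cyclic_self_learning(segmenter, gold_corpus, raw_docs, rounds=3, threshold=0.95):
    """Sketch of cyclic self-learning corpus training (interfaces hypothetical):
    confident self-labeled segmentations are folded back into training."""
    corpus = list(gold_corpus)
    for _ in range(rounds):
        segmenter.train(corpus)                       # retrain Bi-LSTM segmenter
        for doc in raw_docs:
            seg, confidence = segmenter.segment_with_confidence(doc)
            if confidence >= threshold:               # keep only confident outputs
                corpus.append((doc, seg))
    return segmenter
```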


2019 ◽  
Vol 9 (18) ◽  
pp. 3717 ◽  
Author(s):  
Wenkuan Li ◽  
Dongyuan Li ◽  
Hongxia Yin ◽  
Lindong Zhang ◽  
Zhenfang Zhu ◽  
...  

Text representation learning is an important but challenging issue for various natural language processing tasks. Recently, deep learning-based representation models have achieved great success in sentiment classification. However, these existing models focus more on semantic information than on sentiment linguistic knowledge, which provides rich sentiment information and plays a key role in sentiment analysis. In this paper, we propose a lexicon-enhanced attention network (LAN) based on text representation to improve the performance of sentiment classification. Specifically, we first propose a lexicon-enhanced attention mechanism that combines a sentiment lexicon with an attention mechanism to incorporate sentiment linguistic knowledge into deep learning methods. Second, we introduce a multi-head attention mechanism in the deep neural network to interactively capture contextual information from different representation subspaces at different positions. Furthermore, we stack LAN layers to build a hierarchical sentiment classification model for large-scale text. Extensive experiments are conducted to evaluate the effectiveness of the proposed models on four popular real-world sentiment classification datasets at both the sentence level and the document level. The experimental results demonstrate that our proposed models achieve comparable or better performance than state-of-the-art methods.
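
A speculative PyTorch sketch of a lexicon-enhanced attention mechanism; the additive way of combining lexicon scores with the attention logits, and the learnable trade-off weight, are assumptions rather than the paper's published formula.

```python
import torch
import torch.nn as nn

class LexiconEnhancedAttention(nn.Module):
    """Sketch: per-token attention logits are shifted by a scaled
    sentiment-lexicon score, so emotionally loaded words draw more weight."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)
        self.lex_weight = nn.Parameter(torch.tensor(1.0))  # learnable trade-off

    def forward(self, h, lexicon_scores):
        # h: (batch, seq_len, hidden); lexicon_scores: (batch, seq_len) in [-1, 1]
        logits = self.score(h).squeeze(-1) + self.lex_weight * lexicon_scores.abs()
        weights = torch.softmax(logits, dim=1).unsqueeze(-1)
        return (weights * h).sum(dim=1)                    # sentiment-aware pooling
```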


Author(s):  
Tao Gui ◽  
Qi Zhang ◽  
Lujun Zhao ◽  
Yaosong Lin ◽  
Minlong Peng ◽  
...  

In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always process entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method achieves better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%.
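
A simplified PyTorch sketch of a dynamic skip connection: at each step a policy scores the previous K hidden states, and the selected state is mixed into the recurrent update alongside the current input. For a runnable illustration, the paper's RL sampling is replaced here by a soft (differentiable) selection; the class name and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class DynamicSkipCell(nn.Module):
    """Sketch of a dynamic skip connection over the previous K hidden states."""
    def __init__(self, input_dim, hidden_dim, k=5):
        super().__init__()
        self.k = k
        self.cell = nn.LSTMCell(input_dim + hidden_dim, hidden_dim)
        self.policy = nn.Linear(hidden_dim, 1)            # scores candidate states

    def forward(self, xs):                                # xs: (batch, seq_len, input_dim)
        batch, seq_len, _ = xs.shape
        hdim = self.cell.hidden_size
        h = xs.new_zeros(batch, hdim)
        c = xs.new_zeros(batch, hdim)
        history, outputs = [h], []
        for t in range(seq_len):
            past = torch.stack(history[-self.k:], dim=1)      # (batch, <=k, hdim)
            scores = self.policy(past).squeeze(-1)            # score each candidate
            pick = torch.softmax(scores, dim=1).unsqueeze(-1) # soft selection here;
            skip = (pick * past).sum(dim=1)                   # the paper samples with RL
            h, c = self.cell(torch.cat([xs[:, t], skip], dim=-1), (h, c))
            history.append(h)
            outputs.append(h)
        return torch.stack(outputs, dim=1)                    # (batch, seq_len, hdim)
```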

