Improving Sentence Representations via Component Focusing

Xiaoya Yin; Wu Zhang; Wenhao Zhu; Shuang Liu; Tengjun Yao

doi:10.3390/app10030958

Applied Sciences ◽

10.3390/app10030958 ◽

2020 ◽

Vol 10 (3) ◽

pp. 958 ◽

Cited By ~ 1

Author(s):

Xiaoya Yin ◽

Wu Zhang ◽

Wenhao Zhu ◽

Shuang Liu ◽

Tengjun Yao

Keyword(s):

Neural Network ◽

Language Processing ◽

Basic Part ◽

Primary Meaning ◽

Significant Performance ◽

The Subject ◽

Standard Models ◽

General Linguistic ◽

Syntactic Relations ◽

Linguistic Fact

The effectiveness of natural language processing (NLP) tasks, such as text classification and information retrieval, can be significantly improved with proper sentence representations. Neural networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are increasingly applied to learn sentence representations and are well suited to processing sequences. Recently, bidirectional encoder representations from transformers (BERT) has attracted much attention because it achieves state-of-the-art performance on various NLP tasks. However, these standard models do not adequately address a general linguistic fact: different sentence components play different roles in the meaning of a sentence. In general, the subject, predicate, and object play the most crucial roles, as they convey the primary meaning of a sentence. In addition, the words in a sentence are related to each other by syntactic relations. To address these issues, we propose a sentence representation model, a modification of the pre-trained BERT network via component focusing (CF-BERT). The sentence representation consists of a basic part, which refers to the complete sentence, and a component-enhanced part, which focuses on the subject, predicate, object, and their relations. To achieve the best performance, a weight factor is introduced to adjust the ratio between the two parts. We evaluate CF-BERT on two different tasks: semantic textual similarity and entailment classification. Results show that CF-BERT yields a significant performance gain compared to other sentence representation methods.
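
The abstract states only that a weight factor adjusts the ratio of the basic and component-enhanced parts. As a minimal sketch, assuming both parts are same-sized vectors mixed by a convex combination with a hypothetical weight `alpha`, and that semantic textual similarity is scored with cosine similarity (a common choice, not confirmed by the abstract), the idea looks like this:

```python
import math
from typing import List, Sequence


def combine_parts(basic: Sequence[float], component: Sequence[float], alpha: float) -> List[float]:
    """Mix the whole-sentence vector with the component-enhanced vector.

    Assumption: a convex combination (1 - alpha) * basic + alpha * component;
    the paper may combine the two parts differently (e.g. by concatenation).
    """
    if len(basic) != len(component):
        raise ValueError("both parts must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * b + alpha * c for b, c in zip(basic, component)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Toy 2-d vectors standing in for BERT outputs (hypothetical values).
s1 = combine_parts([1.0, 0.0], [0.0, 1.0], alpha=0.5)
s2 = combine_parts([1.0, 0.0], [1.0, 0.0], alpha=0.5)
print(s1, cosine_similarity(s1, s2))
```

Setting `alpha` to 0 recovers the plain sentence vector; tuning it on a development set is one way to choose the ratio.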

Embedding from Language Models (ELMos)-Based Dependency Parser for Indonesian Language

International Journal of Advances in Soft Computing and its Applications ◽

10.15849/ijasca.211128.01 ◽

2021 ◽

Vol 13 (3) ◽

pp. 2-11

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Language Processing ◽

Functional Relationship ◽

Nearest Neighbor ◽

Object Relation ◽

Language Models ◽

Word Representation ◽

The Subject ◽

Dependency Parser

The goal of dependency parsing is to find the functional relationships among words; for instance, it identifies the subject-object relation in a sentence. Parsing Indonesian requires information about the morphology of words, because Indonesian grammar relies heavily on affixation, combining root words with affixes to form new words. Thus, morphological information should be incorporated. Fortunately, it can be encoded implicitly in a word representation. Embeddings from Language Models (ELMo) is a word representation that is able to capture morphological information. Unlike widely used word representations such as word2vec or Global Vectors (GloVe), ELMo uses a convolutional neural network (CNN) over characters, so the affixation process can ideally be encoded in the word representation. We compared word2vec and ELMo using nearest-neighbor words and t-distributed stochastic neighbor embedding (t-SNE) word visualization. Our results showed that the ELMo representation encodes morphological information more richly than its counterpart. We trained our parser using both word2vec and ELMo. Unsurprisingly, the parser that uses ELMo achieves higher accuracy than the one that uses word2vec: we obtained an unlabeled attachment score (UAS) of 83.08 for ELMo and 81.35 for word2vec. Hence, we confirmed that morphological information is necessary, especially in a morphologically rich language such as Indonesian. Keywords: ELMo, Dependency Parser, Natural Language Processing, word2vec
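
UAS, the metric reported above, is the percentage of tokens whose predicted head matches the gold head, ignoring dependency labels. A minimal sketch, assuming heads are given as 1-based token indices with 0 for the root and that every token is scored (many evaluation scripts exclude punctuation, which this sketch does not):

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Percentage of tokens whose predicted head index equals the gold head.

    Heads use the CoNLL convention: 1-based token indices, 0 for the root.
    Dependency labels are ignored, which is what makes the score "unlabeled".
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# Hypothetical 4-token sentence: the parser attaches the last token wrongly.
gold = [2, 0, 2, 3]
pred = [2, 0, 2, 2]
print(unlabeled_attachment_score(gold, pred))  # 75.0
```

For a whole test set, the correct-head counts are normally pooled over all tokens rather than averaged per sentence.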

Fast Neural Network Engine for Natural Science Language Processing: A Drug-Search Case.

10.26434/chemrxiv.12800348 ◽

2020 ◽

Author(s):

Vadim V. Korolev ◽

Artem Mitrofanov ◽

Kirill Karpov ◽

Valery Tkachenko

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Natural Science ◽

Therapeutic Agent ◽

Semantic Relations ◽

Chemical Data ◽

Processing Methods ◽

Modern Natural

The main advantage of modern natural language processing methods is the ability to turn an amorphous, human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent against COVID-19 by analyzing the PubMed archive.

Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network

Scientific Reports ◽

10.1038/s41598-020-80441-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Rakesh David ◽

Rhys-Joshua D. Menezes ◽

Jan De Klerk ◽

Ian R. Castleden ◽

Cornelia M. Hooper ◽

...

Keyword(s):

Neural Network ◽

Language Processing ◽

Data Dissemination ◽

Short Term Memory ◽

Biological Data ◽

Experimental Methodology ◽

Subcellular Localisation ◽

Crop Species ◽

Deep Recurrent Neural Network ◽

Functional Features

The increased diversity and scale of published biological data has led to a growing appreciation of the applications of machine learning and statistical methodologies for gaining new insights. Key to achieving this aim is solving the relationship extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag of words (CBOW) and the bidirectional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the manually curated SUBA dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with average precision, recall, accuracy and F1 scores of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n = 30). Comparable scores were obtained using the CropPAL database, which stores protein subcellular localisation in crop species, as an independent test dataset, demonstrating the wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
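
The four scores above follow the standard definitions from a binary confusion matrix. A minimal sketch with hypothetical counts (not the paper's data); note that an F1 averaged over 30 runs, as reported, need not equal the F1 computed from the averaged precision and recall:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def scores(c: Confusion) -> dict:
    """Precision, recall, accuracy and F1 as fractions in [0, 1]."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts for one evaluation run.
print(scores(Confusion(tp=80, fp=5, fn=15, tn=100)))
```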

Multi-Transformer: A New Neural Network-Based Architecture for Forecasting S&P Volatility

Mathematics ◽

10.3390/math9151794 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1794

Author(s):

Eduardo Ramos-Pérez ◽

Pablo J. Alonso-González ◽

José Javier Núñez-Velázquez

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Risk Measures ◽

Hybrid Models ◽

Stock Volatility ◽

Management Actions ◽

Equity Risk ◽

Hedging Strategies ◽

Volatility Models

Events such as the Financial Crisis of 2007–2008 or the COVID-19 pandemic caused significant losses to banks and insurance entities. They also demonstrated the importance of using accurate equity risk models and having a risk management function able to implement effective hedging strategies. Stock volatility forecasts play a key role in estimating equity risk and, thus, in the management actions carried out by financial institutions. Therefore, this paper aims to propose more accurate stock volatility models based on novel machine and deep learning techniques. It introduces a neural network-based architecture called Multi-Transformer, a variant of the Transformer models that have already been successfully applied in natural language processing. The paper also adapts traditional Transformer layers for use in volatility forecasting models. The empirical results suggest that hybrid models based on Multi-Transformer and Transformer layers are more accurate and, hence, lead to more appropriate risk measures than other autoregressive algorithms or hybrid models based on feed-forward layers or long short-term memory cells.
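
Transformer layers are built around scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below shows that generic building block in plain Python, for example applied as self-attention over a short window of embedded past returns; it is not the paper's Multi-Transformer variant, whose specific modifications the abstract does not describe, and the input values are hypothetical:

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, without masking.

    q: n_queries x d_k, k: n_keys x d_k, v: n_keys x d_v.
    """
    if len(k) != len(v):
        raise ValueError("keys and values must have the same length")
    d_k = len(k[0])
    out = []
    for query in q:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d_k) for key in k]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * row[j] for w, row in zip(weights, v)) for j in range(len(v[0]))])
    return out


# Hypothetical: 3 time steps of 2-d embedded returns used as Q, K and V (self-attention).
x = [[0.01, 0.20], [-0.02, 0.25], [0.03, 0.18]]
print(scaled_dot_product_attention(x, x, x))
```

Because the weights sum to 1, each output row stays within the range spanned by the value rows, which is what lets the layer blend information from different time steps.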

Research on Inversion Mechanism of Chlorophyll-a Concentration in Water Bodies Using a Convolutional Neural Network Model

Water ◽

10.3390/w13050664 ◽

2021 ◽

Vol 13 (5) ◽

pp. 664

Author(s):

Yun Xue ◽

Lei Zhu ◽

Bin Zou ◽

Yi-min Wen ◽

Yue-hong Long ◽

...

Keyword(s):

Neural Network ◽

Regression Model ◽

Convolutional Neural Network ◽

Chlorophyll A ◽

Language Processing ◽

Water Bodies ◽

Inversion Effect ◽

Least Squares Regression ◽

Chlorophyll A Concentration ◽

Chl A

For Case-II water bodies with relatively complex water quality, it is challenging to establish a chlorophyll-a concentration (Chl-a concentration) inversion model with strong applicability and high accuracy. Convolutional neural networks (CNNs) show excellent performance in image target recognition and natural language processing. However, little research exists on the inversion of Chl-a concentration in water using convolutional neural networks. Taking China's Dongting Lake as an example, 90 water samples and their spectra were collected in this study. Using eight combinations as independent variables and Chl-a concentration as the dependent variable, a CNN model was constructed to invert Chl-a concentration. The results showed that: (1) the CNN model built on the original spectrum has a worse inversion effect than the CNN model built on the preprocessed spectrum: the determination coefficient of the prediction samples (R_P²) increases from 0.79 to 0.88, and their root mean square error (RMSEP) decreases from 0.61 to 0.49, indicating that preprocessing can significantly improve the inversion effect of the model; (2) among the combined models, the CNN model with Baseline1_SC (strong correlation factor of 500–750 nm baseline) has the best effect, with R_P² reaching 0.90 and RMSEP only 0.45. The eight CNN models also perform well on average, with an average R_P² of 0.86 and an average RMSEP of only 0.52, indicating the feasibility of applying CNNs to Chl-a concentration inversion modeling; (3) the performance of the CNN model (Baseline1_SC: R_P² = 0.90, RMSEP = 0.45) was far better than that of traditional models with the same combination, i.e., the linear regression model (R_P² = 0.61, RMSEP = 0.72) and the partial least squares regression model (Baseline1_SC: R_P² = 0.58, RMSEP = 0.95), indicating the superiority of convolutional neural network inversion modeling of water body Chl-a concentration.
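
R_P² and RMSEP are the coefficient of determination and the root mean square error computed on the prediction (held-out) samples. A minimal sketch using the common definition R² = 1 - SS_res / SS_tot (some chemometrics papers instead report the squared correlation coefficient, so the exact formula used in the paper is an assumption here), with hypothetical values:

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of the predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical measured vs. predicted Chl-a values for a prediction set.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(r_squared(measured, predicted), rmse(measured, predicted))  # R^2 = 0.5, RMSE = sqrt(1/3)
```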

Learning Subject-Generalized Topographical EEG Embeddings Using Deep Variational Autoencoders and Domain-Adversarial Regularization

Sensors ◽

10.3390/s21051792 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1792

Author(s):

Juan Hagad ◽

Tsukasa Kimura ◽

Ken-ichi Fukui ◽

Masayuki Numao

Keyword(s):

Neural Network ◽

Network Architecture ◽

Emotion Classification ◽

Limited Data ◽

Neural Network Architecture ◽

Building Models ◽

The Subject ◽

Data Constraints ◽

Input Level ◽

Normally Distributed

Two of the biggest challenges in building models for detecting emotions from electroencephalography (EEG) devices are the relatively small number of labeled samples and the strong variability of signal feature distributions between different subjects. In this study, we propose a context-generalized model that tackles the data constraints and subject variability simultaneously using a deep neural network architecture optimized for normally distributed, subject-independent feature embeddings. Variational autoencoders (VAEs) at the input level allow the lower feature layers of the model to be trained on both labeled and unlabeled samples, maximizing the use of the limited data resources. Meanwhile, variational regularization encourages the model to learn Gaussian-distributed feature embeddings, resulting in robustness to small dataset imbalances. Subject-adversarial regularization applied to the bi-lateral features further enforces subject independence on the final feature embedding used for emotion classification. The results of subject-independent performance experiments on the SEED and DEAP EEG-emotion datasets show that our model generalizes better across subjects than other state-of-the-art feature embeddings when paired with deep learning classifiers. Furthermore, qualitative analysis of the embedding space reveals that our proposed subject-invariant bi-lateral variational domain adversarial neural network (BiVDANN) architecture may improve subject-independent performance by discovering normally distributed features.
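
In a standard VAE, the variational regularization that pushes embeddings toward a Gaussian is the KL divergence between the encoder's diagonal Gaussian N(mu, diag(sigma²)) and a standard normal prior, which has the closed form KL = -1/2 * sum(1 + log sigma² - mu² - sigma²). A minimal sketch of that term and of the usual reparameterization step, assuming the encoder outputs a mean and a log-variance per latent dimension; the abstract does not give BiVDANN's exact loss, and the subject-adversarial part is not shown:

```python
import math
import random
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reparameterize(mu: Sequence[float], log_var: Sequence[float], eps: Sequence[float]) -> List[float]:
    """z = mu + sigma * eps, where eps ~ N(0, I) is supplied by the caller."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


# Hypothetical encoder output for a 2-d latent embedding of one EEG sample.
mu, log_var = [1.0, 0.0], [0.0, 0.0]
eps = [random.gauss(0.0, 1.0) for _ in mu]
print(kl_to_standard_normal(mu, log_var), reparameterize(mu, log_var, eps))
```

The KL term is zero only when every dimension already has mean 0 and variance 1, so adding it to the loss penalizes embeddings that drift away from the standard normal shape.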

Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189593 ◽

2021 ◽

pp. 1-10

Author(s):

Hye-Jeong Song ◽

Tak-Sung Heo ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Parallel Structure ◽

Short Term ◽

Similarity Estimation ◽

Accurate Judgment ◽

Proposed Model ◽

Sentence Similarity ◽

Long Short Term Memory

Sentence similarity evaluation is a significant task in natural language processing, used in machine translation, classification, and information extraction. Given two sentences, an accurate judgment must be made as to whether their meanings are equivalent, even if their words and contexts differ. To this end, existing studies have measured sentence similarity by analyzing words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, together with morpheme word embeddings. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM). Self-attention is applied to the features transformed by the Bi-LSTM. Subsequently, the outputs of the 1D-CNN and of self-attention are converted by global max pooling and global average pooling, respectively, to extract specific values. The resulting vectors are concatenated with the vector generated by Sent2Vec into a single vector. This vector is fed to a softmax layer, which finally determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
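
The pooling and fusion steps described above can be sketched as follows, assuming feature matrices of shape time steps × channels from the 1D-CNN and self-attention branches, a Sent2Vec vector, and a hypothetical linear layer before the softmax. How the two sentences of a pair are combined in the siamese setup is not specified in the abstract, so this sketch covers a single input branch with made-up values:

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # time steps x channels


def global_max_pool(seq: Matrix) -> List[float]:
    """Maximum of each channel over all time steps."""
    return [max(col) for col in zip(*seq)]


def global_avg_pool(seq: Matrix) -> List[float]:
    """Mean of each channel over all time steps."""
    return [sum(col) / len(col) for col in zip(*seq)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(cnn_out: Matrix, attn_out: Matrix, sent2vec: Sequence[float]) -> List[float]:
    """Max-pool the CNN branch, average-pool the attention branch, append Sent2Vec."""
    return global_max_pool(cnn_out) + global_avg_pool(attn_out) + list(sent2vec)


def classify(features: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Hypothetical linear layer followed by softmax over similarity classes."""
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


features = fuse_features([[1.0, 5.0], [3.0, 2.0]], [[1.0, 2.0], [3.0, 5.0]], [0.25])
print(features)  # [3.0, 5.0, 2.0, 3.5, 0.25]
```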

Towards Accurate Deceptive Opinions Detection Based on Word Order-Preserving CNN

Mathematical Problems in Engineering ◽

10.1155/2018/2410206 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Siyuan Zhao ◽

Zhiwei Xu ◽

Limin Liu ◽

Mengjie Guo ◽

Jing Yun

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Word Order ◽

Text Analysis ◽

Important Application ◽

Detection Mechanism ◽

Short Text

Convolutional neural networks (CNNs) have revolutionized the field of natural language processing; they are highly effective at the semantic analysis that underlies difficult natural language processing problems in a variety of domains. Deceptive opinion detection is an important application of existing CNN models. Detection mechanisms based on CNN models have better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short and vary in type and content. To identify deceptive opinions effectively, we need to study their characteristics comprehensively and explore novel characteristics beyond the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the CNN model by embedding word-order characteristics in its convolution and pooling layers, which makes the CNN more suitable for short-text classification and deceptive opinion detection. TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.
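
The abstract does not detail how word order is embedded in the convolution and pooling layers. As a swapped-in illustration rather than the paper's mechanism, the sketch below pairs a "valid" 1-D convolution over word vectors with k-max pooling, a well-known order-preserving alternative to max pooling that keeps the k largest activations in their original sentence order (the embeddings and filter are hypothetical):

```python
from typing import List, Sequence


def conv1d(embeddings: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]],
           bias: float = 0.0) -> List[float]:
    """'Valid' 1-D convolution of one filter over a sentence of word vectors.

    Output position i covers words i .. i + width - 1, so the output follows word order.
    """
    width = len(kernel)
    if width > len(embeddings):
        raise ValueError("sentence is shorter than the filter")
    return [
        sum(sum(w * x for w, x in zip(kernel[j], embeddings[i + j])) for j in range(width)) + bias
        for i in range(len(embeddings) - width + 1)
    ]


def k_max_pooling(feature_map: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations, in their original (word-order) positions."""
    if not 0 < k <= len(feature_map):
        raise ValueError("k must be between 1 and the feature-map length")
    top = sorted(range(len(feature_map)), key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


# Hypothetical 1-d "embeddings" for a 6-word opinion and a width-2 filter.
sentence = [[0.2], [0.9], [0.1], [0.7], [0.4], [0.8]]
fmap = conv1d(sentence, [[1.0], [-1.0]])
print(fmap, k_max_pooling(fmap, 2))
```

Ordinary max pooling keeps a single value per filter and discards where it occurred; k-max pooling keeps a short trace whose order still reflects the order of the words that produced it.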

Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System

International Journal of Neural Systems ◽

10.1142/s0129065718500077 ◽

2018 ◽

Vol 28 (09) ◽

pp. 1850007

Author(s):

Francisco Zamora-Martinez ◽

Maria Jose Castro-Bleda

Keyword(s):

Neural Network ◽

Machine Translation ◽

Language Processing ◽

Traditional Approach ◽

Computational Cost ◽

Integrated Approach ◽

Language Models ◽

Translation System ◽

Neural Net ◽

Network Language

Neural network language models (NNLMs) are a successful approach to natural language processing tasks such as machine translation. In this work, we introduce a statistical machine translation (SMT) system that fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on N-best list rescoring. The neural network models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence translation quality more strongly. Computational issues were solved by a novel idea based on memorization and smoothing of the softmax normalization constants to avoid computing them, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and n-gram-based systems, and showing that the integrated approach seems more promising for n-gram-based systems, even with non-full-quality NNLMs.
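
The saving comes from the fact that the softmax normalization constant requires scoring the whole vocabulary, whereas the numerator needs only one output unit. A simplified sketch of the caching idea, assuming a hypothetical model interface with an expensive full-vocabulary scorer and a cheap single-word scorer; the fallback for unseen contexts (the mean memorized log-normalizer) is an assumed smoothing rule, not necessarily the paper's:

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Context = Tuple[str, ...]


def log_sum_exp(scores: List[float]) -> float:
    """Numerically stable log(sum(exp(s))): the softmax log-normalizer."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


class MemoizedSoftmaxLM:
    """Hypothetical NNLM wrapper that avoids a full softmax at decode time.

    full_scores(context) returns unnormalized scores for the whole vocabulary
    (expensive); word_score(context, w) returns a single score (cheap).
    """

    def __init__(self,
                 full_scores: Callable[[Context], List[float]],
                 word_score: Callable[[Context, int], float],
                 frequent_contexts: Iterable[Context]):
        self.word_score = word_score
        # Offline step: memorize log-normalizers for frequent contexts.
        self.log_z: Dict[Context, float] = {
            ctx: log_sum_exp(full_scores(ctx)) for ctx in frequent_contexts
        }
        if not self.log_z:
            raise ValueError("need at least one precomputed context")
        # Assumed smoothing: unseen contexts share the mean memorized constant.
        self.default_log_z = sum(self.log_z.values()) / len(self.log_z)

    def log_prob(self, context: Context, word: int) -> float:
        log_z = self.log_z.get(context, self.default_log_z)
        return self.word_score(context, word) - log_z


# Toy 2-word vocabulary; in a real system these scores come from the network.
table = {("a",): [0.0, 0.0], ("b",): [1.0, 1.0], ("c",): [0.5, 0.5]}
lm = MemoizedSoftmaxLM(lambda ctx: table[ctx], lambda ctx, w: table[ctx][w], [("a",), ("b",)])
print(lm.log_prob(("a",), 0), lm.log_prob(("c",), 1))  # ("c",) was never memorized
```

The trade-off mentioned in the abstract shows up directly: a memorized context gets an exact log-probability, while an unseen one gets an approximate score that is not guaranteed to be normalized.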

Emergence of a compositional neural code for written words: Recycling of a convolutional neural network for reading

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2104779118 ◽

2021 ◽

Vol 118 (46) ◽

pp. e2104779118

Author(s):

T. Hannagan ◽

A. Agrawal ◽

L. Cohen ◽

S. Dehaene

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Language Processing ◽

Visual Pathway ◽

Neural Code ◽

Letter Recognition ◽

Invariant Representation ◽

Spoken Language Processing ◽

Visual Word Form Area ◽

Ventral Visual Pathway

The visual word form area (VWFA) is a region of human inferotemporal cortex that emerges at a fixed location in the occipitotemporal cortex during reading acquisition and systematically responds to written words in literate individuals. According to the neuronal recycling hypothesis, this region arises through the repurposing, for letter recognition, of a subpart of the ventral visual pathway initially involved in face and object recognition. Furthermore, according to the biased connectivity hypothesis, its reproducible localization is due to preexisting connections from this subregion to areas involved in spoken-language processing. Here, we evaluate those hypotheses in an explicit computational model. We trained a deep convolutional neural network of the ventral visual pathway, first to categorize pictures and then to recognize written words invariantly for case, font, and size. We show that the model can account for many properties of the VWFA, particularly when a subset of units possesses a biased connectivity to word output units. The network develops a sparse, invariant representation of written words, based on a restricted set of reading-selective units. Their activation mimics several properties of the VWFA, and their lesioning causes a reading-specific deficit. The model predicts that, in literate brains, written words are encoded by a compositional neural code with neurons tuned either to individual letters and their ordinal position relative to word start or word ending or to pairs of letters (bigrams).
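
As a toy illustration of the kind of compositional code the model predicts, the hypothetical function below lists, for a written word, the units that would respond: letters tagged with their ordinal position from the word start and from the word end, plus letter pairs. Lowercasing stands in for case invariance, and counting only adjacent bigrams is an assumption (the abstract does not say whether non-adjacent pairs are included); this is not the trained network:

```python
from typing import Set, Tuple

Unit = Tuple[str, ...]


def compositional_code(word: str) -> Set[Unit]:
    """Hypothetical set of active 'units' for a written word."""
    w = word.lower()  # crude stand-in for case invariance
    n = len(w)
    units: Set[Unit] = set()
    for i, letter in enumerate(w):
        units.add(("start", str(i), letter))        # position counted from word start
        units.add(("end", str(n - 1 - i), letter))  # position counted from word end
    for a, b in zip(w, w[1:]):
        units.add(("bigram", a + b))                # adjacent letter pairs (assumption)
    return units


# Words sharing letters at the same anchored positions share units.
shared = compositional_code("Cat") & compositional_code("act")
print(sorted(shared))
```

Here "cat" and "act" overlap only in the units for a final "t", showing how such a code distinguishes anagrams while still registering partial overlap.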
