Examining Attention Mechanisms in Deep Learning Models for Sentiment Analysis

Spyridon Kardakis; Isidoros Perikos; Foteini Grivokostopoulou; Ioannis Hatzilygeroudis

doi:10.3390/app11093883

Examining Attention Mechanisms in Deep Learning Models for Sentiment Analysis

Applied Sciences ◽

10.3390/app11093883 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3883

Author(s):

Spyridon Kardakis ◽

Isidoros Perikos ◽

Foteini Grivokostopoulou ◽

Ioannis Hatzilygeroudis

Keyword(s):

Neural Networks ◽

Experimental Study ◽

Deep Learning ◽

Comparative Analysis ◽

Sentiment Analysis ◽

Recurrent Neural Networks ◽

Deep Neural Networks ◽

Training Methods ◽

Learning Models ◽

Neural Models

Attention-based methods for deep neural networks constitute a technique that has attracted increased interest in recent years. Attention mechanisms can focus on important parts of a sequence and, as a result, enhance the performance of neural networks in a variety of tasks, including sentiment analysis, emotion recognition, machine translation and speech recognition. In this work, we study attention-based models built on recurrent neural networks (RNNs) and examine their performance in various contexts of sentiment analysis. Self-attention, global-attention and hierarchical-attention methods are examined under various deep neural models, training methods and hyperparameters. Even though attention mechanisms are a powerful recent concept in the field of deep learning, their exact effectiveness in sentiment analysis is yet to be thoroughly assessed. A comparative analysis is performed in a text sentiment classification task where baseline models are compared with and without the use of attention for every experiment. The experimental study additionally examines the proposed models' ability to recognize opinions and emotions in movie reviews. The results indicate that attention-based models improve the performance of deep neural models, with accuracy gains of up to 3.5%.
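
The idea of "focusing on important parts of a sequence" can be illustrated with a minimal sketch of dot-product attention pooling over RNN hidden states. This is a generic illustration, not the models evaluated in the paper: the function names, the example hidden states and the `query` vector (standing in for a learned parameter) are all assumptions made here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each hidden state against `query` (dot product), normalise the
    scores with softmax, and return (weights, weighted-sum context vector)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Hypothetical 2-dimensional hidden states for a three-token sequence.
states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]  # assumption: stands in for a learned parameter vector
weights, context = attention_pool(states, query)
```

In this toy input the second token has the largest score, so it receives the largest weight and dominates the context vector that a downstream classifier would consume.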

Deep Sparse Learning for Automatic Modulation Classification Using Recurrent Neural Networks

Sensors ◽

10.3390/s21196410 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6410

Author(s):

Ke Zang ◽

Wenqi Wu ◽

Wei Luo

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

Model Performance ◽

Sparse Learning ◽

Modulation Classification ◽

Learning Models ◽

Automatic Modulation Classification

Deep learning models, especially recurrent neural networks (RNNs), have been successfully applied to automatic modulation classification (AMC) problems recently. However, deep neural networks are usually overparameterized, i.e., most of the connections between neurons are redundant. The large model size hinders the deployment of deep neural networks in applications such as Internet-of-Things (IoT) networks. Therefore, reducing parameters without compromising the network performance via sparse learning is often desirable since it can alleviate the computational and storage burdens of deep learning models. In this paper, we propose a sparse learning algorithm that can directly train a sparsely connected neural network based on the statistics of weight magnitude and gradient momentum. We first used the MNIST and CIFAR10 datasets to demonstrate the effectiveness of this method. Subsequently, we applied it to RNNs with different pruning strategies on recurrent and non-recurrent connections for AMC problems. Experimental results demonstrated that the proposed method can effectively reduce the parameters of the neural networks while maintaining model performance. Moreover, we show that appropriate sparsity can further improve network generalization ability.
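
As a rough illustration of what sparsifying a layer means, the sketch below applies plain magnitude-based pruning: the smallest-magnitude weights are masked to zero. This is a deliberate simplification and not the algorithm proposed in the paper, which also uses gradient momentum statistics and trains the sparse network directly. The function names and the example weights are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    `sparsity` is the fraction of weights to zero out (0.0 to 1.0).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]

# Hypothetical flattened weights of one layer.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
mask = magnitude_mask(layer, sparsity=0.5)
sparse_layer = apply_mask(layer, mask)
```

With `sparsity=0.5`, the three weights closest to zero are removed and the three largest in magnitude are kept unchanged.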

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call "Levenshtein augmentation", which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented by conventional SMILES randomization when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.
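
The similarity criterion behind the method's name can be sketched with the standard Levenshtein edit distance. The example below computes the distance at the character level and picks, from several equivalent SMILES strings for one reactant, the one closest to the product string. This only illustrates the general idea; the authors' actual pairing procedure, tokenization and datasets are not reproduced here, and the helper `closest_variant` and the ethanol/acetaldehyde strings are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_variant(variants, target):
    """Hypothetical helper: the variant with the smallest distance to target."""
    return min(variants, key=lambda s: levenshtein(s, target))

# Three equivalent SMILES strings for ethanol (illustrative reactant).
reactant_variants = ["OCC", "C(O)C", "CCO"]
product = "CC=O"  # acetaldehyde (illustrative product)
best = closest_variant(reactant_variants, product)
```

Here `best` is `"CCO"`, which differs from the product string by a single inserted character, so the pair shares the longest local sub-sequence.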

A Review of Recent Deep Learning Models in COVID-19 Diagnosis

European Journal of Engineering and Technology Research ◽

10.24018/ejers.2021.6.5.2485 ◽

2021 ◽

Vol 6 (5) ◽

pp. 10-15

Author(s):

Ela Bhattacharya ◽

D. Bhattacharya

Keyword(s):

Artificial Intelligence ◽

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Test Results ◽

Learning Models ◽

Future Directions ◽

Human Contact ◽

The World ◽

Short Span

COVID-19 has emerged as the latest worrisome pandemic, with its outbreak reported to have begun in Wuhan, China. The infection spreads by means of human contact and, as a result, has caused massive infections across 200 countries around the world. Artificial intelligence has likewise contributed to managing the COVID-19 pandemic in various aspects within a short span of time. The deep neural networks explored in this paper have contributed to the detection of COVID-19 from imaging sources. This paper investigates the datasets, pre-processing, segmentation, feature extraction, classification and test results, which can be useful for discovering future directions in the automatic diagnosis of the disease using artificial intelligence-based frameworks.

SNR: Squeezing Numerical Range Defuses Bit Error Vulnerability Surface in Deep Neural Networks

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477007 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-25

Author(s):

Elbruz Ozen ◽

Alex Orailoglu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Low Cost ◽

Error Resilience ◽

Error Rates ◽

Training Methods ◽

Performance Requirements ◽

Proper Design ◽

And Performance

As deep learning algorithms are widely adopted, an increasing number of them are positioned in embedded application domains with strict reliability constraints. The expenditure of significant resources to satisfy performance requirements in deep neural network accelerators has thinned out the margins for delivering safety in embedded deep learning applications, thus precluding the adoption of conventional fault tolerance methods. The potential of exploiting the inherent resilience characteristics of deep neural networks remains though unexplored, offering a promising low-cost path towards safety in embedded deep learning applications. This work demonstrates the possibility of such exploitation by juxtaposing the reduction of the vulnerability surface through the proper design of the quantization schemes with shaping the parameter distributions at each layer through the guidance offered by appropriate training methods, thus delivering deep neural networks of high resilience merely through algorithmic modifications. Unequaled error resilience characteristics can be thus injected into safety-critical deep learning applications to tolerate bit error rates of up to at absolutely zero hardware, energy, and performance costs while improving the error-free model accuracy even further.

Image Compression Based on Deep Learning: A Review

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v8i130193 ◽

2021 ◽

pp. 62-76

Author(s):

Hajar Maseeh Yasin ◽

Adnan Mohsin Abdulazeez

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Image Compression ◽

Recurrent Neural Networks ◽

Deep Neural Networks ◽

Review Paper ◽

High Accuracy ◽

Machine Learning Methods ◽

Digital Era ◽

Different Types

Image compression is an essential technology for encoding and improving various forms of images in the digital era. Researchers have extended the principles of deep learning, one of the most exciting machine learning methods, to different kinds of neural networks to show that it is the most versatile way to analyze, classify, and compress images. Many types of neural networks are used for image compression, such as deep neural networks, artificial neural networks, recurrent neural networks, and convolutional neural networks. This review paper therefore discusses how to apply deep learning to various neural networks to obtain better image compression with high accuracy, minimal loss, and superior visibility of the image. It also discusses deep learning and its application to different types of images, with a distinct analysis of each.

Colorizing and Captioning Images Using Deep Learning Models and Deploying Them Via IoT Deployment Tools

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2020100103 ◽

2020 ◽

Vol 10 (4) ◽

pp. 35-50

Author(s):

Rajalakshmi Krishnamurthi ◽

Raghav Maheshwari ◽

Rishabh Gulati

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Comparative Analysis ◽

Computer Science ◽

Loss Rate ◽

Learning Models

Neural networks and IoT are among the top fields of research in computer science nowadays. Inspired by this, this article creates an efficient neural network model for colorizing images and transporting them to remote systems through IoT deployment tools. The article develops two models, Alpha and Beta, for colorizing greyscale images. Efficient models are developed to lessen the loss rate to around 0.005. It also develops an efficient model for image captioning. The paper then describes the use of tools such as AWS Greengrass and Docker for deploying different neural network models and provides a comparative analysis among them, combining neural networks with IoT deployment tools.

PM2.5 Prediction Model Based on Combinational Hammerstein Recurrent Neural Networks

Mathematics ◽

10.3390/math8122178 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2178

Author(s):

Yi-Chung Chen ◽

Tsu-Chiang Lei ◽

Shun Yao ◽

Hsin-Ping Wang

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Predictive Accuracy ◽

Learning Models ◽

Short Term ◽

Computational Overhead ◽

Implementation Costs ◽

External Sources

Airborne particulate matter 2.5 (PM2.5) can have a profound effect on the health of the population. Many researchers have been reporting highly accurate numerical predictions based on raw PM2.5 data imported directly into deep learning models; however, there is still considerable room for improvement in terms of implementation costs due to heavy computational overhead. From the perspective of environmental science, PM2.5 values in a given location can be attributed to local sources as well as external sources. Local sources tend to have a dramatic short-term impact on PM2.5 values, whereas external sources tend to have more subtle but longer-lasting effects. In the presence of PM2.5 from both sources at the same time, this combination of effects can undermine the predictive accuracy of the model. This paper presents a novel combinational Hammerstein recurrent neural network (CHRNN) to enhance predictive accuracy and overcome the heavy computational and monetary burden imposed by deep learning models. The CHRNN comprises a base neural network tasked with learning gradual (long-term) fluctuations in conjunction with add-on neural networks to deal with dramatic (short-term) fluctuations. The CHRNN can be coupled with a random forest model to determine the degree to which short-term effects influence long-term outcomes. We also developed novel feature selection and normalization methods to enhance prediction accuracy. Using real-world measurement data of air quality and PM2.5 datasets from Taiwan, the precision of the proposed system in the numerical prediction of PM2.5 levels was comparable to that of state-of-the-art deep learning models, such as deep recurrent neural networks and long short-term memory, despite far lower implementation costs and computational overhead.

Identification of Thoracic Diseases by Exploiting Deep Neural Networks (Preprint)

10.2196/preprints.23644 ◽

2020 ◽

Author(s):

Albahli Saleh ◽

Ali Alkhalifah

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Deep Neural Networks ◽

Medical Image Analysis ◽

Medical Community ◽

Learning Models ◽

X Ray ◽

Chest Disease

BACKGROUND: To diagnose cardiothoracic diseases, a chest x-ray (CXR) is examined by a radiologist. As more people are affected, doctors are becoming scarce, especially in developing countries. However, with the advent of image processing tools, the task of diagnosing these cardiothoracic diseases has seen great progress. Many researchers have worked on how the problems associated with medical images can be mitigated by using neural networks. OBJECTIVE: Previous works used state-of-the-art techniques and obtained effective results with one or two cardiothoracic diseases but could lead to misclassification. In our work, we adopted GANs to synthesize chest radiographs (CXR) to augment the training set on multiple cardiothoracic diseases and efficiently diagnose the chest diseases in different classes, as shown in Figure 1. In this regard, our major contributions are: classifying various cardiothoracic diseases to detect a specific chest disease based on CXR; using GANs to overcome the shortage of small training datasets; addressing the problem of imbalanced data; and implementing an optimal deep neural network architecture with different hyper-parameters to improve the model with the best accuracy. METHODS: For this research, we do not build a model from scratch because of computational constraints, as doing so requires very high-end computers. Rather, we use a convolutional neural network (CNN), a class of deep neural networks, to propose a generative adversarial network (GAN)-based model that generates synthetic data for training, as the amount of data is limited. We use pre-trained models, that is, models trained on a large benchmark dataset to solve a problem similar to the one we want to solve. For example, the ResNet-152 model we used was initially trained on the ImageNet dataset. RESULTS: After successful training and validation of the models we developed, ResNet-152 with image augmentation proved to be the best model for the automatic detection of cardiothoracic disease. However, one of the main problems associated with radiographic deep learning projects and research is the scarcity of sufficient datasets, which are a key component of all deep learning models because they require a lot of data for training. This is why some of our models used image augmentation to increase the number of images without duplication. As more data are collected in the field of chest radiology, the models could be retrained to improve their accuracy, since deep learning models improve with more data. CONCLUSIONS: This research employs the advantages of computer vision and medical image analysis to develop an automated model that has the clinical potential for early detection of the disease. Using deep learning models, the research aims to evaluate the effectiveness and accuracy of different convolutional neural network models in the automatic diagnosis of cardiothoracic diseases from x-ray images compared to diagnosis by experts in the medical community.

Evaluation of Deep Learning Methods Efficiency for Malicious and Benign System Calls Classification on the AWSCTD

Security and Communication Networks ◽

10.1155/2019/2317976 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Dainius Čeponis ◽

Nikolaj Goranin

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Comparative Analysis ◽

Recurrent Neural Networks ◽

Data Driven ◽

Learning Methods ◽

System Calls ◽

Number Of Layers ◽

The Moment ◽

Family Classification

The increasing amount of malware and cyberattacks on a host level increases the need for a reliable anomaly-based host IDS (HIDS) that would be able to deal with zero-day attacks and would ensure a low false alarm rate (FAR), which is critical for the detection of such activity. Deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are considered to be highly suitable for solving data-driven security problems. Therefore, it is necessary to perform a comparative analysis of such methods in order to evaluate their efficiency in attack classification as well as their ability to distinguish malicious and benign activity. In this article, we present the results achieved with the AWSCTD (attack-caused Windows OS system calls traces dataset), which can be considered the most exhaustive set of host-level anomalies at the moment, including 112.56 million system calls from 12110 executable malware samples and 3145 benign software samples with 16.3 million system calls. The best results were obtained with CNNs, with up to 90.0% accuracy for family classification and 95.0% accuracy for malicious/benign determination. RNNs demonstrated slightly inferior results. Furthermore, CNN tuning via an increase in the number of layers should make them practically applicable for host-level anomaly detection.

Network attributes describe a similarity between deep neural networks and large scale brain networks

Journal of Complex Networks ◽

10.1093/comnet/cnz044 ◽

2019 ◽

Author(s):

Kosuke Takagi

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Human Brain ◽

Large Scale ◽

Deep Neural Networks ◽

Distribution Model ◽

Common Mechanism ◽

Learning Models ◽

Connection Weight ◽

Wide Range

Despite the recent success of deep learning models in solving various problems, their ability is still limited compared with human intelligence, which has the flexibility to adapt to a changing environment. To obtain a model which achieves adaptability to a wide range of problems and tasks is a challenging problem. To achieve this, an issue that must be addressed is identification of the similarities and differences between the human brain and deep neural networks. In this article, inspired by the human flexibility which might suggest the existence of a common mechanism allowing solution of different kinds of tasks, we consider a general learning process in neural networks, on which no specific conditions and constraints are imposed. Subsequently, we theoretically show that, according to the learning progress, the network structure converges to the state, which is characterized by a unique distribution model with respect to network quantities such as the connection weight and node strength. Noting that the empirical data indicate that this state emerges in the large scale network in the human brain, we show that the same state can be reproduced in a simple example of deep learning models. Although further research is needed, our findings provide an insight into the common inherent mechanism underlying the human brain and deep learning. Thus, our findings provide suggestions for designing efficient learning algorithms for solving a wide variety of tasks in the future.
