scholarly journals Sentiment Analysis of Short Informal Texts

2014 ◽  
Vol 50 ◽  
pp. 723-762 ◽  
Author(s):  
S. Kiritchenko ◽  
X. Zhu ◽  
S. M. Mohammad

We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task `Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points.

2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been extensively used and they have proven to be better for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence to sequence modelling and time series forecasting. In this article we will review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application of sentiment analysis.


2014 ◽  
Vol 23 (04) ◽  
pp. 1460019
Author(s):  
Hsiang Hui Lek ◽  
Danny Chiang Choon Poo

Sentiment lexicon plays an important role in determining the polarity of words and proves to be an important component in sentiment analysis applications. Most of these sentiment lexicons assign a fixed polarity to each word. However, it has been noted that the polarity of words depends on how they are used and so these lexicons are unable to accurately classify the polarity of the sentiments. By considering the aspect and domain of a word will allow us to more accurately classify sentiments. This paper presents a fully automatic method to build an aspect and domain sensitive sentiment lexicon which assigns a polarity to a word depending on both the aspect and the domain. The experimental results show that our lexicon significantly outperforms other commonly used sentiment lexicons / state-of-the-art approaches. To the best of our knowledge, such a lexicon is not publicly available. As such, we also make this lexicon publicly available as we believe it will benefit the research community. In addition, we observe the long tail distribution behavior of product aspects and propose the possibility of aspect ranking by comparing the number of domains and number of sentiment words present for an aspect.


10.2196/17832 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e17832
Author(s):  
Kun Zeng ◽  
Zhiwei Pan ◽  
Yibin Xu ◽  
Yingying Qu

Background Eligibility criteria are the main strategy for screening appropriate participants for clinical trials. Automatic analysis of clinical trial eligibility criteria by digital screening, leveraging natural language processing techniques, can improve recruitment efficiency and reduce the costs involved in promoting clinical research. Objective We aimed to create a natural language processing model to automatically classify clinical trial eligibility criteria. Methods We proposed a classifier for short text eligibility criteria based on ensemble learning, where a set of pretrained models was integrated. The pretrained models included state-of-the-art deep learning methods for training and classification, including Bidirectional Encoder Representations from Transformers (BERT), XLNet, and A Robustly Optimized BERT Pretraining Approach (RoBERTa). The classification results by the integrated models were combined as new features for training a Light Gradient Boosting Machine (LightGBM) model for eligibility criteria classification. Results Our proposed method obtained an accuracy of 0.846, a precision of 0.803, and a recall of 0.817 on a standard data set from a shared task of an international conference. The macro F1 value was 0.807, outperforming the state-of-the-art baseline methods on the shared task. Conclusions We designed a model for screening short text classification criteria for clinical trials based on multimodel ensemble learning. Through experiments, we concluded that performance was improved significantly with a model ensemble compared to a single model. The introduction of focal loss could reduce the impact of class imbalance to achieve better performance.


Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 845
Author(s):  
Danbi Cho ◽  
Hyunyoung Lee ◽  
Seungshik Kang

It is important how the token unit is defined in a sentence in natural language process tasks, such as text classification, machine translation, and generation. Many studies recently utilized the subword tokenization in language models such as BERT, KoBERT, and ALBERT. Although these language models achieved state-of-the-art results in various NLP tasks, it is not clear whether the subword tokenization is the best token unit for Korean sentence embedding. Thus, we carried out sentence embedding based on word, morpheme, subword, and submorpheme, respectively, on Korean sentiment analysis. We explored the two-sentence representation methods for sentence embedding: considering the order of tokens in a sentence and not considering the order. While inputting a sentence, which is decomposed by token unit, to the two-sentence representation methods, we construct the sentence embedding with various tokenizations to find the most effective token unit for Korean sentence embedding. In our work, we confirmed: the robustness of the subword unit for out-of-vocabulary (OOV) problems compared to other token units, the disadvantage of replacing whitespace with a particular symbol in the sentiment analysis task, and that the optimal vocabulary size is 16K in subword and submorpheme tokenization. We empirically noticed that the subword, which was tokenized by a vocabulary size of 16K without replacement of whitespace, was the most effective for sentence embedding on the Korean sentiment analysis task.


Author(s):  
Julien Tissier ◽  
Christophe Gravier ◽  
Amaury Habrard

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of memory and calculations which makes them unsuitable for use on low-resource devices. The method proposed in this paper transforms real-valued embeddings into binary embeddings while preserving semantic information, requiring only 128 or 256 bits for each vector. This leads to a small memory footprint and fast vector operations. The model is based on an autoencoder architecture, which also allows to reconstruct original vectors from the binary ones. Experimental results on semantic similarity, text classification and sentiment analysis tasks show that the binarization of word embeddings only leads to a loss of ∼2% in accuracy while vector size is reduced by 97%. Furthermore, a top-k benchmark demonstrates that using these binary vectors is 30 times faster than using real-valued vectors.


Author(s):  
Muhammad Zubair Asghar ◽  
Ikram Ullah ◽  
Shahab Shamshirband ◽  
Fazal Masud Kundi ◽  
Ammara Habib

The feedback collection and analysis has remained an important subject matter since long. The traditional techniques for student feedback analysis are based on questionnaire-based data collection and analysis. However, the student expresses their feedback opinions on online social media sites, which need to be analyzed. This study aims at the development of fuzzy-based sentiment analysis system for analyzing student feedback and satisfaction by assigning proper sentiment score to opinion words and polarity shifters present in the input reviews. Our technique computes the sentiment score of student feedback reviews and then applies fuzzy-logic module to analyze and quantify student’s satisfaction at the fine-grained level. The experimental results reveal that the proposed work has outperformed the baseline studies as well as state-of-the-art machine learning classifiers.


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been extensively used and they have proven to be better for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence to sequence modelling and time series forecasting. In this article we will review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application of sentiment analysis.


Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 213
Author(s):  
Sharif Noor Zisad ◽  
Etu Chowdhury ◽  
Mohammad Shahadat Hossain ◽  
Raihan Ul Islam ◽  
Karl Andersson

Visual sentiment analysis has become more popular than textual ones in various domains for decision-making purposes. On account of this, we develop a visual sentiment analysis system, which can classify image expression. The system classifies images by taking into account six different expressions such as anger, joy, love, surprise, fear, and sadness. In our study, we propose an expert system by integrating a Deep Learning method with a Belief Rule Base (known as the BRB-DL approach) to assess an image’s overall sentiment under uncertainty. This BRB-DL approach includes both the data-driven and knowledge-driven techniques to determine the overall sentiment. Our integrated expert system outperforms the state-of-the-art methods of visual sentiment analysis with promising results. The integrated system can classify images with 86% accuracy. The system can be beneficial to understand the emotional tendency and psychological state of an individual.


2020 ◽  
Author(s):  
Kun Zeng ◽  
Zhiwei Pan ◽  
Yibin Xu ◽  
Yingying Qu

BACKGROUND Eligibility criteria are the main strategy for screening appropriate participants for clinical trials. Automatic analysis of clinical trial eligibility criteria by digital screening, leveraging natural language processing techniques, can improve recruitment efficiency and reduce the costs involved in promoting clinical research. OBJECTIVE We aimed to create a natural language processing model to automatically classify clinical trial eligibility criteria. METHODS We proposed a classifier for short text eligibility criteria based on ensemble learning, where a set of pretrained models was integrated. The pretrained models included state-of-the-art deep learning methods for training and classification, including Bidirectional Encoder Representations from Transformers (BERT), XLNet, and A Robustly Optimized BERT Pretraining Approach (RoBERTa). The classification results by the integrated models were combined as new features for training a Light Gradient Boosting Machine (LightGBM) model for eligibility criteria classification. RESULTS Our proposed method obtained an accuracy of 0.846, a precision of 0.803, and a recall of 0.817 on a standard data set from a shared task of an international conference. The macro F1 value was 0.807, outperforming the state-of-the-art baseline methods on the shared task. CONCLUSIONS We designed a model for screening short text classification criteria for clinical trials based on multimodel ensemble learning. Through experiments, we concluded that performance was improved significantly with a model ensemble compared to a single model. The introduction of focal loss could reduce the impact of class imbalance to achieve better performance.


Sign in / Sign up

Export Citation Format

Share Document