Enhancement Bag-of-Words Model for Solving the Challenges of Sentiment Analysis

Doaa Mohey

doi:10.14569/ijacsa.2016.070134

Sentiment Analysis of News Articles in Spanish using Predicate Features

Lenguaje ◽

10.25100/lenguaje.v47i2.7937 ◽

2019 ◽

Vol 47 (2) ◽

pp. 235-267

Author(s):

Antonio Tamayo ◽

Julián Arias Londoño ◽

Diego Burgos ◽

Gabriel Quiroz

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Machine Learning Algorithms ◽

Gaussian Kernel ◽

Selection Strategy ◽

Bag Of Words ◽

Word Sequence ◽

Course Of Action ◽

Economic Trends

The automatic prediction of the course of action of agents involved in social or economic trends is an imperative challenge nowadays. However, it is a difficult task because stance or opinion is often spread throughout long, complex texts, such as news articles. The current study tests sentence predicates as features to automatically determine the writer’s stance in news articles. We capture the semantics and stance of the text by encoding features such as the attribute of copulative sentences, the predicate of transitive sentences, adjectival phrases, and the section of the article. Under the assumption that these features are informative enough to model the semantics of the text, each word sequence is disambiguated and assigned a sentiment value using weighting rules. Different experiments were run using either SentiWordNet or ML-Senticon to determine words’ sentiment. Feature vectors are automatically built to populate a database that is tested using two machine learning algorithms. An efficiency of 69% was achieved using an SVM with a Gaussian kernel along with a feature selection strategy. This score outperformed the bag-of-words baseline by 12%. These results are promising considering that the sentiment analysis is performed on very complex texts written in Spanish.
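As a rough illustration of the Gaussian kernel mentioned in the abstract above, the sketch below computes the standard form K(x, y) = exp(-gamma * ||x - y||^2) between feature vectors. The vectors, the `gamma` value, and the function name are assumptions made for illustration; the abstract does not report the kernel parameters or the feature layout used in the study.

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    """Return exp(-gamma * ||x - y||^2) for two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical sentiment feature vectors for three articles (not from the paper).
article_a = [0.8, -0.1, 0.3]
article_b = [0.7, -0.2, 0.3]
article_c = [-0.9, 0.6, -0.4]

# Vectors that lie close together get a kernel value near 1,
# distant vectors get a value near 0.
similar = gaussian_kernel(article_a, article_b)
dissimilar = gaussian_kernel(article_a, article_c)
```

An SVM with this kernel separates classes based on such pairwise similarities, which is why nearby feature vectors tend to receive the same stance label.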

A generalizable sentiment analysis method for creating a hotel dictionary: using big data on TripAdvisor hotel reviews

Journal of Hospitality and Tourism Technology ◽

10.1108/jhtt-02-2020-0034 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sayeh Bagherzadeh ◽

Sajjad Shokouhyar ◽

Hamed Jahani ◽

Marianna Sigala

Keyword(s):

Machine Learning ◽

Big Data ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Tourism Industry ◽

Bag Of Words ◽

Machine Learning Algorithm ◽

End User ◽

Content Type ◽

Customer Reviews

Purpose: Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.

Design/methodology/approach: Big data of hotel customer reviews posted on the TripAdvisor platform were collected and appropriately prepared for conducting a binary sentiment analysis by developing a novel bag-of-words weighted approach. The latter provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. This approach resulted in two lexicons (a weighted lexicon, L1 and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) were used for comparing the accuracy metrics of the study’s approach for creating the two lexicons.

Findings: The results of the accuracy metrics confirmed that the study’s methodology significantly outperforms the dictionary-based method in comparison to the machine-learning algorithm method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.

Practical implications: The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable method and a useful tool for analyzing guests’ feedback and enabling managers to understand, anticipate and re-actively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for conducting comparisons aiming to detect and understand guests’ sentiment changes across time, as well as across users based on their profiles and experiences.

Originality/value: This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study using a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.
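The idea of a weighted bag-of-words lexicon for binary sentiment analysis can be sketched as follows. This is only a minimal illustration under stated assumptions: the weighting rule (a word's share in positive reviews minus its share in negative reviews), the toy reviews, and the function names are hypothetical and are not the procedure used to build the L1 or L2 lexicons in the study.

```python
from collections import Counter

def build_weighted_lexicon(reviews):
    """Weight each word by (share in positive reviews) - (share in negative reviews).

    This weighting rule is an assumption for illustration, not the paper's rule.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in reviews:
        target = pos_counts if label == "pos" else neg_counts
        target.update(text.lower().split())
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1
    vocabulary = set(pos_counts) | set(neg_counts)
    return {
        word: pos_counts[word] / pos_total - neg_counts[word] / neg_total
        for word in vocabulary
    }

def classify(text, lexicon):
    """Sum the weights of known words; a non-negative total counts as positive."""
    score = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return "pos" if score >= 0 else "neg"

# Toy labeled reviews (hypothetical data).
training_reviews = [
    ("clean room friendly staff", "pos"),
    ("great location clean lobby", "pos"),
    ("dirty room rude staff", "neg"),
    ("noisy room dirty bathroom", "neg"),
]
lexicon = build_weighted_lexicon(training_reviews)
```

With this toy data, `classify("clean friendly lobby", lexicon)` returns `"pos"` and `classify("dirty noisy bathroom", lexicon)` returns `"neg"`. Words that appear equally often on both sides, such as "staff", receive a weight of zero.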

Multimodal Bag-of-Words for Cross Domains Sentiment Analysis

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8462660 ◽

2018 ◽

Cited By ~ 10

Author(s):

Nicholas Cummins ◽

Shahin Amiriparian ◽

Sandra Ottl ◽

Maurice Gerczuk ◽

Maximilian Schmitt ◽

...

Keyword(s):

Sentiment Analysis ◽

Bag Of Words

Enhanced bag-of-words model for phrase-level sentiment analysis

2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer) ◽

10.1109/icter.2014.7083903 ◽

2014 ◽

Cited By ~ 1

Author(s):

Buddhika H. Kasthuriarachchy ◽

Kasun De Zoysa ◽

H.L. Premaratne

Keyword(s):

Sentiment Analysis ◽

Bag Of Words

An Unsupervised Sentiment Information Identification Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.3330 ◽

2012 ◽

Vol 263-266 ◽

pp. 3330-3334

Author(s):

Pan Pan Xu ◽

Hui Lan Jin ◽

Han Xiao Shi ◽

Wei Chen

Keyword(s):

Sentiment Analysis ◽

Word Pair ◽

Semantic Information ◽

Contextual Information ◽

Experimental Results ◽

Bag Of Words ◽

Associative Information ◽

Identification Approach ◽

Sentiment Word

Existing research focuses on document-based sentiment analysis, in which documents are represented by the bag-of-words model. However, because contextual information is lost, this representation fails to capture the associative information between an opinion and its corresponding target. Several researchers instead focus on sentence-based approaches, which can effectively extract an aspect-sentiment word pair within one sentence. Nevertheless, these approaches can only deal with one aspect per sentence and do not identify sentiment modifiers. To solve these problems, this paper proposes a novel approach for identifying aspect-modifier-sentiment word triples using shallow semantic information. Experimental results show that our approach is feasible and effective.
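The loss of opinion-target association under a bag-of-words representation can be seen in a small sketch. The two reviews below are invented examples, and whitespace tokenization is an assumption; they express opposite opinions about each aspect yet produce identical word counts.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as unordered word counts (whitespace tokenization)."""
    return Counter(text.lower().split())

# Hypothetical reviews with opposite opinions about each aspect.
review_a = "the screen is great but the battery is terrible"
review_b = "the battery is great but the screen is terrible"

# Both reviews contain exactly the same words with the same counts,
# so the representation cannot tell which aspect each opinion targets.
same_bag = bag_of_words(review_a) == bag_of_words(review_b)
```

Here `same_bag` is `True`, which is the kind of ambiguity that aspect-level approaches aim to resolve.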

An Enhanced Sentiment Analysis Framework Based on Pre-Trained Word Embedding

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026820500315 ◽

2020 ◽

Vol 19 (04) ◽

pp. 2050031 ◽

Cited By ~ 1

Author(s):

Ensaf Hussein Mohamed ◽

Mohammed ElSaid Moussa ◽

Mohamed Hassan Haggag

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Ensemble Classifier ◽

Word Embedding ◽

Machine Learning Techniques ◽

Bag Of Words ◽

Pos Tagging ◽

Learning Techniques ◽

Proposed Model ◽

Machine Learning Approach

Sentiment analysis (SA) is a technique that lets people in fields such as business, economy, research, government, and politics learn about people’s opinions, which greatly affects the process of decision-making. SA techniques are classified into lexicon-based techniques, machine learning techniques, and hybrids of the two. Each approach has its limitations and drawbacks: the machine learning approach depends on manual feature extraction, while the lexicon-based approach relies on sentiment lexicons that are usually unscalable, unreliable, and manually annotated by human experts. Word-embedding techniques are now commonly used in SA classification. Word2Vec and GloVe are currently among the most accurate and usable word-embedding techniques and can transform words into meaningful semantic vectors. However, these techniques ignore the sentiment information of texts and require a huge corpus of texts for training and generating accurate vectors, which are used as inputs to deep learning models. In this paper, we propose an enhanced ensemble classifier framework based on our previously published lexicon-based method, bag-of-words, and pre-trained word embeddings. First, the sentence is preprocessed by removing stop words, POS tagging, stemming and lemmatization, and shortening exaggerated words. Second, the processed sentence is passed to three modules, namely our previous lexicon-based method (Sum Votes), a bag-of-words module, and a semantic module (Word2Vec and GloVe), which produce feature vectors. Finally, these feature vectors are fed into 11 different classifiers. The proposed framework is tested and evaluated on four datasets with five different lexicons. The experimental results show that our proposed model outperforms the previous lexicon-based and machine learning methods individually.
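Two of the preprocessing steps named in the abstract, stop-word removal and shortening of exaggerated words, can be sketched in a few lines. The stop-word list, the rule of collapsing three or more repeated characters down to two, and the function names are assumptions for illustration; the abstract does not specify how the authors implement these steps, and POS tagging, stemming, and lemmatization are omitted here.

```python
import re

# Toy stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"the", "is", "a", "an", "this", "was", "and"}

def shorten_exaggerated(word):
    """Collapse any character repeated three or more times down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)

def preprocess(sentence):
    """Lowercase, tokenize, shorten elongated words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    tokens = [shorten_exaggerated(token) for token in tokens]
    return [token for token in tokens if token not in STOP_WORDS]

tokens = preprocess("This movie was sooooo goooood and the cast is amaaazing")
# tokens == ["movie", "soo", "good", "cast", "amaazing"]
```

Collapsing to two characters rather than one keeps legitimate double letters such as the "oo" in "good" intact, at the cost of leaving forms like "soo" that a later normalization step would still need to handle.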

Review-Based Sentiment Prediction of Rating Using Natural Language Processing Sentence-Level Sentiment Analysis with Bag-of-Words Approach

First International Conference on Sustainable Technologies for Computational Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-0029-9_63 ◽

2019 ◽

pp. 807-821

Author(s):

K. Venkata Raju ◽

M. Sridhar

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Bag Of Words ◽

Sentence Level

An Enhanced Technique for Analyzing Sentiments of Public Reviews - I

International Journal of Inventive Engineering and Sciences - Regular Issue ◽

10.35940/ijies.d0926.095619 ◽

2019 ◽

Vol 5 (6) ◽

pp. 1-6

Keyword(s):

Sentiment Analysis ◽

Semantic Information ◽

Traditional Approach ◽

Classification Performance ◽

Bag Of Words ◽

Text Representation ◽

Shift Problem

Sentiment analysis is the process of extracting the opinion expressed in a piece of text to determine the writer’s attitude towards a topic, product, or service and to classify it as positive, negative, or neutral. Bag of Words is the traditional approach to text representation in sentiment analysis, in which a text is represented as a bag of its words. This approach breaks the sentence into words and disregards other semantic information. One problem caused by this representation is the polarity shift problem. To address it, a dual sentiment analysis (DSA) system is created, which looks at reviews from both sides, i.e., positive and negative. Existing work on dual sentiment analysis includes techniques in which dual training and dual prediction are performed. The proposed system aims to enhance the classification performance of the existing system by applying classifiers other than those used in the existing system. After the reviews are classified into the appropriate classes, various graphs are plotted based on different parameters to validate the results and determine the best of the applied classifiers.
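Looking at a review "from both sides" can be illustrated by building a sentiment-reversed copy of each review. The sketch below is a simplified assumption, not the system described in the abstract: the negator and antonym tables are toy data, and the negation scope is taken to be only the single word following a negator.

```python
# Toy resources (hypothetical; a real system would use a full antonym dictionary).
NEGATORS = {"not", "never", "no"}
ANTONYMS = {
    "good": "bad", "bad": "good",
    "like": "dislike", "dislike": "like",
    "boring": "interesting", "interesting": "boring",
}

def reverse_review(text):
    """Build a sentiment-reversed counterpart of a review.

    A negator is dropped and the word right after it is kept unchanged;
    any other sentiment word is replaced by its antonym.
    Simplification: the negation scope is only the next word.
    """
    reversed_tokens = []
    keep_next = False
    for token in text.lower().split():
        if token in NEGATORS:
            keep_next = True  # drop the negator itself
            continue
        if keep_next:
            reversed_tokens.append(token)  # inside negation scope: keep as-is
            keep_next = False
        else:
            reversed_tokens.append(ANTONYMS.get(token, token))
    return " ".join(reversed_tokens)

reversed_negated = reverse_review("the plot is not good")
reversed_plain = reverse_review("the plot is boring")
```

With these toy tables, `reversed_negated` is `"the plot is good"` and `reversed_plain` is `"the plot is interesting"`. Training and predicting on both the original and the reversed review gives a classifier a second view of text whose polarity a plain bag of words would misread, for example by counting "good" in "not good" as positive evidence.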
