Enhancement Bag-of-Words Model for Solving the Challenges of Sentiment Analysis

Author(s):  
Doaa Mohey
Lenguaje ◽  
2019 ◽  
Vol 47 (2) ◽  
pp. 235-267
Author(s):  
Antonio Tamayo ◽  
Julián Arias Londoño ◽  
Diego Burgos ◽  
Gabriel Quiroz

The automatic prediction of the course of action of agents involved in social or economic trends is an imperative challenge nowadays. However, it is a difficult task because stance or opinion is often spread throughout long, complex texts, such as news articles. The current study tests sentence predicates as features to automatically determine the writer’s stance in news articles. We capture the semantics and stance of the text by encoding features such as the attribute of copulative sentences, the predicate of transitive sentences, adjectival phrases, and the section of the article. Under the assumption that these features are informative enough to model the semantics of the text, each word sequence is disambiguated and assigned a sentiment value using weighting rules. Different experiments were run using either SentiWordNet or ML-Senticon to determine words’ sentiment. Feature vectors are automatically built to populate a database that is tested using two machine learning algorithms. An efficiency of 69% was achieved using an SVM with a Gaussian kernel along with a feature selection strategy. This score outperformed the bag-of-words baseline by 12%. These results are promising considering that the sentiment analysis is performed on very complex texts written in Spanish.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sayeh Bagherzadeh ◽  
Sajjad Shokouhyar ◽  
Hamed Jahani ◽  
Marianna Sigala

Purpose: Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.
Design/methodology/approach: Big data of hotel customer reviews posted on the TripAdvisor platform were collected and appropriately prepared for conducting a binary sentiment analysis by developing a novel bag-of-words weighted approach. The latter provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. This approach resulted in two lexicons (a weighted lexicon, L1 and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) were used for comparing the accuracy metrics of the study’s approach for creating the two lexicons.
Findings: The results of the accuracy metrics confirmed that the study’s methodology significantly outperforms the dictionary-based method in comparison to the machine-learning algorithm method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.
Practical implications: The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable method and a useful tool for analyzing guests’ feedback and enabling managers to understand, anticipate and re-actively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for conducting comparisons aiming to detect and understand guests’ sentiment changes across time, as well as across users based on their profiles and experiences.
Originality/value: This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study using a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.
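
As a rough illustration of what a weighted bag-of-words lexicon for binary sentiment could look like, the sketch below derives a weight for each word from labeled reviews and classifies new text by summing those weights. The abstract does not give the study's weighting formula, so the formula used here, the `min_count` threshold, the function names, and the toy reviews are all assumptions made for illustration.

```python
from collections import Counter


def build_weighted_lexicon(reviews, min_count=2):
    """Build a word -> weight map from (text, label) pairs.

    label is 1 for a positive review and 0 for a negative one.
    Assumed weighting (not taken from the study):
        (pos - neg) / (pos + neg), which lies in [-1, 1].
    Words seen fewer than min_count times are left out.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in reviews:
        target = pos_counts if label == 1 else neg_counts
        target.update(text.lower().split())

    lexicon = {}
    for word in set(pos_counts) | set(neg_counts):
        pos, neg = pos_counts[word], neg_counts[word]
        if pos + neg >= min_count:
            lexicon[word] = (pos - neg) / (pos + neg)
    return lexicon


def classify(text, lexicon):
    """Return 1 (positive) if the summed word weights exceed 0, else 0."""
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    return 1 if score > 0 else 0


# Hypothetical toy reviews, invented for this example.
reviews = [
    ("clean room friendly staff", 1),
    ("friendly staff great location", 1),
    ("dirty room rude staff", 0),
    ("dirty bathroom noisy room", 0),
]
lexicon = build_weighted_lexicon(reviews)
print(sorted(lexicon.items()))
print(classify("friendly staff", lexicon), classify("dirty room", lexicon))
```

A manually selected lexicon such as the L2 described above could be represented in the same dictionary form, with the entries chosen by hand rather than computed.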


Author(s):  
Nicholas Cummins ◽  
Shahin Amiriparian ◽  
Sandra Ottl ◽  
Maurice Gerczuk ◽  
Maximilian Schmitt ◽  
...  

2012 ◽  
Vol 263-266 ◽  
pp. 3330-3334
Author(s):  
Pan Pan Xu ◽  
Hui Lan Jin ◽  
Han Xiao Shi ◽  
Wei Chen

Existing research focuses on document-based sentiment analysis, in which documents are represented by the bag-of-words model. However, due to the loss of contextual information, this representation fails to capture the associative information between an opinion and its corresponding target. Additionally, several researchers focus on sentence-based approaches, which can effectively extract an aspect-sentiment word pair within one sentence. Nevertheless, these approaches can only deal with one aspect per sentence and do not identify sentiment modifiers. To solve these problems, this paper proposes a novel approach for identifying aspect-modifier-sentiment word triples using shallow semantic information. Experimental results show that our approach is feasible and effective.


Author(s):  
Ensaf Hussein Mohamed ◽  
Mohammed ElSaid Moussa ◽  
Mohamed Hassan Haggag

Sentiment analysis (SA) is a technique that lets people in different fields such as business, economy, research, government, and politics learn about people’s opinions, which greatly affects the process of decision-making. SA techniques are classified into lexicon-based techniques, machine learning techniques, and hybrids of the two. Each approach has its limitations and drawbacks: the machine learning approach depends on manual feature extraction, while the lexicon-based approach relies on sentiment lexicons that are usually unscalable, unreliable, and manually annotated by human experts. Word-embedding techniques are now commonly used in SA classification. Word2Vec and GloVe are among the most accurate and usable word-embedding techniques; they transform words into meaningful semantic vectors. However, these techniques ignore the sentiment information of texts and require a huge corpus of texts for training and generating accurate vectors, which are used as inputs to deep learning models. In this paper, we propose an enhanced ensemble classifier framework. Our framework is based on our previously published lexicon-based method, bag-of-words, and pre-trained word embeddings. First, the sentence is preprocessed by removing stop words, POS tagging, stemming and lemmatization, and shortening exaggerated words. Second, the processed sentence is passed to three modules, namely our previous lexicon-based method (Sum Votes), a bag-of-words module, and a semantic module (Word2Vec and GloVe), which produce feature vectors. Finally, these feature vectors are fed into 11 different classifiers. The proposed framework is tested and evaluated on four datasets with five different lexicons. The experimental results show that our proposed model outperforms the previous lexicon-based method and the machine learning methods individually.
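
The three-module feature stage described in this abstract can be pictured with a minimal sketch that builds one combined vector per sentence. Everything concrete here is an assumption for illustration: the toy lexicon, the fixed vocabulary, the two-dimensional "embeddings", and the simple vote sum standing in for the authors' Sum Votes method, whose details the abstract does not give. Preprocessing and the downstream classifiers are omitted.

```python
from collections import Counter

# All three tables are hypothetical toy data, not taken from the paper.
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
VOCAB = ["good", "bad", "movie", "plot"]
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "bad": [-0.8, 0.2],
    "movie": [0.0, 0.7],
    "plot": [0.1, 0.6],
}


def lexicon_feature(tokens):
    """One feature: the sum of +1/-1 lexicon votes (a stand-in score)."""
    return [float(sum(LEXICON.get(token, 0) for token in tokens))]


def bow_features(tokens):
    """Word counts over a fixed vocabulary, ignoring word order."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in VOCAB]


def embedding_features(tokens, dim=2):
    """Average the vectors of known tokens; all zeros if none are known."""
    known = [EMBEDDINGS[token] for token in tokens if token in EMBEDDINGS]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combined_features(sentence):
    """Concatenate the outputs of the three modules into one vector."""
    tokens = sentence.lower().split()
    return (
        lexicon_feature(tokens)
        + bow_features(tokens)
        + embedding_features(tokens)
    )


features = combined_features("good movie bad plot")
print(len(features), features)
```

Concatenation is only one way to combine the modules; the abstract does not say whether the feature vectors are joined or passed to the classifiers separately.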


Sentiment analysis is the process of extracting the opinion expressed in a piece of text to determine the writer’s attitude towards a topic, product, or service and classifying it as positive, negative, or neutral. Bag-of-words is the traditional approach to text representation in sentiment analysis: the text is represented as a bag of its words. This approach breaks each sentence into words and disregards other semantic information. One problem caused by this representation is the polarity shift problem. To address it, a dual sentiment analysis (DSA) system is created, which looks at reviews from both sides, i.e., positive and negative. Existing work on dual sentiment analysis includes techniques in which dual training and dual prediction are performed. The proposed system aims to enhance the classification performance of the existing system by applying classifiers other than those used in the existing system. After the reviews are classified into the appropriate classes, various graphs are plotted based on different parameters to validate the results and determine the best of the applied classifiers.
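
To make the polarity shift problem concrete, the sketch below shows that a bag-of-words count for "like" is the same whether or not the word is negated, and then builds a sentiment-reversed copy of a review, which is one plausible way to look at a review "from both sides". The abstract does not describe how reversed reviews are produced, so the reversal rule, the antonym table, and the negation list are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy tables; a real system would need full lexical resources.
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike", "dislike": "like"}
NEGATIONS = {"not", "never", "no"}


def bag_of_words(text):
    """Count lowercase tokens, discarding word order and context."""
    return Counter(text.lower().split())


def reverse_review(text):
    """Build a sentiment-reversed copy of a review.

    Assumed rule: a negation word is dropped and the token right after it
    is kept unchanged; any other token is swapped for its antonym when the
    table has one.
    """
    reversed_tokens = []
    negated = False
    for token in text.lower().split():
        if token in NEGATIONS:
            negated = True  # drop the negation, protect the next token
            continue
        if negated:
            reversed_tokens.append(token)
            negated = False
        else:
            reversed_tokens.append(ANTONYMS.get(token, token))
    return " ".join(reversed_tokens)


original = "i do not like this phone"
# The count for "like" is 1 with or without the negation.
print(bag_of_words(original)["like"], bag_of_words("i like this phone")["like"])
print(reverse_review(original))
```

In a dual training setup, each original review and its reversed copy would both be used for training, with the reversed copy given the opposite label.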

