Feature Weighting Based on Inter-Category and Intra-Category Strength for Twitter Sentiment Analysis

2018 ◽  
Vol 9 (1) ◽  
pp. 92 ◽  
Author(s):  
Yili Wang ◽  
Hee Yong Youn

The rapid growth in social networking services has led to the generation of a massive volume of opinionated information in the form of electronic text. As a result, research on text sentiment analysis has drawn a great deal of interest. In this paper, a novel feature weighting approach is proposed for the sentiment analysis of Twitter data. It measures the relative significance of each feature with regard to both inter-category and intra-category distribution. A new statistical model called Category Discriminative Strength is introduced to characterize the discriminability of the features among various categories, and a modified Chi-square (χ²)-based measure is employed to measure the intra-category dependency of the features. Moreover, a fine-grained feature clustering strategy is proposed to maximize the accuracy of the analysis. Extensive experiments demonstrate that the proposed approach significantly outperforms four state-of-the-art sentiment analysis techniques in terms of accuracy, precision, recall, and F1 measure with various sizes and patterns of training and test datasets.
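As a rough illustration of the kind of statistic this abstract builds on, the sketch below computes the standard chi-square score for one term and one category from a 2x2 contingency table. It is not the paper's modified measure or its Category Discriminative Strength model, whose definitions are not given in the abstract; the function name and argument names are assumptions made for this example.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score for a term/category pair (illustrative only).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# A term that appears only in the category scores high;
# a term spread evenly across categories scores zero.
perfect = chi_square(10, 0, 0, 10)
uninformative = chi_square(5, 5, 5, 5)
```

A higher score suggests a stronger dependency between the term and the category, which is why scores of this kind are commonly used to rank or weight features.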

2020 ◽  
Vol 34 (05) ◽  
pp. 8600-8607
Author(s):  
Haiyun Peng ◽  
Lu Xu ◽  
Lidong Bing ◽  
Fei Huang ◽  
Wei Lu ◽  
...  

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e. the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e. opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
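The two-stage idea can be sketched with a small data structure. In the sketch below, stage-one outputs are hard-coded, and the pairing stage is replaced by a nearest-span heuristic; the abstract does not describe how the framework actually pairs aspects with opinions, so `Aspect`, `Opinion`, `pair_triplets`, the token positions, and the polarity assigned to "pasta" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aspect:
    text: str       # WHAT: the aspect term
    polarity: str   # HOW: its sentiment polarity
    position: int   # token index in the sentence


@dataclass(frozen=True)
class Opinion:
    text: str       # WHY: the opinion term
    position: int   # token index in the sentence


def pair_triplets(aspects, opinions):
    """Stand-in for stage two: pair each aspect with the closest opinion.

    Ties are broken in favour of an opinion that follows the aspect.
    This heuristic is hypothetical and is not the paper's pairing model.
    """
    triplets = []
    for aspect in aspects:
        if not opinions:
            continue
        best = min(
            opinions,
            key=lambda o: (abs(o.position - aspect.position),
                           o.position < aspect.position),
        )
        triplets.append((aspect.text, aspect.polarity, best.text))
    return triplets


# "Waiters are very friendly and the pasta is simply average"
aspects = [
    Aspect("Waiters", "positive", 0),
    Aspect("pasta", "neutral", 6),  # polarity assumed for illustration
]
opinions = [Opinion("friendly", 3), Opinion("average", 9)]
triplets = pair_triplets(aspects, opinions)
```

The first resulting triplet matches the example given in the abstract, ('Waiters', positive, 'friendly').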


Author(s):  
Vishnu VardanReddy ◽  
Mahesh Maila ◽  
Sai Sri Raghava ◽  
Yashwanth Avvaru ◽  
Sri. V. Koteswarao

In recent years, there has been rapid growth in online communication. There are many social networking sites and related mobile applications, and more are still emerging. A huge amount of data is generated by these sites every day, and this data can be used as a source for various kinds of analysis. Twitter is one of the most popular networking sites, with millions of users. Its users hold different views and generate a wide variety of reviews in the form of tweets. Opinion mining has become an emerging research topic because of the large amount of opinionated data available on blogs and social networking sites. Tracking and summarizing different types of opinions can provide valuable insight to users who rely on social networking sites for reviews of a product, service, or topic. Analyzing opinions and classifying them by polarity (positive, negative, neutral) is a challenging task. A lot of work has been done on sentiment analysis of Twitter data, and a lot remains to be done. In this paper, we discuss the levels and approaches of sentiment analysis, sentiment analysis of Twitter data, existing tools available for sentiment analysis, and the steps involved. Two approaches, one based on machine learning and one lexicon-based, are discussed with an example.


2017 ◽  
Vol 5 ◽  
pp. 179-189 ◽  
Author(s):  
Ryo Fujii ◽  
Ryo Domoto ◽  
Daichi Mochihashi

This paper presents a novel hybrid generative/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation, which relies only on labeled data, our semi-supervised model also leverages a huge amount of unlabeled text to automatically learn new “words”, and further constrains them by using labeled data to segment non-standard texts such as those found in social networking services. Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al. (2001)) and unsupervised word segmentation (NPYLM; Mochihashi et al. (2009)), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki (2008)). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai.


Author(s):  
Sanjiban Sekhar Roy ◽  
Marenglen Biba ◽  
Rohan Kumar ◽  
Rahul Kumar ◽  
Pijush Samui

Online social networking platforms, such as weblogs, microblogs, and social networks, are used intensively every day to express individuals' thoughts. This permits scientists to collect huge amounts of data and extract significant knowledge regarding the sentiments of a large number of people at a scale that was essentially impractical a couple of years back. Sentiment analysis therefore now has the potential to learn sentiments towards persons, objects, and occasions. Twitter has increasingly become a significant social networking platform where people post messages of up to 140 characters known as ‘Tweets’. Tweets have become the preferred medium for the marketing sector, as users can signal customer success or a public relations disaster far more quickly than a web page or traditional media can. In this paper, we have analyzed Twitter data and predicted positive and negative tweets with a high accuracy rate using a support vector machine (SVM).


2020 ◽  
pp. 1-29
Author(s):  
Cem Rıfkı Aydın ◽  
Tunga Güngör

Although many studies on sentiment analysis have been carried out for widely spoken languages, this topic is still immature for Turkish. Most of the works in this language focus on supervised models, which necessitate comprehensive annotated corpora. There are a few unsupervised methods, and they utilize sentiment lexicons either built by translating from English lexicons or created based on corpora. This results in improper word polarities, as the language and domain characteristics are ignored. In this paper, we develop unsupervised (domain-independent) and semi-supervised (domain-specific) methods for Turkish, which are based on a set of antonym word pairs as seeds. We make a comprehensive analysis of supervised methods under several feature weighting schemes. We then form an ensemble of supervised classifiers and also combine the unsupervised and supervised methods. Since Turkish is an agglutinative language, we perform morphological analysis and use different word forms. The methods developed were tested on two Turkish datasets with different styles and also on English datasets to show the portability of the approaches across languages. We observed that the combination of the unsupervised and supervised approaches outperforms the other methods, and we obtained a significant improvement over the state-of-the-art results for both Turkish and English.


Author(s):  
Xiangying Ran ◽  
Yuanyuan Pan ◽  
Wei Sun ◽  
Chongjun Wang

Aspect-based sentiment analysis (ABSA) is a fine-grained task. A Recurrent Neural Network (RNN) model armed with an attention mechanism seems a natural fit for this task, and such models have recently achieved state-of-the-art performance. However, previous attention mechanisms proposed for ABSA may attend to irrelevant words and thus degrade performance, especially when dealing with long and complex sentences with multiple aspects. In this paper, we propose a novel architecture named Hierarchical Gate Memory Network (HGMN) for ABSA. First, we employ the proposed hierarchical gate mechanism to learn to select the part of the sentence related to the given aspect while keeping the original sequence structure of the sentence. After that, we apply a Convolutional Neural Network (CNN) to the final aspect-specific memory. We conduct extensive experiments on the SemEval 2014 and Twitter datasets, and the results demonstrate that our model outperforms attention-based state-of-the-art baselines.


2019 ◽  
Vol 6 (1) ◽  
pp. 138-149
Author(s):  
Ukhti Ikhsani Larasati ◽  
Much Aziz Muslim ◽  
Riza Arifudin ◽  
Alamsyah Alamsyah

Data processing can be done with text mining techniques. Processing large volumes of text requires a machine to explore opinions, including positive and negative ones. Sentiment analysis is a process that applies text mining methods to determine whether the content of a text dataset is positive or negative. The support vector machine is one of the classification algorithms that can be used for sentiment analysis. However, the support vector machine works less well on large datasets. In addition, the text mining process is constrained by the number of attributes used: a large number of attributes reduces the performance of the classifier and yields a low level of accuracy. The purpose of this research is to increase the accuracy of the support vector machine by implementing feature selection and feature weighting. Feature selection reduces the large number of irrelevant attributes. In this study, the top K = 500 features are selected. Once the relevant attributes are selected, feature weighting is performed to calculate the weight of each selected attribute. The feature selection method used is the chi-square statistic, and feature weighting uses Term Frequency Inverse Document Frequency (TFIDF). In experiments using Matlab R2017b with 10-fold cross-validation, integrating the support vector machine with the chi-square statistic and TFIDF increased accuracy by 11.5%: the support vector machine achieved an accuracy of 68.7% without the chi-square statistic and TFIDF and 80.2% with them.
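The select-then-weight pipeline described in this abstract can be sketched in plain Python. The study itself used Matlab R2017b, so the sketch below is only an assumed reconstruction of the general idea: `top_k_features` keeps the K highest-scoring terms (the scores would come from a chi-square computation, supplied here as made-up numbers), and `tfidf` weights the surviving terms using raw term counts and a natural-log inverse document frequency, one common TFIDF variant among several.

```python
import math
from collections import Counter


def top_k_features(scores, k):
    """Return the k terms with the highest selection scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def tfidf(docs, vocabulary):
    """Weight each selected term as raw count * ln(N / document frequency)."""
    vocab = set(vocabulary)
    n_docs = len(docs)
    # Document frequency: in how many documents each selected term occurs.
    df = Counter(term for doc in docs for term in set(doc) if term in vocab)
    weighted = []
    for doc in docs:
        counts = Counter(term for term in doc if term in vocab)
        weighted.append(
            {term: count * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weighted


# Toy corpus and hypothetical selection scores (not from the study).
docs = [["good", "movie", "good"], ["bad", "movie"]]
scores = {"good": 2.0, "bad": 2.0, "movie": 0.0}

selected = top_k_features(scores, k=2)
weights = tfidf(docs, selected)
```

In the study's setting, K would be 500, and the weighted vectors would then be passed to a support vector machine evaluated with 10-fold cross-validation.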


2019 ◽  
Vol 66 ◽  
Author(s):  
Jeremy Barnes ◽  
Roman Klinger

Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes into account target-level information. This model incorporates sentiment information into bilingual distributional representations, by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at sentence-level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of an annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more efforts should be spent on the creation of resources for less similar languages to those which are resource-rich already. Finally, a domain mismatch leads to a decreased performance. This suggests resources in any language should ideally cover varieties of domains.


Author(s):  
Siyu Zhu ◽  
Jin Qi ◽  
Jie Hu ◽  
Haiqing Huang

With the increasing demand for personalized products and rapid market response, many companies expect to explore online user-generated content (UGC) for intelligent customer hearing and product redesign strategy. UGC has the advantages of being less biased than traditional interviews, yielding timely responses, and being widely accessible in sheer volume. From online resources, customers’ preferences toward various aspects of the product can be exploited by promising sentiment analysis methods. However, due to the complexity of language, state-of-the-art sentiment analysis methods are still not accurate enough for practical use in product redesign. To tackle this problem, we propose an integrated customer hearing and product redesign system, which combines the robust use of sentiment analysis for customer hearing with coordinated redesign mechanisms. Ontology and expert knowledge are used to improve accuracy. Specifically, a fuzzy product ontology that contains domain knowledge is first learned in a semi-supervised way. Then, UGC is exploited with a novel ontology-based fine-grained sentiment analysis approach. Extracted customer preference statistics are transformed into multiple levels for the automatic establishment of opportunity landscapes and a house of quality table. In addition, customer preference statistics are interactively visualized, and representative customer feedback is generated concurrently. Through a case study of a smartphone, the effectiveness of the proposed system is validated, and applicable redesign strategies for a case product are provided. With this system, information including customer preferences, user experiences, usage habits, and conditions can be exploited together for reliable elicitation of product redesign strategy.


Author(s):  
Rajesh Bose ◽  
P. S. Aithal ◽  
Sandip Roy

Twitter is now a prominent social networking platform where people around the globe can express their viewpoints. Every day, almost 500 million tweets are posted on Twitter, and this volume amounts to 8 TB of data. The data derived from Twitter is highly significant when analyzed, because sentiment analysis can extract meaningful information from it. From Twitter data, we learn about information and remarks on the growth of various products, new fashions, and so on. The prime objective of this sentiment analysis is to explore the emotions, viewpoints, subjectivity, and motives in ordinary messages or tweets about the application of drugs for the therapy of COVID-19. Clustering is a method by which homogeneous items are detected and combined to create a class or cluster. Our research showed that clustering can identify weak and strong positive or negative tweets when they are clustered with the outcomes of distinct dictionaries, and that it provides robust support for our prediction. The research analyzes the polarity calculation, applying VADER sentiment analysis to the application of drugs for the therapy of COVID-19.

