LDA-Based Topic Modeling Sentiment Analysis Using Topic/Document/Sentence (TDS) Model

2021 ◽  
Vol 11 (23) ◽  
pp. 11091
Author(s):  
Akhmedov Farkhod ◽  
Akmalbek Abdusalomov ◽  
Fazliddin Makhmudov ◽  
Young Im Cho

Customer reviews on the Internet reflect users’ sentiments about the product, service, and social events. As sentiments can be divided into positive, negative, and neutral forms, sentiment analysis processes identify the polarity of information in the source materials toward an entity. Most studies have focused on document-level sentiment classification. In this study, we apply an unsupervised machine learning approach to discover sentiment polarity not only at the document level but also at the word level. The proposed topic document sentence (TDS) model is based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques. The IMDB dataset, comprising user reviews, was used for data analysis. First, we applied the LDA model to discover topics from the reviews; then, the TDS model was implemented to identify the polarity of the sentiment from topic to document, and from document to word levels. The LDAvis tool was used for data visualization. The experimental results show that the analysis not only obtained good topic partitioning results, but also achieved high sentiment analysis accuracy in document- and word-level sentiment classifications.
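The topic-discovery step mentioned in this abstract can be illustrated with a minimal collapsed Gibbs sampler for plain LDA. This is only a sketch under stated assumptions: it is not the TDS or JST model, the function name `lda_gibbs`, the hyperparameter defaults, and the toy reviews are all hypothetical, and a real study would more likely use an established library.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA on tokenized documents.

    Returns (theta, phi): per-document topic proportions and
    per-topic word distributions. Hypothetical helper, not the TDS model.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens assigned to each topic
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi

# Toy, made-up reviews (not taken from the IMDB dataset).
toy_docs = [
    ["film", "plot", "actor", "plot"],
    ["actor", "film", "scene"],
    ["price", "battery", "screen"],
    ["battery", "price", "screen", "price"],
]
theta, phi = lda_gibbs(toy_docs, num_topics=2, iters=50)
```

Each row of `theta` is one document's topic mixture, and each entry of `phi` is one topic's distribution over the vocabulary; a sentiment layer such as the one the abstract describes would sit on top of these quantities.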

2021 ◽  
Author(s):  
Shimon Ohtani

Abstract The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.
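The "aggregation and comparison of n-grams" over time might look roughly like the sketch below. It assumes tweets are available as (year, text) pairs and uses naive whitespace tokenization; the function names and the example tweets are hypothetical and not taken from the study.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts_by_year(tweets, n=2):
    """Aggregate n-gram counts per year from (year, text) pairs."""
    counts = {}
    for year, text in tweets:
        tokens = text.lower().split()  # naive tokenization, for illustration only
        counts.setdefault(year, Counter()).update(ngrams(tokens, n))
    return counts

# Made-up example tweets.
toy_tweets = [
    (2010, "protect biodiversity now"),
    (2010, "biodiversity loss is real"),
    (2020, "biodiversity loss is accelerating"),
]
by_year = ngram_counts_by_year(toy_tweets, n=2)

# Compare how often one bigram occurs in each year.
for year in sorted(by_year):
    print(year, by_year[year][("biodiversity", "loss")])
```

Comparing the per-year counters is one simple way to trace how the words used alongside a keyword shift over a decade.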


2021 ◽  
Author(s):  
Adebayo Abayomi-Alli ◽  
Olusola Abayomi-Alli ◽  
Sanjay Misra ◽  
Luis Fernandez-Sanz

Abstract Background Social media opinion has become a medium for quickly accessing large, valuable, and detailed information on any subject matter within a short period. Twitter, a social microblogging site, generates over 330 million tweets monthly across different countries. Analyzing trending topics on Twitter presents opportunities to extract meaningful insight into different opinions on various issues. Aim This study aims to gain insights into the trending yahoo-yahoo topic on Twitter using content analysis of selected historical tweets. Methodology The widgets and workflow engine in the Orange Data mining toolbox were employed for all the text mining tasks. A total of 5500 tweets were collected from Twitter using the 'yahoo yahoo' hashtag. The corpus was pre-processed using a pre-trained tweet tokenizer; Valence Aware Dictionary for Sentiment Reasoning (VADER) was used for sentiment and opinion mining; Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) were used for topic modeling; and multidimensional scaling (MDS) was used to visualize the modeled topics. Results Results showed that "yahoo" appeared in the corpus 9555 times, and 175 unique tweets were returned after duplicate removal. Contrary to expectation, Spain had the highest number of participants tweeting on the 'yahoo yahoo' topic within the period. VADER sentiment analysis returned 35.85%, 24.53%, 15.09%, and 24.53% negative, neutral, no-zone, and positive sentiment tweets, respectively. The word yahoo was highly representative of the LDA topics 1, 3, 4, 6, and LSI topic 1. Conclusion It can be concluded that emojis represent the sentiments in tweets more readily than the textual content. Also, despite popular belief, a significant number of youths regard cybercrime as a detriment to society.


2020 ◽  
pp. 1-10
Author(s):  
Junegak Joung ◽  
Harrison M. Kim

Abstract Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This paper proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter.


2020 ◽  
Vol 12 (16) ◽  
pp. 6673 ◽  
Author(s):  
Kiattipoom Kiatkawsin ◽  
Ian Sutherland ◽  
Jin-Young Kim

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, the present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings from 185,695 Airbnb reviews for Hong Kong and 93,571 for Singapore, two long-term rival destinations, were compared. Hong Kong produced 12 total topics that can be categorized into four distinct groups, whereas Singapore’s optimal number of topics was only five. Topics produced from both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision to formulate managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.


2019 ◽  
Vol 26 (12) ◽  
pp. 1466-1477 ◽  
Author(s):  
Alison E Fohner ◽  
John D Greene ◽  
Brian L Lawson ◽  
Jonathan H Chen ◽  
Patricia Kipnis ◽  
...  

Abstract Objective To use unsupervised topic modeling to evaluate heterogeneity in sepsis treatment patterns contained within granular data of electronic health records. Materials and Methods A multicenter, retrospective cohort study of 29 253 hospitalized adult sepsis patients between 2010 and 2013 in Northern California. We applied an unsupervised machine learning method, Latent Dirichlet Allocation, to the orders, medications, and procedures recorded in the electronic health record within the first 24 hours of each patient’s hospitalization to uncover empiric treatment topics across the cohort and to develop computable clinical signatures for each patient based on proportions of these topics. We evaluated how these topics correlated with common sepsis treatment and outcome metrics including inpatient mortality, time to first antibiotic, and fluids given within 24 hours. Results Mean age was 70 ± 17 years with hospital mortality of 9.6%. We empirically identified 42 clinically recognizable treatment topics (eg, pneumonia, cellulitis, wound care, shock). Only 43.1% of hospitalizations had a single dominant topic, and a small minority (7.3%) had a single topic comprising at least 80% of their overall clinical signature. Across the entire sepsis cohort, clinical signatures were highly variable. Discussion Heterogeneity in sepsis is a major barrier to improving targeted treatments, yet existing approaches to characterizing clinical heterogeneity are narrowly defined. A machine learning approach captured substantial patient- and population-level heterogeneity in treatment during early sepsis hospitalization. Conclusion Using topic modeling based on treatment patterns may enable more precise clinical characterization in sepsis and better understanding of variability in sepsis presentation and outcomes.
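The idea of a per-patient "clinical signature" built from topic proportions, and the check for a single topic comprising at least 80% of it, can be sketched as follows. The abstract does not say how the signatures were computed, so normalizing raw topic weights to sum to one is an assumption here, and the function names and example weights are hypothetical.

```python
def clinical_signature(topic_weights):
    """Normalize one patient's non-negative topic weights into proportions."""
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in topic_weights]

def has_topic_at_least(signature, threshold=0.8):
    """True if any single topic makes up at least `threshold` of the signature."""
    return max(signature) >= threshold

def share_with_topic_at_least(signatures, threshold=0.8):
    """Fraction of patients whose signature has one topic at or above the threshold."""
    hits = sum(has_topic_at_least(s, threshold) for s in signatures)
    return hits / len(signatures)

# Made-up topic weights for three hypothetical hospitalizations.
raw = [[9, 1, 0], [4, 3, 3], [5, 5, 0]]
signatures = [clinical_signature(r) for r in raw]
print(share_with_topic_at_least(signatures))  # fraction of signatures with one topic >= 80%
```

Applied to a full cohort, a summary of this kind would correspond to the share of hospitalizations with one topic at or above the 80% mark that the abstract reports.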


2021 ◽  
Vol 5 (1) ◽  
pp. 24 ◽  
Author(s):  
Chairullah Naury ◽  
Dhomas Hatta Fudholi ◽  
Ahmad Fathan Hidayatullah

Online mass media are a source of the fastest and most up-to-date information. A model that can provide mapping will help in sorting out information more precisely. In this study, the authors applied topic modeling to the results of sentiment analysis on online news headlines in Indonesian. The data were obtained from Indonesian-language online mass media. The collected data were analyzed for sentiment using the Long Short-term Memory (LSTM) method, in order to obtain news headlines with positive, negative, and neutral sentiments. The classification obtained from the sentiment analysis was then passed to topic modeling using the Latent Dirichlet Allocation (LDA) method, and the results were visualized as a word cloud and an intertopic distance map (pyLDAVis) to determine the relationship between one topic and another. The sentiment analysis produced a model with an accuracy of 71.13%, and the topic modeling produced a number of topics that are easy to interpret.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Yi Zhao ◽  
Haixu Xi ◽  
Chengzhi Zhang

Abstract Coronavirus disease 2019 (COVID-19) pandemic-related information is flooding social media, and analyzing this information from an occupational perspective can help us to understand the social implications of this unprecedented disruption. In this study, using a COVID-19-related dataset collected with the Twitter IDs, we conduct topic and sentiment analysis from the perspective of occupation, by leveraging Latent Dirichlet Allocation (LDA) topic modeling and the Valence Aware Dictionary and sEntiment Reasoning (VADER) model, respectively. The experimental results indicate that there are significant topic preference differences between Twitter users with different occupations. However, occupation-linked affective differences are only partly demonstrated in our study; the income levels of Twitter users are unrelated to sentiment expression on COVID-19-related topics.


2018 ◽  
Vol 29 (1) ◽  
pp. 1166-1178
Author(s):  
V.S. Anoop ◽  
S. Asharaf

Abstract Because of exponential growth in the number of people who purchase products online, e-commerce organizations are vying with each other to offer innovative and improved services to their customers. Current platforms give their customers innovative services such as product recommendations based on their purchase histories and location, product comparison, and most importantly, a platform for expressing their experience and feedback. It is important for any e-commerce organization to analyze this feedback and to find out the sentiment of the customers in order to give them better products and services. As large reviews may contain mixed feedback, where a customer gives an opinion on different product features in the same review, finding out the exact sentiment is tedious. This work proposes aspect-specific sentiment analysis of product reviews using a sophisticated topic modeling algorithm called latent Dirichlet allocation (LDA). The topic words thus extracted are mapped to various aspects of an entity to perform the aspect-specific sentiment analysis on product reviews. Experiments with synthetic and real datasets show promising results compared to existing methods of sentiment analysis.


2019 ◽  
Vol 142 (1) ◽  
Author(s):  
Feng Zhou ◽  
Jackie Ayoub ◽  
Qianli Xu ◽  
X. Jessie Yang

Abstract Creating product ecosystems has been one of the strategic ways to enhance user experience and business advantages. Among many, customer needs analysis for product ecosystems is one of the most challenging tasks in creating a successful product ecosystem from both the perspectives of marketing research and product development. In this paper, we propose a machine-learning approach to customer needs analysis for product ecosystems by examining a large amount of online user-generated product reviews within a product ecosystem. First, we separated uninformative reviews from informative ones using a fastText technique. Then, we extracted a variety of topics with regard to customer needs using a topic modeling technique named latent Dirichlet allocation. In addition, we applied a rule-based sentiment analysis method to predict not only the sentiment of the reviews but also their sentiment intensity values. Finally, we categorized customer needs related to the different extracted topics using an analytic Kano model based on the dissatisfaction-satisfaction pair from the sentiment analysis. A case example of the Amazon product ecosystem was used to illustrate the potential and feasibility of the proposed method.

