LDA-Based Topic Modeling Sentiment Analysis Using Topic/Document/Sentence (TDS) Model

Akhmedov Farkhod; Akmalbek Abdusalomov; Fazliddin Makhmudov; Young Im Cho

doi:10.3390/app112311091

Applied Sciences ◽

10.3390/app112311091 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11091

Author(s):

Akhmedov Farkhod ◽

Akmalbek Abdusalomov ◽

Fazliddin Makhmudov ◽

Young Im Cho

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Customer Reviews ◽

Word Level ◽

Social Events ◽

Product Service ◽

Machine Learning Approach ◽

Source Materials ◽

Document Level

Customer reviews on the Internet reflect users’ sentiments about the product, service, and social events. As sentiments can be divided into positive, negative, and neutral forms, sentiment analysis processes identify the polarity of information in the source materials toward an entity. Most studies have focused on document-level sentiment classification. In this study, we apply an unsupervised machine learning approach to discover sentiment polarity not only at the document level but also at the word level. The proposed topic document sentence (TDS) model is based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques. The IMDB dataset, comprising user reviews, was used for data analysis. First, we applied the LDA model to discover topics from the reviews; then, the TDS model was implemented to identify the polarity of the sentiment from topic to document, and from document to word levels. The LDAvis tool was used for data visualization. The experimental results show that the analysis not only obtained good topic partitioning results, but also achieved high sentiment analysis accuracy in document- and word-level sentiment classifications.
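The first step described above, discovering topics with LDA, can be illustrated with a minimal sketch. It is a toy collapsed Gibbs sampler in pure Python, not the authors' implementation and not the TDS model itself. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four tiny review documents are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                      # total tokens assigned to topic k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Three most frequently assigned words per topic.
    top_words = [
        [vocab[i] for i in sorted(range(V), key=lambda i: -topic_word[t][i])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words

# Hypothetical mini-corpus standing in for tokenized reviews.
docs = [
    "great acting great plot".split(),
    "boring plot weak acting".split(),
    "camera lens battery great".split(),
    "battery weak lens boring".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Here `theta` holds one topic distribution per document and `top_words` lists the most strongly associated words per topic. A sentiment layer such as the one the abstract describes would then operate on these topic, document, and word assignments.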

How is People's Awareness of “Biodiversity” Measured? Using Sentiment Analysis and LDA Topic Modeling in the Twitter Discourse Space from 2010 to 2020

10.21203/rs.3.rs-922908/v1 ◽

2021 ◽

Author(s):

Shimon Ohtani

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Data Science ◽

Latent Dirichlet Allocation ◽

Biological Diversity ◽

Public Awareness ◽

Convention On Biological Diversity ◽

Emotion Lexicon ◽

Aichi Biodiversity Targets ◽

Do So

Abstract The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.

Topic Modeling Using Latent Dirichlet Allocation (LDA) and Sentiment Analysis for Marketing Planning Tiket.com

Proceedings of the 2nd International Seminar on Science and Technology (ISSTEC 2019) ◽

10.2991/assehr.k.201010.004 ◽

2020 ◽

Author(s):

Berlin Helmi Puspita ◽

Muhammad Muhajir ◽

Hafizhan Aliady

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Marketing Planning ◽

Dirichlet Allocation

Study of the Yahoo-yahoo Hash-tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms

10.21203/rs.3.rs-354801/v1 ◽

2021 ◽

Author(s):

Adebayo Abayomi-Alli ◽

Olusola Abayomi-Alli ◽

Sanjay Misra ◽

Luis Fernandez-Sanz

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Opinion Mining ◽

Latent Semantic Indexing ◽

Workflow Engine ◽

Semantic Indexing ◽

Short Period ◽

Mining Algorithms ◽

Insight Into

Abstract Background: Social media opinion has become a medium to quickly access large, valuable, and rich details of information on any subject matter within a short period. Twitter, being a social microblog site, generates over 330 million tweets monthly across different countries. Analyzing trending topics on Twitter presents opportunities to extract meaningful insight into different opinions on various issues. Aim: This study aims to gain insights into the trending yahoo-yahoo topic on Twitter using content analysis of selected historical tweets. Methodology: The widgets and workflow engine in the Orange Data mining toolbox were employed for all the text mining tasks. 5500 tweets were collected from Twitter using the 'yahoo yahoo' hashtag. The corpus was pre-processed using a pre-trained tweet tokenizer; Valence Aware Dictionary for Sentiment Reasoning (VADER) was used for the sentiment and opinion mining; Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) were used for topic modeling; and multidimensional scaling (MDS) was used to visualize the modeled topics. Results: Results showed that "yahoo" appeared in the corpus 9555 times, and 175 unique tweets were returned after duplicate removal. Contrary to expectation, Spain had the highest number of participants tweeting on the 'yahoo yahoo' topic within the period. The VADER sentiment analysis returned 35.85%, 24.53%, 15.09%, and 24.53% negative, neutral, no-zone, and positive sentiment tweets, respectively. The word yahoo was highly representative of the LDA topics 1, 3, 4, 6, and LSI topic 1. Conclusion: It can be concluded that emojis represent the sentiments in tweets better and faster than the textual content. Also, despite popular belief, a significant number of youths regard cybercrime as a detriment to society.
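To make the lexicon-based sentiment step above more concrete, the following is a minimal sketch of a rule-based polarity scorer in the spirit of VADER. It is not the VADER library and not the Orange workflow used in the study. The lexicon entries and their valences are hypothetical, the normalization constant and the ±0.05 cut-offs are assumptions modeled on commonly quoted VADER conventions, and the sketch produces only three classes rather than the four reported in the abstract.

```python
import math

# Hypothetical toy lexicon: word -> valence. Real lexicons hold thousands of rated entries.
TOY_LEXICON = {
    "good": 1.9, "great": 3.1, "love": 3.2,
    "bad": -2.5, "fraud": -2.8, "scam": -2.6,
}

def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences and squash the total into the open interval (-1, 1)."""
    total = sum(lexicon.get(tok.strip(".,!?").lower(), 0.0) for tok in text.split())
    return total / math.sqrt(total * total + alpha)

def label(score, threshold=0.05):
    """Map a compound score to a three-way polarity label (assumed cut-offs)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

tweets = ["This is a great idea!", "Total scam, pure fraud.", "The meeting is on Monday."]
labels = [label(compound_score(t)) for t in tweets]
```

Counting the resulting labels over a corpus gives the kind of polarity percentages reported in the results, although a real analysis would also handle negation, intensifiers, and emojis.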

Automated Keyword Filtering in LDA for Identifying Product Attributes from Online Reviews

Journal of Mechanical Design ◽

10.1115/1.4048960 ◽

2020 ◽

pp. 1-10

Author(s):

Junegak Joung ◽

Harrison M. Kim

Keyword(s):

Product Design ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Online Reviews ◽

Previous Method ◽

Product Attributes ◽

Customer Reviews ◽

Online Customer Reviews ◽

Dirichlet Allocation

Abstract Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This paper proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter.

A Comparative Automated Text Analysis of Airbnb Reviews in Hong Kong and Singapore Using Latent Dirichlet Allocation

Sustainability ◽

10.3390/su12166673 ◽

2020 ◽

Vol 12 (16) ◽

pp. 6673 ◽

Cited By ~ 1

Author(s):

Kiattipoom Kiatkawsin ◽

Ian Sutherland ◽

Jin-Young Kim

Keyword(s):

Hong Kong ◽

Text Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Optimal Number ◽

Text Data ◽

Customer Reviews ◽

Gaining Insight ◽

Dirichlet Allocation

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, this present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings of Hong Kong’s 185,695 and Singapore’s 93,571 Airbnb reviews, two long-term rival destinations, were compared. Hong Kong produced 12 total topics that can be categorized into four distinct groups whereas Singapore’s optimal number of topics was only five. Topics produced from both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision to formulate managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.

Assessing clinical heterogeneity in sepsis through treatment patterns and machine learning

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz106 ◽

2019 ◽

Vol 26 (12) ◽

pp. 1466-1477 ◽

Cited By ~ 7

Author(s):

Alison E Fohner ◽

John D Greene ◽

Brian L Lawson ◽

Jonathan H Chen ◽

Patricia Kipnis ◽

...

Keyword(s):

Machine Learning ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Wound Care ◽

Population Level ◽

Treatment Patterns ◽

Clinical Heterogeneity ◽

Major Barrier ◽

Machine Learning Approach ◽

Electronic Health

Abstract Objective: To use unsupervised topic modeling to evaluate heterogeneity in sepsis treatment patterns contained within granular data of electronic health records. Materials and Methods: A multicenter, retrospective cohort study of 29 253 hospitalized adult sepsis patients between 2010 and 2013 in Northern California. We applied an unsupervised machine learning method, Latent Dirichlet Allocation, to the orders, medications, and procedures recorded in the electronic health record within the first 24 hours of each patient’s hospitalization to uncover empiric treatment topics across the cohort and to develop computable clinical signatures for each patient based on proportions of these topics. We evaluated how these topics correlated with common sepsis treatment and outcome metrics including inpatient mortality, time to first antibiotic, and fluids given within 24 hours. Results: Mean age was 70 ± 17 years with hospital mortality of 9.6%. We empirically identified 42 clinically recognizable treatment topics (eg, pneumonia, cellulitis, wound care, shock). Only 43.1% of hospitalizations had a single dominant topic, and a small minority (7.3%) had a single topic comprising at least 80% of their overall clinical signature. Across the entire sepsis cohort, clinical signatures were highly variable. Discussion: Heterogeneity in sepsis is a major barrier to improving targeted treatments, yet existing approaches to characterizing clinical heterogeneity are narrowly defined. A machine learning approach captured substantial patient- and population-level heterogeneity in treatment during early sepsis hospitalization. Conclusion: Using topic modeling based on treatment patterns may enable more precise clinical characterization in sepsis and better understanding of variability in sepsis presentation and outcomes.
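The idea of a clinical signature as a vector of topic proportions, and the check for a single topic comprising at least 80% of it, can be sketched as follows. This is an illustrative assumption about how such a statistic could be computed, not the study's code. The function names and the four example signatures are hypothetical, and only three topics are used instead of 42.

```python
def dominant_topic(signature):
    """Return (topic_index, proportion) for the largest topic in one signature."""
    idx = max(range(len(signature)), key=lambda k: signature[k])
    return idx, signature[idx]

def share_with_dominant_topic(signatures, threshold=0.8):
    """Fraction of signatures whose largest topic reaches the threshold."""
    hits = sum(1 for s in signatures if dominant_topic(s)[1] >= threshold)
    return hits / len(signatures)

# Hypothetical per-hospitalization topic proportions (each row sums to 1).
signatures = [
    [0.85, 0.10, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.82, 0.08],
    [0.50, 0.30, 0.20],
]
share = share_with_dominant_topic(signatures)  # 2 of 4 rows reach 0.8
```

A low share, like the 7.3% reported above, indicates that most hospitalizations mix several treatment topics rather than following one pattern.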

Topic Modelling pada Sentimen Terhadap Headline Berita Online Berbahasa Indonesia Menggunakan LDA dan LSTM

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i1.2556 ◽

2021 ◽

Vol 5 (1) ◽

pp. 24 ◽

Cited By ~ 1

Author(s):

Chairullah Naury ◽

Dhomas Hatta Fudholi ◽

Ahmad Fathan Hidayatullah

Keyword(s):

Mass Media ◽

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Short Term Memory ◽

Online News ◽

Distance Map ◽

Analysis Process ◽

Modeling Process ◽

News Headlines

The online mass media is the source of the fastest and most up-to-date information. A model that can provide mapping will help in sorting out information more precisely. In this study, the authors applied topic modeling to the results of sentiment analysis on online news headlines in Indonesian. Sources of data in this study were obtained from online mass media in Indonesian. The data collected were analyzed for sentiment using the Long Short-Term Memory (LSTM) method, in order to obtain news headlines with positive, negative, and neutral sentiments. The classes obtained from the sentiment analysis were then passed to topic modeling using the Latent Dirichlet Allocation (LDA) method and visualized as a word cloud and an intertopic distance map (pyLDAvis) to determine the relationship between one topic and another. The sentiment analysis produced a model with an accuracy of 71.13%, and the topic modeling produced several topics that are easy to interpret.

Exploring Occupation Differences in Reactions to COVID-19 Pandemic on Twitter

Data and Information Management ◽

10.2478/dim-2020-0032 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Yi Zhao ◽

Haixu Xi ◽

Chengzhi Zhang

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Experimental Results ◽

Social Implications ◽

Income Levels ◽

Related Information ◽

The Social ◽

Twitter Users

Abstract Coronavirus disease 2019 (COVID-19) pandemic-related information is flooding social media, and analyzing this information from an occupational perspective can help us to understand the social implications of this unprecedented disruption. In this study, using a COVID-19-related dataset collected with the Twitter IDs, we conduct topic and sentiment analysis from the perspective of occupation, by leveraging Latent Dirichlet Allocation (LDA) topic modeling and the Valence Aware Dictionary and sEntiment Reasoning (VADER) model, respectively. The experimental results indicate that there are significant topic preference differences between Twitter users with different occupations. However, occupation-linked affective differences are only partly demonstrated in our study; the income level of Twitter users shows no relationship with sentiment expression on COVID-19-related topics.

Aspect-Oriented Sentiment Analysis: A Topic Modeling-Powered Approach

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0299 ◽

2018 ◽

Vol 29 (1) ◽

pp. 1166-1178

Author(s):

V.S. Anoop ◽

S. Asharaf

Keyword(s):

Sentiment Analysis ◽

Exponential Growth ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Product Reviews ◽

Real Dataset ◽

Product Features ◽

Product Comparison ◽

Review Finding ◽

Product Recommendations

Abstract Because of exponential growth in the number of people who purchase products online, e-commerce organizations are vying with each other to offer innovative and improved services to their customers. Current platforms give their customers innovative services such as product recommendations based on their purchase histories and location, product comparison, and most importantly, a platform for expressing their experience and feedback. It is important for any e-commerce organization to analyze this feedback and to find out the sentiment of the customers in order to give them better products and services. As large reviews may contain mixed feedback, where a customer gives an opinion on different product features in the same review, finding out the exact sentiment is tedious. This work proposes aspect-specific sentiment analysis of product reviews using a sophisticated topic modeling algorithm called latent Dirichlet allocation (LDA). The topic words thus extracted are mapped to various aspects of an entity to perform the aspect-specific sentiment analysis on product reviews. Experiments with synthetic and real datasets show promising results compared to existing methods of sentiment analysis.

A Machine Learning Approach to Customer Needs Analysis for Product Ecosystems

Journal of Mechanical Design ◽

10.1115/1.4044435 ◽

2019 ◽

Vol 142 (1) ◽

Author(s):

Feng Zhou ◽

Jackie Ayoub ◽

Qianli Xu ◽

X. Jessie Yang

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Latent Dirichlet Allocation ◽

Marketing Research ◽

Needs Analysis ◽

Learning Approach ◽

Product Reviews ◽

Kano Model ◽

Customer Needs ◽

Machine Learning Approach

Abstract Creating product ecosystems has been one of the strategic ways to enhance user experience and business advantages. Among many, customer needs analysis for product ecosystems is one of the most challenging tasks in creating a successful product ecosystem from both the perspectives of marketing research and product development. In this paper, we propose a machine-learning approach to customer needs analysis for product ecosystems by examining a large amount of online user-generated product reviews within a product ecosystem. First, we separated uninformative reviews from the informative reviews using a fastText technique. Then, we extracted a variety of topics with regard to customer needs using a topic modeling technique named latent Dirichlet allocation. In addition, we applied a rule-based sentiment analysis method to predict not only the sentiment of the reviews but also their sentiment intensity values. Finally, we categorized customer needs related to the different extracted topics using an analytic Kano model based on the dissatisfaction-satisfaction pair from the sentiment analysis. A case example of the Amazon product ecosystem was used to illustrate the potential and feasibility of the proposed method.
