Integrating Topic, Sentiment, and Syntax for Modeling Online Reviews: A Topic Model Approach

Author(s):  
Min Tang ◽  
Jian Jin ◽  
Ying Liu ◽  
Chunping Li ◽  
Weiwen Zhang

Analyzing online product reviews has drawn much interest in the academic field. In this research, a new probabilistic topic model, called the tag sentiment aspect (TSA) model, is proposed on the basis of latent Dirichlet allocation (LDA); it aims to reveal the latent aspects and the corresponding sentiment in a review simultaneously. Unlike other topic models, which consider only the words in online reviews, this model takes syntax tags as visual information, and part-of-speech (POS) tags, a widely used kind of syntax information, are considered first. Specifically, POS tags are integrated into three versions of the implementation, given that words with different POS tags might be used to express consumers' opinions. In addition, the proposed TSA is an unsupervised approach, and only a small number of positive and negative words are required to constrain the different priors for training. Finally, two large datasets, on digital SLRs and laptops, are used to evaluate the performance of the proposed model in terms of sentiment classification and aspect extraction. Comparative experiments show that the new model not only achieves promising results on sentiment classification but also improves performance on aspect extraction.
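
As a rough illustration of how a handful of seed words can constrain the priors of an LDA-style sentiment model, the sketch below builds a separate word prior for each sentiment label. It is not the TSA implementation: the function name, the base prior of 0.01, and the choice to zero out a seed word under the opposite sentiment are all assumptions made for illustration.

```python
def build_sentiment_priors(vocab, pos_seeds, neg_seeds, base=0.01, blocked=0.0):
    """Return one word-prior table per sentiment label (hypothetical helper).

    Every word starts with the same small prior `base`. A positive seed word
    gets the `blocked` prior under the negative label, and a negative seed
    word gets it under the positive label, so seed words are steered toward
    their own sentiment during training.
    """
    priors = {"positive": {}, "negative": {}}
    for word in vocab:
        priors["positive"][word] = blocked if word in neg_seeds else base
        priors["negative"][word] = blocked if word in pos_seeds else base
    return priors


# Toy vocabulary and seed lists, invented for this example.
vocab = ["great", "poor", "lens", "battery"]
priors = build_sentiment_priors(vocab, pos_seeds={"great"}, neg_seeds={"poor"})
```

Words outside the seed lists keep the same prior under both labels, so their sentiment would have to be inferred from the reviews themselves.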

2019 ◽  
Vol 9 (4) ◽  
pp. 1-20 ◽  
Author(s):  
Nicola Burns ◽  
Yaxin Bi ◽  
Hui Wang ◽  
Terry Anderson

There is a need to automatically classify information from online reviews. Customers want useful information about different aspects of a product or service, along with the sentiment expressed towards each aspect. This article proposes an Enhanced Twofold-LDA (Latent Dirichlet Allocation) model, in which one LDA is used for aspect assignment and another for sentiment classification, with the aim of determining aspect and sentiment automatically. The enhanced model incorporates domain knowledge (i.e., seed words) to produce more focused topics and can handle two aspects simultaneously at the sentence level. The experimental results show that the Enhanced Twofold-LDA model produces topics more closely related to aspects than the state-of-the-art method ASUM (Aspect and Sentiment Unification Model), while remaining comparable to ASUM in sentiment classification performance.


2021 ◽  
Author(s):  
Nicholas Buhagiar ◽  
Bahram Zahir ◽  
Abdolreza Abhari

The probabilistic topic model Latent Dirichlet Allocation (LDA) was deployed to model the themes of discourse in discussion threads on the social media aggregation website Reddit. Discussion threads were abstracted as vectors of topic weights, and these vectors were fed into several neural network architectures, each with a different number of hidden layers, to train machine learning models that could identify which discussions a given user would be interested in contributing to. Accuracy was used as the evaluation metric to determine which model framework performed best on a given user’s validation set; the selected models achieved an average accuracy of 66.1% on the test data for a sample set of 30 users. Using the probabilities of interest predicted by these neural networks, recommender systems were further built and analyzed for each user.


2019 ◽  
Vol 13 (1) ◽  
Author(s):  
Zhang Jin ◽  
Weng Zhangwen ◽  
Ni Naichen

Redundant online reviews often have a negative impact on the efficiency of consumers’ decision-making in their online shopping. A feasible solution for business analytics is to select a review subset from the original review corpus for consumers, which is called review selection. This study aims to address the diversified review selection problem, and proposes an effective review selection approach called Simulated Annealing-Diversified Review Selection (SA-DRS) that considers the semantic relationship of review features and the content diversity of selected reviews simultaneously. SA-DRS first constructs a feature taxonomy by utilizing the Latent Dirichlet Allocation (LDA) topic model and the Word2vec model to measure the topic relation and word context relation. Based on the established feature taxonomy, the similarity between each pair of reviews is defined and the review quality is estimated as well. Finally, diversified, high-quality reviews are selected heuristically by SA-DRS in the spirit of the simulated annealing method, forming the selected review subset. Extensive experiments are conducted on real-world e-commerce platforms to demonstrate the effectiveness of SA-DRS compared to other extant review selection approaches.
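
The general idea of selecting a diverse, high-quality subset by simulated annealing can be sketched as follows. This is a generic illustration, not SA-DRS itself: Jaccard similarity over feature sets stands in for the taxonomy-based review similarity, and the objective, the swap move, the cooling schedule, and all function and parameter names are assumptions.

```python
import math
import random


def jaccard(a, b):
    """Jaccard similarity between two feature sets (a stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(subset, features, quality, lam=1.0):
    """Hypothetical objective: total quality plus lam * mean pairwise dissimilarity."""
    items = sorted(subset)
    total_quality = sum(quality[i] for i in items)
    pairs = [(i, j) for n, i in enumerate(items) for j in items[n + 1:]]
    if not pairs:
        return total_quality
    diversity = sum(1.0 - jaccard(features[i], features[j]) for i, j in pairs) / len(pairs)
    return total_quality + lam * diversity


def select_reviews(features, quality, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick k review indices by simulated annealing; returns (indices, score)."""
    rng = random.Random(seed)
    n = len(features)
    current = set(rng.sample(range(n), k))
    current_score = score(current, features, quality)
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(steps):
        outside = [i for i in range(n) if i not in current]
        if not outside:
            break
        # Neighbor move: swap one selected review for one unselected review.
        drop = rng.choice(sorted(current))
        add = rng.choice(outside)
        candidate = (current - {drop}) | {add}
        cand_score = score(candidate, features, quality)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp = max(temp * cooling, 1e-9)
    return sorted(best), best_score
```

Accepting some worse swaps while the temperature is high lets the search leave a locally good but redundant subset; as the temperature falls, the procedure behaves more and more like greedy hill climbing.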


2021 ◽  
Vol 25 (1) ◽  
pp. 205-223
Author(s):  
Jin He ◽  
Lei Li ◽  
Yan Wang ◽  
Xindong Wu

With the prevalence of online review websites, large-scale data make focused analysis necessary. This task aims to capture the information that is highly relevant to a specific aspect. However, the broad range of aspects across products makes the task both wide-ranging and challenging. A commonly used solution is to modify topic models with additional information so that they capture the features of a specific aspect (referred to as a targeted aspect). However, existing topic models, which either perform a full analysis to capture as many features as possible or estimate similarity to capture features that are as coherent as possible, overlook the fine-grained semantic relations between features, so the captured features are coarse and confusing. In this paper, we propose a novel Hierarchical Features-based Topic Model (HFTM) to extract targeted aspects from online reviews and then capture the aspect-specific features. Specifically, our model captures not only the direct features, which carry target-to-feature semantics, but also the latent features, which carry feature-to-feature semantics. Experiments conducted on real-world datasets demonstrate that HFTM outperforms state-of-the-art baselines in both aspect extraction and document classification.


2020 ◽  
Vol 12 (12) ◽  
pp. 4830 ◽  
Author(s):  
Cecilia Elizabeth Bayas Aldaz ◽  
Jesus Rodriguez-Pomeda ◽  
Leyla Angélica Sandoval Hamón ◽  
Fernando Casani

This article provides a procedure to universities for understanding the social perception of their activities in the sustainability field, through the analysis of news published in the printed media. It identifies the Spanish news sources that have covered this issue the most and the topics that appear in that news coverage. Using a probabilistic topic model called Latent Dirichlet Allocation, the study includes the nine dominant topics within a corpus with more than seventeen thousand published news items (totaling approximately five and a quarter million words) from a database of almost thirteen hundred national press sources between 2014 and 2017. The study identifies the news sources that published the most news on the issue. It is also found that the amount of news on sustainability and universities declined during the covered period. The nine identified topics point towards the relevance of higher education institutions’ activities as drivers of sustainability. The social perception encapsulated within the topics signals how the public is interested in these activities. Therefore, we find some interesting relationships between sustainable development, higher education institutions’ missions and behaviors, governmental policies, university funding and governance, social and economic innovation, and green campuses in terms of the overall goal of sustainability.


SAGE Open ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 215824402110315
Author(s):  
Eunhye Park ◽  
Junehee Kwon ◽  
Bongsug (Kevin) Chae ◽  
Sung-Bum Kim

This study aims to survey user-generated content (UGC) from diners in certified green restaurants, discover the green images they recall, and demonstrate the usefulness of applying a probabilistic topic model to comprehend customers’ perceptions. Postvisit online reviews (N = 28,098), in the form of unstructured texts from the TripAdvisor.com website, were used to find freely recalled green-restaurant images. These data were preprocessed with a structural topic model (STM) algorithm to select 51 relevant categories of images. These image categories were compared with the findings of previous studies to discover unique restaurant attributes. Furthermore, a topic-level network and a green-restaurant network were drawn to discover the most easily recallable image categories and their attributes. This machine-learning-based approach improved the reproducibility of unstructured data analyses, overcoming the subjectivity of qualitative data analysis. Theoretical and practical implications are offered for topic modeling methodology along with marketing strategies for restaurateurs.


2021 ◽  
Vol 13 (2) ◽  
pp. 763
Author(s):  
Simona Fiandrino ◽  
Alberto Tonelli

The recent Review of the Non-Financial Reporting Directive (NFRD) aims to enhance adequate non-financial information (NFI) disclosure and improve accountability for stakeholders. This study focuses on this regulatory intervention and has a twofold objective: First, it aims to understand the main underlying issues at stake; second, it suggests areas of possible amendment considering the current debates on sustainability accounting and accounting for stakeholders. In keeping with these aims, the research analyzes the documents annexed to the contribution on the Review of the NFRD by conducting a text-mining analysis with latent Dirichlet allocation (LDA) probabilistic topic model (PTM). Our findings highlight four main topics at the core of the current debate: quality of NFI, standardization, materiality, and assurance. The research suggests ways of improving managerial policies to achieve more comparable, relevant, and reliable information by bringing value creation for stakeholders into accounting. It further addresses an integrated logic of accounting for stakeholders that contributes to sustainable development.


2017 ◽  
Author(s):  
Redhouane Abdellaoui ◽  
Pierre Foulquié ◽  
Nathalie Texier ◽  
Carole Faviez ◽  
Anita Burgun ◽  
...  

BACKGROUND Medication nonadherence is a major impediment to the management of many health conditions. A better understanding of the factors underlying noncompliance to treatment may help health professionals to address it. Patients use peer-to-peer virtual communities and social media to share their experiences regarding their treatments and diseases. Using topic models makes it possible to model the themes present in a collection of posts and thus to identify cases of noncompliance.

OBJECTIVE The aim of this study was to detect messages describing patients’ noncompliant behaviors associated with a drug of interest. Thus, the objective was the clustering of posts featuring a homogeneous vocabulary related to nonadherent attitudes.

METHODS We focused on escitalopram and aripiprazole, used to treat depression and psychotic conditions, respectively. We implemented a probabilistic topic model to identify the topics that occurred in a corpus of messages mentioning these drugs, posted from 2004 to 2013 on three of the most popular French forums. Data were collected using a Web crawler designed by Kappa Santé as part of the Detec’t project to analyze social media for drug safety. Several topics were related to noncompliance to treatment.

RESULTS Starting from a corpus of 3650 posts related to an antidepressant drug (escitalopram) and 2164 posts related to an antipsychotic drug (aripiprazole), the use of latent Dirichlet allocation allowed us to model several themes, including interruptions of treatment and changes in dosage. The topic model approach detected cases of noncompliant behavior with a recall of 98.5% (272/276) and a precision of 32.6% (272/844).

CONCLUSIONS Topic models enabled us to explore patients’ discussions on community websites and to identify posts related to noncompliant behaviors. After a manual review of the messages in the noncompliance topics, we found that noncompliance to treatment was present in 6.17% (276/4469) of the posts.

