Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach (Preprint)

Mapping Intimacies ◽

10.2196/preprints.9222 ◽

2017 ◽

Author(s):

Redhouane Abdellaoui ◽

Pierre Foulquié ◽

Nathalie Texier ◽

Carole Faviez ◽

Anita Burgun ◽

...

Keyword(s):

Social Media ◽

Virtual Communities ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Antidepressant Drug ◽

Topic Models ◽

Web Crawler ◽

Probabilistic Topic Model ◽

Manual Review ◽

Model Approach

BACKGROUND Medication nonadherence is a major impediment to the management of many health conditions. A better understanding of the factors underlying noncompliance to treatment may help health professionals to address it. Patients use peer-to-peer virtual communities and social media to share their experiences regarding their treatments and diseases. Using topic models makes it possible to model themes present in a collection of posts, thus to identify cases of noncompliance. OBJECTIVE The aim of this study was to detect messages describing patients’ noncompliant behaviors associated with a drug of interest. Thus, the objective was the clustering of posts featuring a homogeneous vocabulary related to nonadherent attitudes. METHODS We focused on escitalopram and aripiprazole used to treat depression and psychotic conditions, respectively. We implemented a probabilistic topic model to identify the topics that occurred in a corpus of messages mentioning these drugs, posted from 2004 to 2013 on three of the most popular French forums. Data were collected using a Web crawler designed by Kappa Santé as part of the Detec’t project to analyze social media for drug safety. Several topics were related to noncompliance to treatment. RESULTS Starting from a corpus of 3650 posts related to an antidepressant drug (escitalopram) and 2164 posts related to an antipsychotic drug (aripiprazole), the use of latent Dirichlet allocation allowed us to model several themes, including interruptions of treatment and changes in dosage. The topic model approach detected cases of noncompliance behaviors with a recall of 98.5% (272/276) and a precision of 32.6% (272/844). CONCLUSIONS Topic models enabled us to explore patients’ discussions on community websites and to identify posts related with noncompliant behaviors. After a manual review of the messages in the noncompliance topics, we found that noncompliance to treatment was present in 6.17% (276/4469) of the posts.

Download Full-text

Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data (Preprint)

10.2196/preprints.22152 ◽

2020 ◽

Author(s):

Junze Wang ◽

Ying Zhou ◽

Wei Zhang ◽

Richard Evans ◽

Chengyan Zhu

Keyword(s):

Social Media ◽

Latent Dirichlet Allocation ◽

Vaccine Development ◽

User Behavior ◽

Chinese Government ◽

Web Crawler ◽

Health Crisis ◽

User Behavior Analysis ◽

Sina Weibo ◽

Social Media Platforms

BACKGROUND The COVID-19 pandemic has created a global health crisis that is affecting economies and societies worldwide. During times of uncertainty and unexpected change, people have turned to social media platforms as communication tools and primary information sources. Platforms such as Twitter and Sina Weibo have allowed communities to share discussion and emotional support; they also play important roles for individuals, governments, and organizations in exchanging information and expressing opinions. However, research that studies the main concerns expressed by social media users during the pandemic is limited. OBJECTIVE The aim of this study was to examine the main concerns raised and discussed by citizens on Sina Weibo, the largest social media platform in China, during the COVID-19 pandemic. METHODS We used a web crawler tool and a set of predefined search terms (New Coronavirus Pneumonia, New Coronavirus, and COVID-19) to investigate concerns raised by Sina Weibo users. Textual information and metadata (number of likes, comments, retweets, publishing time, and publishing location) of microblog posts published between December 1, 2019, and July 32, 2020, were collected. After segmenting the words of the collected text, we used a topic modeling technique, latent Dirichlet allocation (LDA), to identify the most common topics posted by users. We analyzed the emotional tendencies of the topics, calculated the proportional distribution of the topics, performed user behavior analysis on the topics using data collected from the number of likes, comments, and retweets, and studied the changes in user concerns and differences in participation between citizens living in different regions of mainland China. RESULTS Based on the 203,191 eligible microblog posts collected, we identified 17 topics and grouped them into 8 themes. These topics were pandemic statistics, domestic epidemic, epidemics in other countries worldwide, COVID-19 treatments, medical resources, economic shock, quarantine and investigation, patients’ outcry for help, work and production resumption, psychological influence, joint prevention and control, material donation, epidemics in neighboring countries, vaccine development, fueling and saluting antiepidemic action, detection, and study resumption. The mean sentiment was positive for 11 topics and negative for 6 topics. The topic with the highest mean of retweets was domestic epidemic, while the topic with the highest mean of likes was quarantine and investigation. CONCLUSIONS Concerns expressed by social media users are highly correlated with the evolution of the global pandemic. During the COVID-19 pandemic, social media has provided a platform for Chinese government departments and organizations to better understand public concerns and demands. Similarly, social media has provided channels to disseminate information about epidemic prevention and has influenced public attitudes and behaviors. Government departments, especially those related to health, can create appropriate policies in a timely manner through monitoring social media platforms to guide public opinion and behavior during epidemics.

Download Full-text

News Organizations’ Selective Link Sharing as Gatekeeping : A Structural Topic Model Approach

Computational Communication Research ◽

10.5117/ccr2019.1.003.pak ◽

2019 ◽

Vol 1 (1) ◽

pp. 45-78

Author(s):

Chankyung Pak

Keyword(s):

Social Media ◽

Topic Model ◽

News Stories ◽

News Websites ◽

News Organizations ◽

Computational Data ◽

Topic Distribution ◽

Editorial Decisions ◽

Structural Topic Model ◽

Model Approach

Abstract To disseminate their stories efficiently via social media, news organizations make decisions that resemble traditional editorial decisions. However, the decisions for social media may deviate from traditional ones because they are often made outside the newsroom and guided by audience metrics. This study focuses on selective link sharing as quasi-gatekeeping on Twitter ‐ conditioning a link sharing decision about news content. It illustrates how selective link sharing resembles and deviates from gatekeeping for the publication of news stories. Using a computational data collection method and a machine learning technique called Structural Topic Model (STM), this study shows that selective link sharing generates a different topic distribution between news websites and Twitter and thus significantly revokes the specialty of news organizations. This finding implies that emergent logic, which governs news organizations’ decisions for social media, can undermine the provision of diverse news.

Download Full-text

Identifying target audience on enterprise social network

Industrial Management & Data Systems ◽

10.1108/imds-01-2018-0007 ◽

2019 ◽

Vol 119 (1) ◽

pp. 111-128 ◽

Cited By ~ 3

Author(s):

Jianhong Luo ◽

Xuwei Pan ◽

Shixiong Wang ◽

Yujing Huang

Keyword(s):

Social Media ◽

Social Network ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Information Service ◽

Target Audience ◽

Business Practices ◽

Content Type ◽

Enterprise Social Media ◽

Enterprise Social Network

Purpose Delivering messages and information to potentially interested users is one of the distinguishing applications of online enterprise social network (ESN). The purpose of this paper is to provide insights to better understand the repost preferences of users and provide personalized information service in enterprise social media marketing. Design/methodology/approach It is accomplished by constructing a target audience identification framework. Repost preference latent Dirichlet allocation (RPLDA) topic model topic model is proposed to understand the mass user online repost preferences toward different contents. A topic-oriented preference metric is proposed to measure the preference degree of individual users. And the function of reposting forecasting is formulated to identify target audience. Findings The empirical research shows the following: a total of 20 percent of the repost users in ESN represent the key active users who are particularly interested in the latent topic of messages in ESN and fits Pareto distribution; and the target audience identification framework can successfully identify different target key users for messages with different latent topics. Practical implications The findings should motivate marketing managers to improve enterprise brand by identifying key target audience in ESN and marketing in a way that truthfully reflects personalized preferences. Originality/value This study runs counter to most current business practices, which tend to use simple popularity to seek important users. Adaptively and dynamically identifying target audience appears to have considerable potential, especially in the rapidly growing area of enterprise social media information service.

Download Full-text

Analyzing Sentiments and Diffusion Characteristics of COVID-19 Vaccine Misinformation Topics in Social Media

International Journal of Business Analytics ◽

10.4018/ijban.292056 ◽

2022 ◽

Vol 9 (3) ◽

pp. 1-22

Author(s):

Mohammad Daradkeh

Keyword(s):

Social Media ◽

Data Analytics ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Diffusion Characteristics ◽

Sentiment Orientation ◽

Positive Sentiment ◽

Negative Sentiment ◽

And Diffusion ◽

Machine Learning Models

This study presents a data analytics framework that aims to analyze topics and sentiments associated with COVID-19 vaccine misinformation in social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. Latent Dirichlet Allocation (LDA) topic model was used to identify dominant topics in COVID-19 vaccine misinformation. Sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the number of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average number of replies, retweets, and likes of tweets with negative sentiment was 2.26, 2.68, and 3.29 times higher, respectively, than those with positive sentiment.

Download Full-text

Identifying User Interests In An Online Discussion Forum With Deep Learning

10.32920/ryerson.14654349.v1 ◽

2021 ◽

Author(s):

Nicholas Buhagiar ◽

Bahram Zahir ◽

Abdolreza Abhari

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Model Framework ◽

User Interests ◽

Online Discussion Forum ◽

Probabilistic Topic Model ◽

Average Accuracy ◽

Discussion Threads ◽

Validation Set ◽

Evaluation Metric

The probabilistic topic model Latent Dirichlet Allocation (LDA) was deployed to model the themes of discourse in discussion threads on the social media aggregation website Reddit. Abstracting discussion threads as vectors of topic weights, these vectors were fed into several neural network architectures, each with a different number of hidden layers, to train machine learning models that could identify which discussion would be of interest for a given user to contribute. Using accuracy as the evaluation metric to determine which model framework achieved the best performance on a given user’s validation set, these selected models achieved an average accuracy of 66.1% on the test data for a sample set of 30 users. Using the predicted probabilities of interest made by these neural networks, recommender systems were further built and analyzed for each user.

Download Full-text

Method of Moments for Topic Models with Mixed Discrete and Continuous Features

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/333 ◽

2021 ◽

Author(s):

Joachim Giesen ◽

Paul Kahlmeyer ◽

Sören Laue ◽

Matthias Mitterreiter ◽

Frank Nussbaum ◽

...

Keyword(s):

Method Of Moments ◽

Latent Variable ◽

Latent Dirichlet Allocation ◽

Latent Class ◽

Topic Model ◽

Likelihood Function ◽

Natural Extension ◽

Topic Models ◽

Discrete Variables ◽

Continuous State

Topic models are characterized by a latent class variable that represents the different topics. Traditionally, their observable variables are modeled as discrete variables like, for instance, in the prototypical latent Dirichlet allocation (LDA) topic model. In LDA, words in text documents are encoded by discrete count vectors with respect to some dictionary. The classical approach for learning topic models optimizes a likelihood function that is non-concave due to the presence of the latent variable. Hence, this approach mostly boils down to using search heuristics like the EM algorithm for parameter estimation. Recently, it was shown that topic models can be learned with strong algorithmic and statistical guarantees through Pearson's method of moments. Here, we extend this line of work to topic models that feature discrete as well as continuous observable variables (features). Moving beyond discrete variables as in LDA allows for more sophisticated features and a natural extension of topic models to other modalities than text, like, for instance, images. We provide algorithmic and statistical guarantees for the method of moments applied to the extended topic model that we corroborate experimentally on synthetic data. We also demonstrate the applicability of our model on real-world document data with embedded images that we preprocess into continuous state-of-the-art feature vectors.

Download Full-text

Can topic models be used in research evaluations? Reproducibility, validity, and reliability when compared with semantic maps

Research Evaluation ◽

10.1093/reseval/rvz015 ◽

2019 ◽

Vol 28 (3) ◽

pp. 263-272 ◽

Cited By ~ 1

Author(s):

Tobias Hecking ◽

Loet Leydesdorff

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Research Evaluation ◽

Topic Models ◽

Principal Component ◽

Evaluation Framework ◽

Validity And Reliability ◽

Text Corpora ◽

Semantic Maps ◽

Semantic Coherence

AbstractWe replicate and analyze the topic model which was commissioned to King’s College and Digital Science for the Research Evaluation Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora which are too large to read, validation of the results of these models is hardly possible; furthermore the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample—a test for reliability—has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.

Download Full-text

Topic Modeling in Embedding Spaces

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00325 ◽

2020 ◽

Vol 8 ◽

pp. 439-453 ◽

Cited By ~ 2

Author(s):

Adji B. Dieng ◽

Francisco J. R. Ruiz ◽

David M. Blei

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Topic Models ◽

Predictive Performance ◽

Inner Product ◽

Natural Parameter ◽

Document Models ◽

Heavy Tailed ◽

Categorical Distribution

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the embedded topic model (etm), a generative model of documents that marries traditional topic models with word embeddings. More specifically, the etm models each word with a categorical distribution whose natural parameter is the inner product between the word’s embedding and an embedding of its assigned topic. To fit the etm, we develop an efficient amortized variational inference algorithm. The etm discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation, in terms of both topic quality and predictive performance.

Download Full-text

Ldagibbs: A Command for Topic Modeling in Stata Using Latent Dirichlet Allocation

The Stata Journal Promoting communications on statistics and Stata ◽

10.1177/1536867x1801800107 ◽

2018 ◽

Vol 18 (1) ◽

pp. 101-117 ◽

Cited By ~ 10

Author(s):

Carlo Schwarz

Keyword(s):

Machine Learning ◽

Probability Distribution ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Topic Models ◽

Text Documents ◽

Text Data ◽

Dirichlet Allocation

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.

Download Full-text

Probabilistic topic model based approach for detecting bursty events from social media data

2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC) ◽

10.1109/spac.2017.8304365 ◽

2017 ◽

Cited By ~ 1

Author(s):

Chunshan Li ◽

Dianhui Chu

Keyword(s):

Social Media ◽

Topic Model ◽

Social Media Data ◽

Model Based ◽

Probabilistic Topic Model ◽

Media Data

Download Full-text