Trends and Challenges in Lifelong Machine Learning Topic Models

Author(s):  
Muhammad Taimoor Khan ◽  
Shehzad Khalid
Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.


2019 ◽  
Author(s):  
André Dalmora ◽  
Tiago Tavares

Music lyrics can convey a great part of the meaning in popular songs. Such meaning is important for humans to understand songs as related to typical narratives, such as romantic interests or life stories. This understanding is part of affective aspects that can be used to choose songs to play in particular situations. This paper analyzes the effectiveness of using text mining tools to classify lyrics according to their narrative contexts. For such, we used a vote-based dataset and several machine learning algorithms. Also, we compared the classification results to that of a typical human. Last, we compare the problems of identifying narrative contexts and of identifying lyric valence. Our results indicate that narrative contexts can be identified more consistently than valence. Also, we show that human-based classification typically do not reach a high accuracy, which suggests an upper bound for automatic classification. narrative contexts. For such, we built a dataset containing Brazilian popular music lyrics which were raters voted online according to its context and valence. We approached the problem using a machine learning pipeline in which lyrics are projected into a vector space and then classified using general-purpose algorithms. We experimented with document representations based on sparse topic models [11, 12, 13, 14], which aims to find groups of words that typically appear together in the dataset. Also, we extracted part-of-speech tags for each lyric and used their histogram as features in the classification process.


2020 ◽  
Author(s):  
Benedict Han ◽  
Jinwook Choi

BACKGROUND Predicting the complications of diabetes mellitus from an early stage would be beneficial for its management. Topic modeling is a posterior procedure to estimate semantic objects in a dataset through a statistical approach. The topic model can play the role of a feature set for supervised classification. OBJECTIVE : We performed a study to predict diabetic retinopathy (DMR), diabetic nephropathy (DMN), and non-alcoholic fatty liver disease (NAFLD) from clinical notes using semi-supervised classification based on topic modeling. METHODS : We applied four types of machine learning algorithms for classification: random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), and fully connected artificial neural network (ANN) We reviewed the topic models through statistical analysis to determine whether these topic models are clinically plausible. RESULTS F1 scores were above 0.8 when predicting all kinds of target diseases with all types of classification methods, and above 0.9 using RF or GBM. Hypertension and dyslipidemia seem to be statistically associated with DMR, DMN, and NAFLD. They may be important clues with which we can predict DMR, DMN, and NAFLD. CONCLUSIONS This study showed that complications of diabetes mellitus that are likely to occur later in life can be predicted from the clinical notes of outpatient departments. We believe that this kind of predictive model could be utilized by patients and physicians in outpatient departments as a useful tool, similar to clinical decision support systems.


2020 ◽  
Author(s):  
Stefanie Scholz ◽  
Christian Winkler

In this article we show how fashion brands communicate with their follower on Instagram. We use a continuously update dataset of 68 brands, more than 300,000 posts and more than 40,000,000 comments. Starting with descriptive statistics, we uncover different behavior and success of the various brands. It turns out that there are patterns specific to luxury, mass-market and sportswear brands. Posting volume is extremely brand dependent as is the number of comments and the engagement of the community. Having understood the statistics, we turn to machine learning techniques to measure the response of the community via comments. Topic models help us understand the structure of their respective community and uncover insights regarding the response to campaigns. Having up-to-date content is essential for this kind of analysis, as the market is highly volatile. Furthermore, automatic data analysis is crucial to measure the success of campaigns and adjust them accordingly for maximum effect.


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


Sign in / Sign up

Export Citation Format

Share Document