Trends and Challenges in Lifelong Machine Learning Topic Models

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
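The two distributions described above (topics per document, words per topic) can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch in Python, not the implementation behind the Stata command; the function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d] is the topic distribution of document d and phi[k] is the
    word distribution of topic k, indexed like vocab.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the two families of distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Invented toy corpus, for illustration only.
docs = [
    ["apple", "banana", "fruit", "apple"],
    ["banana", "fruit", "juice"],
    ["engine", "wheel", "car"],
    ["car", "engine", "road", "wheel"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iters=50)
```

Each row of `theta` and each row of `phi` sums to one, which matches the description of documents as distributions over topics and topics as distributions over words.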

Identifying Narrative Contexts in Brazilian Popular Music Lyrics Using Sparse Topic Models: A Comparison Between Human-Based and Machine-Based Classification

10.5753/sbcm.2019.10417 ◽

2019 ◽

Author(s):

André Dalmora ◽

Tiago Tavares

Keyword(s):

Machine Learning ◽

Popular Music ◽

Life Stories ◽

Great Part ◽

Topic Models ◽

General Purpose ◽

Machine Learning Algorithms ◽

Part Of Speech ◽

Popular Songs ◽

Music Lyrics

Music lyrics can convey a great part of the meaning in popular songs. Such meaning is important for humans to understand songs as related to typical narratives, such as romantic interests or life stories. This understanding is part of the affective aspects that can be used to choose songs to play in particular situations. This paper analyzes the effectiveness of using text mining tools to classify lyrics according to their narrative contexts. To this end, we used a vote-based dataset and several machine learning algorithms. We also compared the classification results to those of a typical human. Last, we compare the problems of identifying narrative contexts and of identifying lyric valence. Our results indicate that narrative contexts can be identified more consistently than valence. We also show that human-based classification typically does not reach a high accuracy, which suggests an upper bound for automatic classification.

To this end, we built a dataset containing Brazilian popular music lyrics, which raters voted on online according to their context and valence. We approached the problem using a machine learning pipeline in which lyrics are projected into a vector space and then classified using general-purpose algorithms. We experimented with document representations based on sparse topic models [11, 12, 13, 14], which aim to find groups of words that typically appear together in the dataset. We also extracted part-of-speech tags for each lyric and used their histogram as features in the classification process.
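The "project into a vector space, then classify" pipeline can be sketched in a few lines of Python. This sketch swaps in plain bag-of-words counts and a nearest-centroid rule in place of the paper's sparse topic models and general-purpose classifiers, so it only illustrates the overall shape of the pipeline. The function names and the toy lyrics and labels are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Project a lyric into a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(samples):
    """Sum the vectors of each class (cosine ignores the overall scale)."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the lyric."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy lyrics and labels, not taken from the paper's dataset.
training = [
    ("my love my heart you", "romance"),
    ("i miss your kiss my love", "romance"),
    ("i was born in a small town", "life story"),
    ("years went by and i left my town", "life story"),
]
centroids = train_centroids(training)
label = classify("your kiss my heart", centroids)
```

Replacing `vectorize` with a topic-proportion or part-of-speech histogram representation would leave the rest of the pipeline unchanged.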

Predicting complications of diabetes mellitus through machine learning based on topic modeling: study design (Preprint)

10.2196/preprints.25550 ◽

2020 ◽

Author(s):

Benedict Han ◽

Jinwook Choi

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Topic Modeling ◽

Supervised Classification ◽

Topic Models ◽

Support Vector ◽

Alcoholic Fatty Liver ◽

Complications Of Diabetes ◽

Clinical Notes ◽

Outpatient Departments

BACKGROUND: Predicting the complications of diabetes mellitus from an early stage would be beneficial for its management. Topic modeling is a posterior procedure to estimate semantic objects in a dataset through a statistical approach. The topic model can play the role of a feature set for supervised classification. OBJECTIVE: We performed a study to predict diabetic retinopathy (DMR), diabetic nephropathy (DMN), and non-alcoholic fatty liver disease (NAFLD) from clinical notes using semi-supervised classification based on topic modeling. METHODS: We applied four types of machine learning algorithms for classification: random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), and fully connected artificial neural network (ANN). We reviewed the topic models through statistical analysis to determine whether these topic models are clinically plausible. RESULTS: F1 scores were above 0.8 when predicting all kinds of target diseases with all types of classification methods, and above 0.9 using RF or GBM. Hypertension and dyslipidemia seem to be statistically associated with DMR, DMN, and NAFLD. They may be important clues with which we can predict DMR, DMN, and NAFLD. CONCLUSIONS: This study showed that complications of diabetes mellitus that are likely to occur later in life can be predicted from the clinical notes of outpatient departments. We believe that this kind of predictive model could be utilized by patients and physicians in outpatient departments as a useful tool, similar to clinical decision support systems.
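The F1 score reported above is the harmonic mean of precision and recall. A minimal sketch of the computation for one binary label follows; the helper name `f1_score` and the toy labels are assumptions for illustration and are not the study's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy labels: 1 = complication present, 0 = absent.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so the F1 score is also 2/3.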

Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment

Cartography and Geographic Information Science ◽

10.1080/15230406.2017.1356242 ◽

2017 ◽

Vol 45 (4) ◽

pp. 362-376 ◽

Cited By ~ 43

Author(s):

Bernd Resch ◽

Florian Usländer ◽

Clemens Havas

Keyword(s):

Machine Learning ◽

Social Media ◽

Damage Assessment ◽

Topic Models ◽

Spatiotemporal Analysis ◽

Social Media Data ◽

Media Data

Malware detection via API calls, topic models and machine learning

2015 IEEE International Conference on Automation Science and Engineering (CASE) ◽

10.1109/coase.2015.7294263 ◽

2015 ◽

Cited By ~ 5

Author(s):

G. Ganesh Sundarkumar ◽

Vadlamani Ravi ◽

Ifeoma Nwogu ◽

Venu Govindaraju

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Topic Models

A Trend Analysis of Machine Learning Research with Topic Models and Mann-Kendall Test

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2019.02.08 ◽

2019 ◽

Vol 11 (2) ◽

pp. 70-82 ◽

Cited By ~ 2

Author(s):

Deepak Sharma ◽

Bijendra Kumar ◽

Satish Chand

Keyword(s):

Machine Learning ◽

Trend Analysis ◽

Topic Models ◽

Kendall Test ◽

Mann Kendall Test ◽

Learning Research

How to Engage Followers: Classifying Fashion Brands According to Their Instagram Profiles, Posts and Comments

10.5121/csit.2020.101704 ◽

2020 ◽

Author(s):

Stefanie Scholz ◽

Christian Winkler

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Topic Models ◽

Descriptive Statistics ◽

Machine Learning Techniques ◽

Maximum Effect ◽

Mass Market ◽

Automatic Data ◽

Learning Techniques ◽

Automatic Data Analysis

In this article, we show how fashion brands communicate with their followers on Instagram. We use a continuously updated dataset of 68 brands, more than 300,000 posts, and more than 40,000,000 comments. Starting with descriptive statistics, we uncover differences in the behavior and success of the various brands. It turns out that there are patterns specific to luxury, mass-market, and sportswear brands. Posting volume is extremely brand dependent, as are the number of comments and the engagement of the community. Having understood the statistics, we turn to machine learning techniques to measure the response of the community via comments. Topic models help us understand the structure of each brand's community and uncover insights regarding the response to campaigns. Having up-to-date content is essential for this kind of analysis, as the market is highly volatile. Furthermore, automatic data analysis is crucial to measure the success of campaigns and adjust them accordingly for maximum effect.

Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.

Machine Learning for Speaker Recognition

10.1017/9781108552332 ◽

2020 ◽

Cited By ~ 2

Author(s):

Man-Wai Mak ◽

Jen-Tzung Chien

Keyword(s):

Machine Learning ◽

Speaker Recognition
