Renormalization Analysis of Topic Models

Entropy, 2020, Vol. 22 (5), pp. 556
Author(s): Sergei Koltcov, Vera Ignatenko

In practice, building a machine learning model of big data requires tuning model parameters. Parameter tuning typically relies on a grid search that is extremely time-consuming and computationally expensive. However, the theory of statistical physics provides techniques that allow us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior makes it possible to apply a renormalization technique. Combining the renormalization procedure with the Renyi entropy approach allows for a fast search for the optimal number of topics. In this paper, the renormalization procedure is developed for probabilistic Latent Semantic Analysis (pLSA), for the Latent Dirichlet Allocation model with a variational Expectation–Maximization algorithm (VLDA), and for the Latent Dirichlet Allocation model with a granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics in two different languages and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure finds an approximation of the optimal number of topics at least 30 times faster than grid search, without significant loss of quality.
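The following is a minimal sketch of the Renyi entropy criterion that this line of work uses to score a candidate number of topics: the formulation below (density of high-probability word-topic entries, an "energy" term, and a deformation parameter q = 1/T) is an assumption for illustration, not the paper's exact renormalization procedure, and `fit_topic_model` is a hypothetical placeholder for pLSA/VLDA/GLDA training.

```python
# Sketch: Renyi entropy score for a fitted topic model (assumed formulation).
import numpy as np

def renyi_entropy(phi: np.ndarray) -> float:
    """Renyi entropy score for a topic-word matrix phi of shape (T, W).

    Rows are topics, columns are words; each row sums to 1.
    Lower values are taken to indicate a better-balanced number of topics.
    """
    T, W = phi.shape
    threshold = 1.0 / W                 # uniform-probability threshold
    mask = phi > threshold              # "high-probability" word-topic pairs
    N = mask.sum()                      # number of such pairs
    P = phi[mask].sum()                 # their total probability mass

    rho = N / (W * T)                   # density-of-states ratio
    P_tilde = P / T                     # normalized probability ("energy" term)
    q = 1.0 / T                         # deformation parameter

    # Assumed form: S_q = ln(Z_q) / (q - 1), with Z_q = P_tilde**q * rho
    return (q * np.log(P_tilde) + np.log(rho)) / (q - 1.0)

# Usage sketch: score a range of topic numbers and keep the minimum.
# `fit_topic_model` is a hypothetical trainer returning the topic-word matrix phi.
# scores = {T: renyi_entropy(fit_topic_model(corpus, T)) for T in range(2, 100, 2)}
# best_T = min(scores, key=scores.get)
```

In the paper's setting, the renormalization procedure replaces the repeated retraining implied by the dictionary comprehension above, which is where the reported speed-up over grid search comes from.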

2021
Author(s): Jorge Arturo Lopez

Extraction of topics from large text corpora helps improve Software Engineering (SE) processes. Latent Dirichlet Allocation (LDA) is one of the algorithmic tools for understanding, searching, exploiting, and summarizing a large corpus of documents, and it is often used to perform such analysis. However, calibrating the models is computationally expensive, especially when iterating over a large number of topics. Our goal is to create a simple formula allowing analysts to estimate the number of topics, so that the top X topics include the desired proportion of documents under study. We derived the formula from the empirical analysis of three SE-related text corpora. We believe that practitioners can use our formula to expedite LDA analysis. The formula is also of interest to theoreticians, as it suggests that different SE text corpora have similar underlying properties.
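The empirical formula itself is not reproduced in the abstract, but the quantity it targets can be sketched directly: given a document-topic matrix from any fitted LDA model, measure what share of documents falls into the top X topics and find the smallest X reaching a desired coverage. The matrix name `theta` and both helper functions below are illustrative assumptions, not the authors' method.

```python
# Sketch: document coverage by the top X topics from a fitted LDA model.
import numpy as np

def coverage_of_top_x(theta: np.ndarray, x: int) -> float:
    """Fraction of documents whose dominant topic is among the x most popular topics.

    theta has shape (n_docs, n_topics); each row is a document's topic distribution.
    """
    dominant = theta.argmax(axis=1)                            # dominant topic per document
    counts = np.bincount(dominant, minlength=theta.shape[1])   # documents per topic
    top_x = np.argsort(counts)[::-1][:x]                       # x most frequent dominant topics
    return np.isin(dominant, top_x).mean()

def topics_needed(theta: np.ndarray, desired_share: float = 0.9) -> int:
    """Smallest X such that the top X topics cover the desired share of documents."""
    for x in range(1, theta.shape[1] + 1):
        if coverage_of_top_x(theta, x) >= desired_share:
            return x
    return theta.shape[1]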


2017, Vol. 10, pp. 403-421
Author(s): Putu Manik Prihatini, I Ketut Gede Darma Putra, Ida Ayu Dwi Giriantari, Made Sudarma
