MR-LDA

Xiongwen Pang; Benshuai Wan; Huifang Li; Weiwei Lin

doi:10.4018/ijghpc.2016100106

MR-LDA

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2016100106 ◽

2016 ◽

Vol 8 (4) ◽

pp. 100-113 ◽

Cited By ~ 3

Author(s):

Xiongwen Pang ◽

Benshuai Wan ◽

Huifang Li ◽

Weiwei Lin

Keyword(s):

Text Mining ◽

Gibbs Sampling ◽

Efficient Method ◽

Sampling Method ◽

Latent Dirichlet Allocation ◽

Experimental Results ◽

Generation Process ◽

Effective Solution ◽

Topic Mining ◽

Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying it directly to Chinese micro-blog texts does not work well because micro-blogs are more social, brief, and closely related to one another. Building on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to aid topic mining. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into consideration. MR-LDA is therefore better suited to modeling Chinese micro-blog data. Gibbs sampling is used for inference. Experimental results on real datasets show that MR-LDA offers an effective solution for text mining on Chinese micro-blogs.
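The two ingredients named in this abstract, pooling short posts into longer documents and inferring topics by Gibbs sampling, can be illustrated with a minimal sketch. It implements a collapsed Gibbs sampler for plain LDA, not MR-LDA: the abstract does not specify how relations between micro-blogs enter the model or how posts are grouped, so the pooling key, the function names `pool_posts` and `gibbs_lda`, and the hyperparameter defaults are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def pool_posts(posts, key):
    """Concatenate short posts that share a key into one longer document.

    The grouping key (for example the author) is a hypothetical choice;
    the abstract does not say how micro-blogs are aggregated.
    """
    pooled = defaultdict(list)
    for post in posts:
        pooled[key(post)].extend(post["tokens"])
    return list(pooled.values())


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for standard LDA (not MR-LDA).

    Returns document-topic counts, topic-word counts, and the topic
    assigned to every token.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]                # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic
    z = []                                                 # topic of each token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional, up to a constant:
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z
```

Pooling happens before sampling, so the sampler sees a few longer documents rather than many very short ones, which is the short-text remedy the abstract describes.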

GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU

Communications in Computer and Information Science - Advanced Computer Architecture ◽

10.1007/978-981-10-2209-8_9 ◽

2016 ◽

pp. 97-107 ◽

Cited By ~ 2

Author(s):

Pei Xue ◽

Tao Li ◽

Kezhao Zhao ◽

Qiankun Dong ◽

Wenjing Ma

Keyword(s):

Gibbs Sampling ◽

Latent Dirichlet Allocation ◽

Dirichlet Allocation

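Mecanismos de alinhamento de preferências em governos multipartidários: controle de políticas públicas no presidencialismo brasileiro [Mechanisms of preference alignment in multiparty governments: control of public policies in Brazilian presidentialism]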

Opinião Pública ◽

10.1590/1807-01912017232429 ◽

2017 ◽

Vol 23 (2) ◽

pp. 429-458

Author(s):

Victor Araújo

Keyword(s):

Gibbs Sampling ◽

Latent Dirichlet Allocation ◽

Dirichlet Allocation

Abstract: The formation of multiparty governments heightens the risk of information asymmetry between principals and agents, so that cabinet conflicts over policy are reflected in the behavior of parties in parliament. Several studies show that mutual monitoring among the parties in the cabinet is a way of compensating for the loss of information inherent in delegation. While the literature tends to focus on the policy formulation stage, by analyzing the governments formed in Brazil between 1995 and 2014 I argue that there is a more diverse set of strategies that allow parties to scrutinize the policies implemented by their cabinet partners. Using network analysis and quantitative text analysis techniques (the Gibbs sampling method, a Bayesian algorithm derived from Latent Dirichlet Allocation, LDA), I show that, when ministerial portfolios are distributed to actors with distinct policy preferences, parties intensify their use of Requests for Information (Requerimentos de Informação, RIC) to monitor the ministries and policies that interest them. The structure of intra-cabinet control networks varies with the salience of the ministries: the parties responsible for the portfolios with the largest budget allocations are the actors with the highest degree of centrality in the mutual monitoring networks.

Latent Dirichlet Allocation based on Gibbs Sampling for gene function prediction

2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology ◽

10.1109/cibcb.2014.6845514 ◽

2014 ◽

Cited By ~ 17

Author(s):

Pietro Pinoli ◽

Davide Chicco ◽

Marco Masseroli

Keyword(s):

Gibbs Sampling ◽

Gene Function ◽

Latent Dirichlet Allocation ◽

Function Prediction ◽

Gene Function Prediction ◽

Dirichlet Allocation

Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/592 ◽

2018 ◽

Cited By ~ 8

Author(s):

Yuzhao Mao ◽

Chang Zhou ◽

Xiaojie Wang ◽

Ruifan Li

Keyword(s):

Latent Dirichlet Allocation ◽

Experimental Results ◽

Image Captioning ◽

Single Sentence ◽

Gate Unit ◽

Dirichlet Allocation

Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words, and a single sentence can hardly give a complete view of an image, even when written by a human. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which can generate multiple topic-oriented sentences to describe an image. Unlike object instances or attributes, topics mined by latent Dirichlet allocation reflect hidden thematic structures in the reference sentences of an image. In our model, each topic is integrated into a caption generator through a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of TOMS in terms of topical consistency and descriptive completeness.

Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals

Sustainability ◽

10.3390/su131910856 ◽

2021 ◽

Vol 13 (19) ◽

pp. 10856

Author(s):

I-Cheng Chang ◽

Tai-Kuei Yu ◽

Yu-Jie Chang ◽

Tai-Yi Yu

Keyword(s):

Artificial Intelligence ◽

Cluster Analysis ◽

Text Mining ◽

Environmental Education ◽

Hierarchical Clustering ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Word Analysis ◽

Dirichlet Allocation

Facing the big data wave, this study applied artificial intelligence to cite knowledge and find a feasible process that can play a crucial role in supplying innovative value in environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education journals in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of abstracts of environmental education papers. The corpus was drawn from the abstracts and keywords of journal papers, which were analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis. The classification of feature words was determined and reviewed by domain experts, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering and K-means analysis. Hierarchical clustering and LDA set the number of required categories at seven, and the K-means cluster analysis classified all documents into seven categories. The study used co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF weights for distinct K-means groups, and examined the terms for different topics with the LDA technique. A comparison of the results showed that most categories recognized by the K-means and LDA methods were the same and shared similar words; however, two categories differed slightly. The involvement of field experts helped ensure the consistency and correctness of the classified topics and documents.
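The TF-IDF weighting that feeds the cluster analysis in this abstract can be sketched as follows. The abstract does not state which TF-IDF variant the study used, so this assumes one common form (term frequency normalized by document length, times the natural log of the inverse document frequency); the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant a term that occurs in every document gets weight zero, which is why high-weight terms are useful for characterizing individual clusters.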

Analisis topik konten channel YouTube K-pop Indonesia menggunakan Latent Dirichlet Allocation [Topic analysis of Indonesian K-pop YouTube channel content using Latent Dirichlet Allocation]

Teknologi ◽

10.26594/teknologi.v11i1.2155 ◽

2021 ◽

Vol 11 (1) ◽

pp. 16-25

Author(s):

Alfrida Rahmawati ◽

Najla Lailin Nikmah ◽

Reynaldi Drajat Ageng Perwira ◽

Nur Aini Rakhmawati ◽

...

Keyword(s):

Text Mining ◽

New Media ◽

Digital Technology ◽

Latent Dirichlet Allocation ◽

Optimal Number ◽

Allocation Method ◽

Internet Users ◽

The World ◽

Dirichlet Allocation

The development of digital technology has brought new media, one of which is YouTube, now one of the most widely used applications among internet users worldwide. The growth of its audience, known as viewers, is also supported by the contributions of content creators, also known as YouTubers, from Indonesia. As the number of viewers grows, their demand for trending content grows at a surprising speed, and one such topic is K-pop. In this study, the authors set out to identify the dominant topics that K-pop YouTubers often upload, in order to support content creators. The research was conducted using the Latent Dirichlet Allocation method. The analysis was carried out after applying text mining to 2563 videos from 10 K-pop YouTuber accounts with more than 100,000 subscribers. The optimal number of topics was determined from the perplexity and topic coherence values. The results are the top 5 topics that make up the content of the uploaded videos. These topics include reactions to dance covers, album unboxings and reviews, riddles based on K-pop dances, and vlogs discussing covers and reactions to the sound of K-pop songs.
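Perplexity, one of the two criteria mentioned here for choosing the number of topics, can be computed from a fitted model as the exponential of the negative average log-likelihood per token. The sketch below assumes the model is available as document-topic probabilities `theta` and topic-word probabilities `phi` (both names are illustrative, as is the function name), and that every token has nonzero probability under the model; it does not reproduce the study's actual tooling.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenized docs under a topic model.

    theta[d][k]: probability of topic k in document d.
    phi[k][w]:   probability of word w under topic k (dict per topic).
    Assumes every token has nonzero model probability.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Mixture probability of the word: sum over topics.
            p = sum(theta[d][k] * phi[k].get(w, 0.0) for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

Lower values indicate a better fit; in practice the score is usually computed on held-out documents for several candidate topic counts and weighed against topic coherence.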

A more time-efficient Gibbs sampling algorithm based on SparseLDA for latent Dirichlet allocation

Intelligent Data Analysis ◽

10.3233/ida-173609 ◽

2018 ◽

Vol 22 (6) ◽

pp. 1227-1257

Author(s):

Xiaotang Zhou ◽

Jihong Ouyang ◽

Ximing Li

Keyword(s):

Gibbs Sampling ◽

Latent Dirichlet Allocation ◽

Sampling Algorithm ◽

Gibbs Sampling Algorithm ◽

Dirichlet Allocation

An overview of literature on COVID-19, MERS and SARS: Using text mining and latent Dirichlet allocation

Journal of Information Science ◽

10.1177/0165551520954674 ◽

2020 ◽

pp. 016555152095467

Author(s):

Xian Cheng ◽

Qiang Cao ◽

Stephen Shaoyi Liao

Keyword(s):

Text Mining ◽

Latent Dirichlet Allocation ◽

Information Science ◽

Medical Community ◽

Main Research ◽

Mining Technique ◽

Research Theme ◽

Research Themes ◽

Similarity And Difference ◽

Dirichlet Allocation

The unprecedented outbreak of COVID-19 is one of the most serious global threats to public health in this century. During this crisis, specialists in information science can play key roles in supporting the efforts of scientists in the health and medical community to combat COVID-19. In this article, we demonstrate that information specialists can support the health and medical community by applying a text mining technique with a latent Dirichlet allocation procedure to produce an overview of a large body of coronavirus literature. This overview presents the generic research themes of the coronavirus diseases COVID-19, MERS and SARS, reveals the representative literature for each main research theme, and displays a network visualisation to explore the overlaps, similarities and differences among these themes. The overview can help the health and medical communities to extract useful information and interrelationships from coronavirus-related studies.

An Indirected Recommendation Model for Chinese Microblog

Cybernetics and Information Technologies ◽

10.1515/cait-2016-0027 ◽

2016 ◽

Vol 16 (2) ◽

pp. 148-159

Author(s):

Jianyong Duan ◽

Zheng Dong ◽

Mei Zhang

Keyword(s):

Comparative Analysis ◽

Information Sharing ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Experimental Results ◽

User Interests ◽

Recommendation Algorithms ◽

Dirichlet Allocation

Microblog is a browser-based platform for web users' information sharing and communication. With the rapid growth of the microblog population, a recommendation function has become necessary. This paper proposes a recommendation approach based on the Latent Dirichlet Allocation topic model, which incorporates user interests into the model to meet users' needs. We also conduct a comparative analysis of indirect and direct recommendation algorithms. The experimental results show that indirect recommendation is more effective for microblog recommendation.

A WORD POSITION-RELATED LDA MODEL

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001411008890 ◽

2011 ◽

Vol 25 (06) ◽

pp. 909-925 ◽

Cited By ~ 3

Author(s):

LIDONG ZHAI ◽

ZHAOYUN DING ◽

YAN JIA ◽

BIN ZHOU

Keyword(s):

Experimental Data ◽

Probabilistic Model ◽

Latent Dirichlet Allocation ◽

Semantic Information ◽

Experimental Results ◽

New Method ◽

Word Position ◽

Average Improvement ◽

Latent Topics ◽

Dirichlet Allocation

LDA (Latent Dirichlet Allocation), proposed by Blei, is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words; it does not, however, model the positions of words within each document. This paper proposes a Word Position-Related LDA Model that takes into account the word positions in every document of the corpus, with each word characterized by a distribution over word positions. The precision of topic-word interpretability is improved by integrating the word-position distribution with the appropriate word degree, taking into account that the word degree differs across word positions. Finally, a new size-aware word intrusion method is proposed to improve the interpretability of topic words. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. A comparison of the different experimental data also shows that the size-aware word intrusion method interprets the semantic information of topic words more comprehensively and more effectively.
