MR-LDA

2016 ◽  
Vol 8 (4) ◽  
pp. 100-113 ◽  
Author(s):  
Xiongwen Pang ◽  
Benshuai Wan ◽  
Huifang Li ◽  
Weiwei Lin

Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying it directly to Chinese micro-blog texts does not work well because micro-blogs are more social, brief, and closely related to one another. Based on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into consideration to help topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the problem of short texts. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into consideration. As a result, MR-LDA is better suited to modelling Chinese micro-blog data. Gibbs sampling is used for model inference. Experimental results on real datasets show that the MR-LDA model offers an effective solution to text mining for Chinese micro-blogs.
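To make the two ingredients named in this abstract concrete, here is a minimal Python sketch of (a) pooling short posts into longer pseudo-documents and (b) collapsed Gibbs sampling for plain LDA. It is not the MR-LDA model: the relation-aware generation step is not reproduced, and the abstract does not say how posts are grouped. The grouping key, the function names `pool_posts` and `gibbs_lda`, and the hyperparameter values are all assumptions made for illustration.

```python
import random

def pool_posts(posts):
    """Concatenate token lists that share a group key (hypothetical pooling rule)."""
    pooled = {}
    for key, tokens in posts:
        pooled.setdefault(key, []).extend(tokens)
    return pooled

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not MR-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic assignment per token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z

# Toy data: (group key, word ids). The keys stand in for whatever relation is used to pool posts.
posts = [("u1", [0, 1]), ("u2", [3, 4]), ("u1", [1, 2]), ("u2", [4, 5])]
pooled = pool_posts(posts)
docs = list(pooled.values())
n_dk, n_kw, z = gibbs_lda(docs, num_topics=2, vocab_size=6, iters=50)
```

Pooling only lengthens the documents the sampler sees; the sampler itself is unchanged, which is why this sketch cannot stand in for the paper's second extension.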

2017 ◽  
Vol 23 (2) ◽  
pp. 429-458
Author(s):  
Victor Araújo

Abstract The formation of multiparty governments heightens the risk of information asymmetry between principals and agents, so that cabinet conflicts over policy are reflected in the behavior of parties in parliament. Several studies show that mutual control among the parties in the cabinet is a way to compensate for the loss of information inherent in delegation. While the literature tends to focus on the policy formulation stage, by analyzing the governments formed in Brazil between 1995 and 2014 I argue that there is a more diverse set of strategies that allow parties to scrutinize the policies implemented by their cabinet partners. Using network analysis and quantitative text analysis techniques (the Gibbs sampling method, a Bayesian algorithm derived from latent Dirichlet allocation, LDA), I show that, in situations where ministerial portfolios are distributed to actors with distinct policy preferences, parties intensify their use of Requests for Information (Requerimentos de Informação, RIC) to monitor the ministries and policies that interest them. The structure of intra-cabinet control networks varies with the salience of the ministries: the parties responsible for the portfolios with the largest budget allocations are the actors with the highest degree of centrality in the mutual monitoring networks.


Author(s):  
Yuzhao Mao ◽  
Chang Zhou ◽  
Xiaojie Wang ◽  
Ruifan Li

Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words: a single sentence can hardly give a complete view of an image, even when written by a human. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which can generate multiple topic-oriented sentences to describe an image. Unlike object instances or attributes, topics mined by latent Dirichlet allocation reflect hidden thematic structures in the reference sentences of an image. In our model, each topic is integrated into a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of TOMS in terms of topical consistency and descriptive completeness.


2021 ◽  
Vol 13 (19) ◽  
pp. 10856
Author(s):  
I-Cheng Chang ◽  
Tai-Kuei Yu ◽  
Yu-Jie Chang ◽  
Tai-Yi Yu

Facing the big data wave, this study applied artificial intelligence to cite knowledge and find a feasible process to play a crucial role in supplying innovative value in environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education journals in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of abstracts of environmental education papers. The corpus was drawn from the abstracts and keywords of journal papers, which were analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis. The classification of feature words was decided and reviewed by domain experts, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering and K-means analysis. Hierarchical clustering and LDA set the number of categories at seven, and the K-means cluster analysis classified the documents into seven categories. This study used co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF weights for distinct K-means groups, and examined the terms for different topics with the LDA technique. A comparison of the results showed that most categories recognized by the K-means and LDA methods were the same and shared similar words; however, two categories differed slightly. The involvement of field experts helped ensure the consistency and correctness of the classified topics and documents.
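The TF-IDF weighting mentioned in this abstract can be illustrated with a short Python sketch. The abstract does not state which TF-IDF variant was used, so the formula below (relative term frequency times log(N/df)) is an assumption, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy feature-word lists standing in for expert-reviewed abstracts.
docs = [
    ["environment", "education", "climate"],
    ["environment", "education", "curriculum"],
    ["environment", "policy", "climate"],
]
weights = tf_idf(docs)
```

Under this variant a term that occurs in every document ("environment" here) gets weight zero, while a term unique to one document ("curriculum") gets the highest weight in that document, which is the property that makes such weights useful as clustering features.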


Teknologi ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 16-25
Author(s):  
Alfrida Rahmawati ◽  
Najla Lailin Nikmah ◽  
Reynaldi Drajat Ageng Perwira ◽  
Nur Aini Rakhmawati ◽  
...  

The development of digital technology has brought new media, one of which is YouTube, now one of the most widely used applications among internet users worldwide. The growth of its audience, known as viewers, is also supported by contributions from content creators, also known as YouTubers, from Indonesia. As the number of viewers grows, their demand for trending content grows at a surprising speed as well, and one such topic is K-pop. In this study, the authors wanted to identify the dominant topics that K-pop YouTubers often upload, in order to support content creators. The research was conducted using the Latent Dirichlet Allocation method. The analysis was carried out after applying text mining to 2563 videos from 10 K-pop YouTuber accounts with more than 100,000 subscribers. The optimal number of topics was determined by looking at the perplexity and topic coherence values. The results are the top 5 topics that make up the content of the uploaded videos. These topics include reactions to dance covers, album unboxings and reviews, riddles based on K-pop dances, and joint vlogs discussing covers and reactions to the sound of K-pop songs.
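Perplexity, one of the two criteria this abstract uses to choose the number of topics, is the exponential of the negative average log-likelihood per token, so lower values indicate a better fit. The Python sketch below computes it from a document-topic matrix and a topic-word matrix. The function name `perplexity`, the argument names `theta` and `phi`, and the toy matrices are assumptions for illustration; in practice the matrices would come from models fitted with different topic counts and be scored on held-out text.

```python
import math

def perplexity(docs, theta, phi):
    """Perplexity of word-id documents under a topic model.

    theta[d][k]: probability of topic k in document d
    phi[k][w]:   probability of word w in topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: mix the topic-word rows by the document's topic weights.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)

docs = [[0, 1, 0], [2, 3, 3]]

# A model that knows nothing: every word is equally likely.
uniform_theta = [[0.5, 0.5], [0.5, 0.5]]
uniform_phi = [[0.25] * 4, [0.25] * 4]

# A model whose two topics match the two documents' vocabularies.
sharp_theta = [[1.0, 0.0], [0.0, 1.0]]
sharp_phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

uniform_score = perplexity(docs, uniform_theta, uniform_phi)
sharp_score = perplexity(docs, sharp_theta, sharp_phi)
```

On this toy data the uniform model scores 4 (the vocabulary size) and the sharper model scores 2, up to floating-point rounding. Topic coherence, the second criterion, is a separate measure and is not covered by this sketch.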


2020 ◽  
pp. 016555152095467
Author(s):  
Xian Cheng ◽  
Qiang Cao ◽  
Stephen Shaoyi Liao

The unprecedented outbreak of COVID-19 is one of the most serious global threats to public health in this century. During this crisis, specialists in information science could play key roles in supporting the efforts of scientists in the health and medical community to combat COVID-19. In this article, we demonstrate that information specialists can support the health and medical community by applying a text mining technique with a latent Dirichlet allocation procedure to produce an overview of a mass of coronavirus literature. This overview presents the generic research themes of the coronavirus diseases COVID-19, MERS and SARS, reveals the representative literature for each main research theme, and displays a network visualisation to explore the overlap, similarities and differences among these themes. The overview can help the health and medical communities to extract useful information and interrelationships from coronavirus-related studies.


2016 ◽  
Vol 16 (2) ◽  
pp. 148-159
Author(s):  
Jianyong Duan ◽  
Zheng Dong ◽  
Mei Zhang

Abstract A microblog is a browser-based platform for web users' information sharing and communication. With the rapid growth of the microblog population, a recommendation function becomes necessary. This paper proposes a recommendation method based on the Latent Dirichlet Allocation topic model, which incorporates user interests into the model to meet users' needs. We also conduct a comparative analysis of indirect and direct recommendation algorithms. The experimental results show that indirect recommendation is more effective for micro-blog recommendation.


Author(s):  
LIDONG ZHAI ◽  
ZHAOYUN DING ◽  
YAN JIA ◽  
BIN ZHOU

LDA (Latent Dirichlet Allocation), proposed by Blei, is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words; it does not, however, model the positions of words within each document. In this paper, a Word Position-Related LDA Model is proposed that takes into account the word positions in every document of the corpus, with each word characterized by a distribution over word positions. At the same time, the precision of topic-word interpretability is improved by integrating the word-position distribution with the appropriate word degree, taking into account the different word degrees at different word positions. Finally, a new method, the size-aware word intrusion method, is proposed to improve topic-word interpretability. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. A comparison across the experimental data also shows that the size-aware word intrusion method interprets the semantic information of topic words more comprehensively and effectively.
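The generative story of standard LDA described in the first sentence of this abstract can be sketched in a few lines of Python: each topic is a distribution over words, each document is a random mixture over topics, and each token is drawn by first picking a topic and then a word. This covers plain LDA only; the word-position extension proposed in the paper is not reproduced. The function names, the fixed document length, and the symmetric hyperparameter values are assumptions for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample word-id documents from the plain LDA generative process."""
    rng = random.Random(seed)
    # One distribution over words per topic.
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Random mixture over topics for this document.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            k = rng.choices(range(num_topics), weights=theta)[0]   # pick a topic
            w = rng.choices(range(vocab_size), weights=phi[k])[0]  # pick a word from that topic
            doc.append(w)
        corpus.append(doc)
    return phi, corpus

phi, corpus = generate_corpus(num_docs=3, doc_len=10, num_topics=2, vocab_size=8)
```

Note that the position of a token inside the loop has no effect on which word is drawn, which is exactly the limitation the abstract sets out to address.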

