Multi-Label Classification of Text Documents using Probabilistic Topic Modeling

2016 ◽  
Vol 4 (47) ◽  
pp. 92
Author(s):  
Sergey Nikolaevich Karpovich
2015 ◽  
Vol 16 (S6) ◽  
Author(s):  
Massimo La Rosa ◽  
Antonino Fiannaca ◽  
Riccardo Rizzo ◽  
Alfonso Urso

Author(s):  
Maryam Nuser ◽  
Enas Al-Horani

The number of digital medical documents is increasing continuously, and several medical websites share many unclassified articles. These articles are very long and must be read in full to determine the topic of each one. Classifying them is important so that researchers can use them easily, with less time and effort spent reading and searching for a specific topic. An automatic way to extract latent topics from these text documents is therefore needed, and topic modeling is one technique for this problem. In this paper, a medical collection is used that contains documents on three widespread conditions (Heart Diseases, Blood Pressure and Cholesterol). The LDA topic modeling technique is applied to classify these documents into the previously mentioned topics. The algorithm’s results are evaluated, and LDA shows a good level of classification accuracy.
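As a rough illustration of how LDA can be used to classify documents as the abstract above describes, the following minimal sketch fits LDA by collapsed Gibbs sampling and assigns each document to its highest-probability topic. It is not the authors' implementation: the function names `lda_gibbs` and `dominant_topic`, the hyperparameters, and the toy tokens are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # z[d][i] is the topic currently assigned to the i-th token of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def dominant_topic(theta):
    """Label each document with the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Toy tokenized documents (invented for illustration, not the paper's data).
docs = [
    ["heart", "attack", "artery", "heart"],
    ["artery", "heart", "cardiac"],
    ["pressure", "hypertension", "systolic"],
    ["systolic", "pressure", "pressure"],
    ["cholesterol", "ldl", "hdl"],
    ["ldl", "cholesterol", "statin"],
]
theta = lda_gibbs(docs, num_topics=3)
labels = dominant_topic(theta)
```

LDA topics are unlabeled, so mapping a topic index to a category name such as "Cholesterol" still requires inspecting each topic's most probable words or comparing against a labeled sample.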


2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Managing large collections of documents is an important problem in many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such models is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word-set correlation scores. In this paper, we show experimental evidence that topic coherence scores improve when the training corpus is restricted to the relevant information in each document, obtained by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
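The abstract above does not say which coherence measure was used. As one common example of a word-set co-occurrence score, the sketch below computes UMass-style coherence from document frequencies. The function name `umass_coherence` and the toy job-ad tokens are assumptions made for illustration, not material from the paper.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for a topic's top words (most probable first).

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m, where D
    counts the documents that contain every given word.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # skip words that never occur in the reference corpus
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / denom)
    return score


# Toy tokenized job advertisements (invented for illustration).
corpus = [
    ["python", "java", "sql"],
    ["python", "sql"],
    ["java", "excel"],
    ["sales", "excel"],
]
coherent = umass_coherence(["python", "sql"], corpus)
mixed = umass_coherence(["python", "sales"], corpus)
```

Higher (less negative) values mean the topic's top words tend to appear in the same documents. In this toy corpus the pair that always co-occurs scores above the pair that never does.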


Author(s):  
M. Selvi ◽  
K. Thangaramya ◽  
M. S. Saranya ◽  
K. Kulothungan ◽  
S. Ganapathy ◽  
...  

Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1497
Author(s):  
Chankook Park ◽  
Minkyu Kim

To help scientists contribute to the development of renewable energy, it is important to examine in detail how the distribution of academic research topics related to renewable energy is structured and which topics are likely to receive new attention in the future. This study uses advanced probabilistic topic modeling to statistically examine the temporal changes of renewable energy topics in academic abstracts from 2010–2019 and explores the properties of the topics from the perspective of future signs such as weak signals. Among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage showed a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.

