Multi-Label Classification of Text Documents using Probabilistic Topic Modeling

Sergey Nikolaevich Karpovich

doi:10.15622/sp.47.5

Probabilistic topic modeling for the analysis and classification of genomic sequences

BMC Bioinformatics ◽

10.1186/1471-2105-16-s6-s2 ◽

2015 ◽

Vol 16 (S6) ◽

Cited By ~ 23

Author(s):

Massimo La Rosa ◽

Antonino Fiannaca ◽

Riccardo Rizzo ◽

Alfonso Urso

Keyword(s):

Topic Modeling ◽

Genomic Sequences ◽

Probabilistic Topic Modeling

Medical documents classification using topic modeling

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v17.i3.pp1524-1530 ◽

2020 ◽

Vol 17 (3) ◽

pp. 1524-1530

Author(s):

Maryam Nuser ◽

Enas Al-Horani

Keyword(s):

Blood Pressure ◽

Classification Accuracy ◽

Topic Modeling ◽

Heart Diseases ◽

Modeling Technique ◽

Specific Topic ◽

Text Documents ◽

Medical Documents ◽

Latent Topics

The number of digital medical documents is increasing continuously, and several medical websites share many unclassified articles. These articles are very long and must be read in full to determine the topic of each document. Classifying these documents is important because it lets researchers use them easily and reduces the time and effort spent reading and searching for a specific topic. An automatic way to extract latent topics from these text documents is therefore needed, and topic modeling is one of the techniques used to address this problem. In this paper, a collection of medical documents is used; it contains documents on three widespread conditions (Heart Diseases, Blood Pressure, and Cholesterol). The LDA topic modeling technique is applied to classify these documents into the previously mentioned topics. An evaluation of the algorithm's results shows that LDA achieves a good level of classification accuracy.
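As a rough illustration of the idea in the abstract above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, with each document assigned to its dominant topic. It is not the authors' implementation: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy tokenized documents are all assumptions made for this example, and the abstract does not say how topics were mapped to disease labels.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of token lists. Returns one topic-proportion
    vector per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta

# Hypothetical toy corpus, already tokenized.
docs = [
    ["heart", "artery", "cardiac", "heart", "attack"],
    ["pressure", "hypertension", "systolic", "pressure"],
    ["cholesterol", "ldl", "hdl", "cholesterol", "statin"],
    ["cardiac", "heart", "artery", "attack"],
]
theta = fit_lda(docs, num_topics=3)
# Classify each document by its highest-probability topic.
labels = [max(range(3), key=lambda t: row[t]) for row in theta]
```

The topic indices are arbitrary, so in practice each topic would still need to be matched to a category name, for example by inspecting its most frequent words.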

Improving Topic Coherence Using Entity Extraction Denoising

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2018-0004 ◽

2018 ◽

Vol 110 (1) ◽

pp. 85-101 ◽

Cited By ~ 1

Author(s):

Ronald Cardenas ◽

Kevin Bello ◽

Alberto Coronado ◽

Elizabeth Villota

Keyword(s):

Topic Modeling ◽

Human Perception ◽

Relevant Information ◽

Entity Recognition ◽

Entity Extraction ◽

Fine Grained ◽

Job Advertisement ◽

Coherence Score ◽

Probabilistic Topic Modeling ◽

Promising Solution

Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such models is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word-set correlation scores. In this paper, we show experimental evidence that the topic coherence score improves when the training corpus is restricted to the relevant information in each document, as extracted by entity recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are merged while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
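To make the notion of a coherence score concrete, here is a minimal sketch of one common co-occurrence-based measure, UMass coherence, which sums log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs of a topic, where D counts the documents containing the given words. The abstract does not name the measure the authors used, so the choice of UMass, the function name `umass_coherence`, and the toy job-advertisement corpus are assumptions made for illustration only.

```python
import math

def umass_coherence(topic_words, docs):
    """UMass coherence of a topic's top words over a corpus.

    topic_words is ordered from most to least probable; docs is a list
    of token collections. Every topic word is assumed to occur in at
    least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            w_m, w_l = topic_words[m], topic_words[l]
            score += math.log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score

# Hypothetical toy corpus of tokenized job advertisements.
ads = [
    ["python", "java", "sql"],
    ["python", "sql", "developer"],
    ["nurse", "patient", "hospital"],
    ["nurse", "hospital", "care"],
    ["python", "java", "developer"],
]
coherent = umass_coherence(["python", "java", "sql"], ads)
mixed = umass_coherence(["python", "nurse", "sql"], ads)
```

In this toy corpus the words of the first topic co-occur and the words of the second mostly do not, so the first topic receives the higher score.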

Classification of Medical Dataset Along with Topic Modeling Using LDA

Nanoelectronics, Circuits and Communication Systems - Lecture Notes in Electrical Engineering ◽

10.1007/978-981-13-0776-8_1 ◽

2018 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

M. Selvi ◽

K. Thangaramya ◽

M. S. Saranya ◽

K. Kulothungan ◽

S. Ganapathy ◽

...

Keyword(s):

Topic Modeling ◽

Medical Dataset

A Study on the Characteristics of Academic Topics Related to Renewable Energy Using the Structural Topic Modeling and the Weak Signal Concept

Energies ◽

10.3390/en14051497 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1497

Author(s):

Chankook Park ◽

Minkyu Kim

Keyword(s):

Renewable Energy ◽

Energy Storage ◽

Topic Modeling ◽

Academic Research ◽

High Rate ◽

Weak Signals ◽

Energy Potential ◽

Rate Of Increase ◽

Probabilistic Topic Modeling ◽

To Receive

For scientists to contribute to the development of renewable energy, it is important to examine in detail how the distribution of academic research topics related to renewable energy is structured and which topics are likely to receive new attention in the future. This study uses advanced probabilistic topic modeling to statistically examine temporal changes in renewable energy topics, based on academic abstracts from 2010–2019, and explores the properties of the topics from the perspective of future signs such as weak signals. Among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage showed a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.
