scholarly journals A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data

Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3799
Author(s):  
Filippo Valle ◽  
Matteo Osella ◽  
Michele Caselle

Topic modeling is a widely used technique to extract relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging on this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer subtype organization is well reconstructed in the inferred latent topic structure. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low dimensional topic space is able to predict with high accuracy the cancer subtype of a test expression sample.

2020 ◽  
Author(s):  
Filippo Valle ◽  
Matteo Osella ◽  
Michele Caselle

AbstractTopic modelling is a widely used technique to extract relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging on this analogy, a new class of topic modelling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer subtype organization is well reconstructed in the inferred latent topic structure. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low dimensional topic space is able to predict with high accuracy the cancer subtype of a test expression sample.


Babel ◽  
2021 ◽  
Author(s):  
Changsoo Lee

Abstract The present study aims to demonstrate the relevance of topic modeling as a new research tool for analyzing research trends in the T&I field. Until now, most efforts to this end have relied on manual classification based on pre-established typologies. This method is time- and labor-consuming, prone to subjective biases, and limited in describing a vast amount of research output. As a key component of text mining, topic modeling offers an efficient way of summarizing topic structure and trends over time in a collection of documents while being able to describe the entire system without having to rely on sampling. As a case study, the present paper applies the technique to analyzing a collection of abstracts from four Korean Language T&I journals for the 2010s decade (from 2010 to 2019). The analysis proves the technique to be highly successful in uncovering hidden topical structure and trends in the abstract corpus. The results are discussed along with implications of the technique for the T&I field.


2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Abstract Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method and the evaluation of this model is an interesting problem on its own. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word sets correlation scores. In this paper, we show experimental evidence of the improvement of topic coherence score by restricting the training corpus to that of relevant information in the document obtained by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability in about 40 percentage points on average. Our analysis reveals as well that using the extracted text chunks, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.


2019 ◽  
Vol 27 (3) ◽  
pp. 533-544 ◽  
Author(s):  
Xueyuan Wang ◽  
Enhe Bai ◽  
Hui Zhou ◽  
Sijia Sha ◽  
Hang Miao ◽  
...  

2006 ◽  
Vol 16 (09) ◽  
pp. 2729-2736 ◽  
Author(s):  
XIAO-SONG YANG ◽  
YAN HUANG

This paper presents a new class of chaotic and hyperchaotic low dimensional cellular neural networks modeled by ordinary differential equations with some simple connection matrices. The chaoticity of these neural networks is indicated by positive Lyapunov exponents calculated by a computer.


Sign in / Sign up

Export Citation Format

Share Document