A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data

Filippo Valle; Matteo Osella; Michele Caselle

doi:10.3390/cancers12123799

Cancers ◽

10.3390/cancers12123799 ◽

2020 ◽

Vol 12 (12) ◽

pp. 3799

Author(s):

Filippo Valle ◽

Matteo Osella ◽

Michele Caselle

Keyword(s):

Lung Cancer ◽

Topic Modeling ◽

Relevant Information ◽

Transcriptomic Data ◽

New Class ◽

Modeling Analysis ◽

Cancer Subtype ◽

Low Dimensional ◽

Simple Neural Network ◽

Topic Structure

Topic modeling is a widely used technique for extracting relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Building on this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The inferred latent topic structure reconstructs the established cancer subtype organization well. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low dimensional topic space predicts the cancer subtype of a test expression sample with high accuracy.
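The final step described in the abstract, classifying samples in the low dimensional topic space, can be illustrated with a minimal sketch. It is not the authors' implementation: the topic mixtures below are synthetic stand-ins for the output of a topic model (no TCGA data is used), the three class labels are hypothetical, and a single-layer softmax classifier is assumed as the simplest form of neural network. All function names are illustrative.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear layer: one score per class for a topic-mixture vector x."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Fit a single-layer softmax classifier by stochastic gradient descent."""
    n_topics = len(X[0])
    W = [[0.0] * n_topics for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                b[c] -= lr * grad
                for j in range(n_topics):
                    W[c][j] -= lr * grad * x[j]
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def synthetic_topic_mixtures(n_per_class, n_classes, rng):
    """ASSUMPTION: toy data in which class c is dominated by topic c."""
    pairs = []
    for c in range(n_classes):
        for _ in range(n_per_class):
            raw = [rng.random() for _ in range(n_classes)]
            raw[c] += 2.0  # boost the class-specific topic
            total = sum(raw)
            pairs.append(([v / total for v in raw], c))  # proportions sum to 1
    rng.shuffle(pairs)  # avoid presenting the classes in blocks during training
    X = [x for x, _ in pairs]
    y = [label for _, label in pairs]
    return X, y


rng = random.Random(0)
X_train, y_train = synthetic_topic_mixtures(30, 3, rng)
X_test, y_test = synthetic_topic_mixtures(10, 3, rng)

W, b = train_softmax(X_train, y_train, n_classes=3)
hits = sum(predict(W, b, x) == t for x, t in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because each sample is reduced to a handful of topic proportions, even a classifier this small has few parameters to fit. The accuracy printed here describes only the toy data and says nothing about the results reported in the paper.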

A topic modelling analysis of TCGA breast and lung cancer transcriptomic data

10.1101/2020.10.19.345694 ◽

2020 ◽

Author(s):

Filippo Valle ◽

Matteo Osella ◽

Michele Caselle

Keyword(s):

Lung Cancer ◽

Relevant Information ◽

Topic Modelling ◽

Transcriptomic Data ◽

New Class ◽

Cancer Subtype ◽

Modelling Strategies ◽

Low Dimensional ◽

Simple Neural Network ◽

Topic Structure

Abstract Topic modelling is a widely used technique for extracting relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Building on this analogy, a new class of topic modelling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The inferred latent topic structure reconstructs the established cancer subtype organization well. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low dimensional topic space predicts the cancer subtype of a test expression sample with high accuracy.

A topic modeling analysis of Korea’s T&I research trends in the 2010s

Babel ◽

10.1075/babel.00228.lee ◽

2021 ◽

Author(s):

Changsoo Lee

Keyword(s):

Topic Modeling ◽

Research Output ◽

Research Trends ◽

Korean Language ◽

Modeling Analysis ◽

Trends Over Time ◽

New Research ◽

Topic Structure ◽

Manual Classification

Abstract The present study aims to demonstrate the relevance of topic modeling as a new research tool for analyzing research trends in the T&I field. Until now, most efforts to this end have relied on manual classification based on pre-established typologies. This method is time- and labor-consuming, prone to subjective biases, and limited in its ability to describe a vast amount of research output. As a key component of text mining, topic modeling offers an efficient way of summarizing topic structure and trends over time in a collection of documents, and it can describe the entire collection without relying on sampling. As a case study, the present paper applies the technique to a collection of abstracts from four Korean Language T&I journals for the 2010s (2010 to 2019). The analysis shows the technique to be highly successful in uncovering hidden topical structure and trends in the abstract corpus. The results are discussed along with implications of the technique for the T&I field.

A Topic Modeling Analysis on the Major Social Issues of the Students’ Human Rights Ordinance in Korea

Asian Journal of Education ◽

10.15753/aje.2017.12.18.4.683 ◽

2017 ◽

Vol 18 (4) ◽

pp. 683-711

Author(s):

Hyun-Jeong Park ◽

Hanna Kim ◽

YuJung Hong

Keyword(s):

Human Rights ◽

Topic Modeling ◽

Social Issues ◽

Modeling Analysis

A Topic Modeling Analysis of the News Topic on the 4th Industrial Revolution in Korea: Focusing on the Difference by Media Type and Each Major Period

Journal of Cybercommunication Academic Society ◽

10.36494/jcas.2019.06.36.2.173 ◽

2019 ◽

Vol 36 (2) ◽

pp. 173-219 ◽

Cited By ~ 1

Author(s):

Jin-Ho Choi ◽

Hae-Soo Lee ◽

Eun-Hyeong Jin

Keyword(s):

Topic Modeling ◽

Industrial Revolution ◽

Media Type ◽

Modeling Analysis ◽

Major Period ◽

The Difference

Improving Topic Coherence Using Entity Extraction Denoising

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2018-0004 ◽

2018 ◽

Vol 110 (1) ◽

pp. 85-101 ◽

Cited By ~ 1

Author(s):

Ronald Cardenas ◽

Kevin Bello ◽

Alberto Coronado ◽

Elizabeth Villota

Keyword(s):

Topic Modeling ◽

Human Perception ◽

Relevant Information ◽

Entity Recognition ◽

Entity Extraction ◽

Fine Grained ◽

Job Advertisement ◽

Coherence Score ◽

Probabilistic Topic Modeling ◽

Promising Solution

Abstract Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such a model is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence through correlation scores over word sets. In this paper, we show experimental evidence that the topic coherence score improves when the training corpus is restricted to the relevant information extracted from each document by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are merged while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
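To make the idea of a coherence score over word sets concrete, here is a minimal sketch. The abstract does not say which measure the authors use, so normalized pointwise mutual information (NPMI) over document co-occurrence is assumed here as one commonly used choice. The function name and the tiny job-advertisement corpus are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated as the fraction of documents that contain
    a word (or both words of a pair). Scores lie in [-1, 1].
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def doc_prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = doc_prob(w1), doc_prob(w2), doc_prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # the pair never co-occurs: lowest score
        elif p12 == 1.0:
            scores.append(1.0)   # the pair co-occurs in every document
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# ASSUMPTION: a toy corpus of tokenized job advertisements, for illustration only.
docs = [
    ["python", "sql", "data", "analyst"],
    ["python", "sql", "engineer"],
    ["python", "data", "sql"],
    ["nurse", "patient", "care"],
    ["nurse", "care", "hospital"],
    ["patient", "care", "nurse"],
]

coherent_topic = ["python", "sql", "data"]
mixed_topic = ["python", "nurse", "data"]

print("coherent:", round(npmi_coherence(coherent_topic, docs), 3))
print("mixed:   ", round(npmi_coherence(mixed_topic, docs), 3))
```

A topic whose top words tend to appear in the same documents scores higher than one that mixes unrelated vocabulary, and this is the property such measures use to approximate human judgments of coherence.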

State Legislators’ Divergent Social Media Response to the Opioid Epidemic from 2014 to 2019: Longitudinal Topic Modeling Analysis

Journal of General Internal Medicine ◽

10.1007/s11606-021-06678-9 ◽

2021 ◽

Author(s):

Daniel C. Stokes ◽

Jonathan Purtle ◽

Zachary F. Meisel ◽

Anish K. Agarwal

Keyword(s):

Social Media ◽

Topic Modeling ◽

State Legislators ◽

Opioid Epidemic ◽

Modeling Analysis

Discovery of a new class of valosine containing protein (VCP/P97) inhibitors for the treatment of non-small cell lung cancer

Bioorganic & Medicinal Chemistry ◽

10.1016/j.bmc.2018.12.036 ◽

2019 ◽

Vol 27 (3) ◽

pp. 533-544 ◽

Cited By ~ 2

Author(s):

Xueyuan Wang ◽

Enhe Bai ◽

Hui Zhou ◽

Sijia Sha ◽

Hang Miao ◽

...

Keyword(s):

Lung Cancer ◽

Small Cell Lung Cancer ◽

Cell Lung Cancer ◽

Small Cell ◽

Small Cell Lung ◽

New Class

CHAOS AND HYPERCHAOS IN A CLASS OF SIMPLE CELLULAR NEURAL NETWORKS MODELED BY O.D.E.

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127406016409 ◽

2006 ◽

Vol 16 (09) ◽

pp. 2729-2736 ◽

Cited By ~ 5

Author(s):

XIAO-SONG YANG ◽

YAN HUANG

Keyword(s):

Neural Networks ◽

Differential Equations ◽

Ordinary Differential Equations ◽

Lyapunov Exponents ◽

Cellular Neural Networks ◽

New Class ◽

Connection Matrices ◽

Low Dimensional

This paper presents a new class of chaotic and hyperchaotic low dimensional cellular neural networks modeled by ordinary differential equations with simple connection matrices. The chaoticity of these neural networks is indicated by positive Lyapunov exponents computed numerically.

Faculty Opinions recommendation of Whole-exome sequencing reveals germline-mutated small cell lung cancer subtype with favorable response to DNA repair-targeted therapies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.739435020.793584369 ◽

2021 ◽

Author(s):

Julien Sage

Keyword(s):

Lung Cancer ◽

Dna Repair ◽

Small Cell Lung Cancer ◽

Exome Sequencing ◽

Whole Exome Sequencing ◽

Targeted Therapies ◽

Small Cell ◽

Small Cell Lung ◽

Whole Exome ◽

Cancer Subtype

Topic Modeling Analysis of Questions of Participants in Child Care Consulting of Young Children with Disabilities Based on Video Meeting Platform

Korean Journal of Early Childhood Special Education ◽

10.21214/kecse.2021.21.2.103 ◽

2021 ◽

Vol 21 (2) ◽

pp. 103-135

Author(s):

Eunhee Ma

Keyword(s):

Child Care ◽

Young Children ◽

Topic Modeling ◽

Children With Disabilities ◽

Modeling Analysis ◽

Young Children With Disabilities ◽

Video Meeting
