Deep Subspace Mutual Learning For Cancer Subtypes Prediction

Bioinformatics ◽

10.1093/bioinformatics/btab625 ◽

2021 ◽

Author(s):

Bo Yang ◽

Ting-Ting Xin ◽

Shan-Min Pang ◽

Meng Wang ◽

Yi-Jie Wang

Keyword(s):

The Cancer Genome Atlas ◽

Supplementary Information ◽

Omics Data ◽

Computational Framework ◽

Disease Etiology ◽

Cancer Subtypes ◽

Mutual Learning ◽

Cancer Genome Atlas ◽

Precise Prediction ◽

Multi Level

Motivation: Precise prediction of cancer subtypes is of significant importance in cancer diagnosis and treatment. Disease etiology is complicated and exists at different omics levels; hence, integrative analysis provides a very effective way to improve our understanding of cancer. Results: We propose a novel computational framework, named Deep Subspace Mutual Learning (DSML). DSML can simultaneously learn the subspace structures in each available omics data type and in the overall multi-omics data by adopting deep neural networks, which facilitates subtype prediction via clustering on multi-level, single-level, and partial-level omics data. Extensive experiments are performed on five different cancers using three levels of omics data from The Cancer Genome Atlas. The experimental analysis demonstrates that DSML delivers comparable or even better results than many state-of-the-art integrative methods. Availability: An implementation and documentation of DSML are publicly available at https://github.com/polytechnicXTT/Deep-Subspace-Mutual-Learning.git. Supplementary information: Supplementary data are available at Bioinformatics online.


Novel cancer subtyping method based on patient-specific gene regulatory network

10.1101/2021.03.24.436731 ◽

2021 ◽

Author(s):

Mai Adachi Nakazawa ◽

Yoshinori Tamada ◽

Yoshihisa Tanaka ◽

Marie Ikeguchi ◽

Kako Higashihara ◽

...

Keyword(s):

Gene Networks ◽

Regulatory Networks ◽

The Cancer Genome Atlas ◽

Patient Specific ◽

Specific Gene ◽

Omics Data ◽

Cancer Subtypes ◽

Molecular Systems ◽

Molecular Features ◽

Cancer Genome Atlas

The identification of cancer subtypes is important for the understanding of tumor heterogeneity. In recent years, numerous computational methods have been proposed for this problem based on the multi-omics data of patients. It is widely accepted that different cancer subtypes are induced by different molecular regulatory networks. However, only a few of these methods incorporate the differences between patients' molecular systems into the classification process. In this study, we present a novel method to classify cancer subtypes based on patient-specific molecular systems. Our method quantifies patient-specific gene networks, which are estimated from the patients' transcriptome data. By clustering the quantified networks, our method allows for cancer subtyping that takes into consideration the differences in the molecular systems of patients. Comprehensive analyses applying our method to The Cancer Genome Atlas (TCGA) datasets confirmed that it was able to identify more clinically meaningful cancer subtypes than the existing subtypes and found that the identified subtypes comprised different molecular features. Our findings show that the proposed method, based on a simple classification using the patient-specific molecular systems, can identify cancer subtypes even with single omics data, which cannot otherwise be captured by existing methods using multi-omics data.


Using association signal annotations to boost similarity network fusion

Bioinformatics ◽

10.1093/bioinformatics/btz124 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3718-3726 ◽

Cited By ~ 5

Author(s):

Peifeng Ruan ◽

Ya Wang ◽

Ronglai Shen ◽

Shuang Wang

Keyword(s):

Similarity Measures ◽

R Package ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Omics Data ◽

Similarity Network ◽

Signal Features ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Similarity Networks

Motivation: Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown a great potential to identify new disease subtypes through separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data. Results: In this article, we proposed an association-signal-annotation boosted similarity network fusion (ab-SNF) method, adding feature-level association signal annotations as weights aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks to boost the performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in the subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful. Availability and implementation: The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF. Supplementary information: Supplementary data are available at Bioinformatics online.
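
As a rough illustration of the weighting idea in this abstract, the Python sketch below scales each feature by a weight before computing pairwise patient distances, so that up-weighted features dominate the resulting similarity network. The weight vector `w`, the function names, and the Gaussian kernel with a single `sigma` are assumptions made for this example; the abSNF R package may derive its weights and similarities differently.

```python
import numpy as np

def weighted_sq_distances(X, w):
    """Pairwise weighted squared Euclidean distances between the rows of X.

    X: (n_patients, n_features) data matrix.
    w: (n_features,) non-negative feature weights (hypothetical annotations).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    Xw = X * np.sqrt(w)  # scaling by sqrt(w) turns plain distances into weighted ones
    sq = np.sum(Xw ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xw @ Xw.T)
    return np.maximum(d2, 0.0)  # clip small negatives caused by round-off

def gaussian_similarity(d2, sigma=1.0):
    """Turn squared distances into similarities in (0, 1] (assumed kernel)."""
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy data: feature 0 plays the role of a signal feature, feature 1 of noise.
X = np.array([[0.0, 0.0], [0.0, 10.0], [1.0, 0.0]])
d2_weighted = weighted_sq_distances(X, w=[1.0, 0.0])    # noise feature ignored
d2_unweighted = weighted_sq_distances(X, w=[0.5, 0.5])  # both features count
similarity = gaussian_similarity(d2_weighted)
```

With the noise feature down-weighted to zero, patients 0 and 1 become identical in the weighted distance, whereas equal weights let the noisy feature push them far apart.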


An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.
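
The ternary digitization described in this abstract can be sketched in a few lines of Python. This is only an illustration of the general idea, assuming that the baseline range of each feature is taken as a percentile interval of the baseline samples; the function names and the percentile choice are hypothetical and do not reproduce the estimator used by the divergence package.

```python
import numpy as np

def baseline_interval(baseline, lower_q=2.5, upper_q=97.5):
    """Per-feature [lo, hi] range from a (n_baseline, n_features) matrix.

    The percentile cut-offs are an assumption made for this sketch.
    """
    baseline = np.asarray(baseline, dtype=float)
    lo = np.percentile(baseline, lower_q, axis=0)
    hi = np.percentile(baseline, upper_q, axis=0)
    return lo, hi

def ternary_code(samples, lo, hi):
    """Code each entry as -1 (below range), 0 (within range) or +1 (above range)."""
    samples = np.asarray(samples, dtype=float)
    code = np.zeros(samples.shape, dtype=int)
    code[samples < lo] = -1
    code[samples > hi] = 1
    return code

# Toy example with two features; the full min-max range is used as the baseline interval.
baseline = np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]])
lo, hi = baseline_interval(baseline, lower_q=0, upper_q=100)
samples = np.array([[-1.0, 11.0], [3.0, 13.0], [1.0, 9.0]])
codes = ternary_code(samples, lo, hi)
```

A binary code follows from the same result by taking the absolute value, which records only whether an entry diverges from the baseline and not in which direction.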


An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.


Integrated Dissection of lncRNA-Perturbated Triplets Reveals Novel Prognostic Signatures Across Cancer Types

International Journal of Molecular Sciences ◽

10.3390/ijms21176087 ◽

2020 ◽

Vol 21 (17) ◽

pp. 6087

Author(s):

Yunzhen Wei ◽

Limeng Zhou ◽

Yingzhang Huang ◽

Dianjing Guo

Keyword(s):

Cancer Biology ◽

Noncoding Rna ◽

The Cancer Genome Atlas ◽

Dynamic Changes ◽

Computational Framework ◽

Potential Biomarker ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tcga Dataset ◽

Pan Cancer

Long noncoding RNA (lncRNA)/microRNA (miRNA)/mRNA triplets contribute to cancer biology. However, identifying significant triplets remains a major challenge for cancer research, and the dynamic changes among the factors of the triplets are poorly understood. Here, by integrating target information and expression datasets, we proposed a novel computational framework to identify triplets termed “lncRNA-perturbated triplets”. We applied the framework to five cancer datasets in The Cancer Genome Atlas (TCGA) project and identified 109 triplets. We showed that the paired miRNAs and mRNAs were widely perturbated by lncRNAs in different cancer types. LncRNA perturbators and lncRNA-perturbated mRNAs showed significantly higher evolutionary conservation than other lncRNAs and mRNAs. Importantly, the lncRNA-perturbated triplets exhibited high cancer specificity. The pan-cancer perturbator OIP5-AS1 had a higher expression level than the cancer-specific perturbators. These lncRNA perturbators were significantly enriched in known cancer-related pathways. Furthermore, among the 25 lncRNAs in the 109 triplets, lncRNA SNHG7 was identified as a stable potential biomarker in lung adenocarcinoma (LUAD) by combining the TCGA dataset and two independent GEO datasets. Results from cell transfection also indicated that overexpression of lncRNAs SNHG7 and TUG1 enhanced the expression of the corresponding mRNAs PNMA2 and CDC7 in LUAD. Our study provides a systematic dissection of lncRNA-perturbated triplets and facilitates our understanding of the molecular roles of lncRNAs in cancers.


IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences

Bioinformatics ◽

10.1093/bioinformatics/btz247 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4469-4471 ◽

Cited By ~ 21

Author(s):

Kristoffer Vitting-Seerup ◽

Albin Sandelin

Keyword(s):

Alternative Splicing ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Rna Seq ◽

Genome Wide ◽

Functional Consequences ◽

Cancer Genome Atlas ◽

Health And Disease ◽

Splicing Patterns

Summary: Alternative splicing is an important mechanism involved in health and disease. Recent work highlights the importance of investigating genome-wide changes in splicing patterns and the subsequent functional consequences. Current computational methods only support such analysis on a gene-by-gene basis. Therefore, we extended the IsoformSwitchAnalyzeR R library to enable analysis of genome-wide changes in specific types of alternative splicing and predicted functional consequences of the resulting isoform switches. As a case study, we analyzed RNA-seq data from The Cancer Genome Atlas and found systematic changes in alternative splicing and the consequences of the associated isoform switches. Availability and implementation: Windows, Linux and Mac OS: http://bioconductor.org/packages/IsoformSwitchAnalyzeR. Supplementary information: Supplementary data are available at Bioinformatics online.


Cis-Compound Mutations are Prevalent in Triple Negative Breast Cancer and Can Drive Tumor Progression

10.1101/085316 ◽

2016 ◽

Author(s):

Nao Hiranuma ◽

Jie Liu ◽

Chaozhong Song ◽

Jacob Goldsmith ◽

Michael Dorschner ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Progression ◽

Triple Negative ◽

The Cancer Genome Atlas ◽

Breast Cancers ◽

Single Nucleotide Variants ◽

Primary Tumors ◽

Cancer Subtypes ◽

Multiple Biopsies ◽

Cancer Genome Atlas

About 16% of breast cancers fall into a clinically aggressive category designated triple negative (TNBC) due to a lack of ERBB2, estrogen receptor and progesterone receptor expression [1-3]. The mutational spectrum of TNBC has been characterized as part of The Cancer Genome Atlas (TCGA) [4]; however, snapshots of primary tumors cannot reveal the mechanisms by which TNBCs progress and spread. To address this limitation we initiated the Intensive Trial of OMics in Cancer (ITOMIC)-001, in which patients with metastatic TNBC undergo multiple biopsies over space and time [5]. Whole exome sequencing (WES) of 67 samples from 11 patients identified 426 genes containing multiple distinct single nucleotide variants (SNVs) within the same sample, instances we term Multiple SNVs affecting the Same Gene and Sample (MSSGS). We find that >90% of MSSGS result from cis-compound mutations (in which both SNVs affect the same allele), that MSSGS comprised of SNVs affecting adjacent nucleotides arise from single mutational events, and that most other MSSGS result from the sequential acquisition of SNVs. Some MSSGS drive cancer progression, as exemplified by a TNBC driven by FGFR2(S252W;Y375C). MSSGS are more prevalent in TNBC than other breast cancer subtypes and occur at higher-than-expected frequencies across TNBC samples within TCGA. MSSGS may denote genes that play as yet unrecognized roles in cancer progression.
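
The definition of an MSSGS (a gene carrying two or more distinct SNVs within one sample) can be expressed as a simple grouping step. The Python sketch below assumes a hypothetical input of `(sample, gene, position, alt_base)` tuples with placeholder identifiers; it is not the authors' pipeline, and it does not decide whether the SNVs lie in cis, which would require phasing information from the sequencing reads.

```python
from collections import defaultdict

def find_mssgs(variants):
    """Return {(sample, gene): sorted distinct SNVs} for pairs with >= 2 SNVs.

    variants: iterable of (sample, gene, position, alt_base) tuples
    (a hypothetical record layout chosen for this sketch).
    """
    distinct = defaultdict(set)
    for sample, gene, position, alt in variants:
        # A set removes repeated calls of the same SNV in the same sample.
        distinct[(sample, gene)].add((position, alt))
    return {key: sorted(snvs) for key, snvs in distinct.items() if len(snvs) >= 2}

# Placeholder calls, not real data.
calls = [
    ("S1", "GENE_A", 100, "T"),
    ("S1", "GENE_A", 200, "G"),
    ("S1", "GENE_A", 100, "T"),  # duplicate call, counted once
    ("S1", "GENE_B", 50, "C"),   # single SNV, not an MSSGS
    ("S2", "GENE_A", 100, "T"),  # other sample, single SNV
]
mssgs = find_mssgs(calls)
```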


CaPSSA: visual evaluation of cancer biomarker genes for patient stratification and survival analysis using mutation and expression data

Bioinformatics ◽

10.1093/bioinformatics/btz516 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5341-5343 ◽

Cited By ~ 4

Author(s):

Yeongjun Jang ◽

Jihae Seo ◽

Insu Jang ◽

Byungwook Lee ◽

Sun Kim ◽

...

Keyword(s):

Survival Analysis ◽

Web Application ◽

Predictive Biomarkers ◽

Primary Source ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Molecular Characteristics ◽

Omics Data ◽

Patient Stratification ◽

Visual Evaluation

Summary: Predictive biomarkers for patient stratification play critical roles in realizing the paradigm of precision medicine. Molecular characteristics such as somatic mutations and expression signatures represent the primary source of putative biomarker genes for patient stratification. However, evaluation of such candidate biomarkers is still cumbersome and requires multistep procedures especially when using massive public omics data. Here, we present an interactive web application that divides patients from large cohorts (e.g. The Cancer Genome Atlas, TCGA) dynamically into two groups according to the mutation, copy number variation or gene expression of query genes. It further supports users to examine the prognostic value of resulting patient groups based on survival analysis and their association with the clinical features as well as the previously annotated molecular subtypes, facilitated with a rich and interactive visualization. Importantly, we also support custom omics data with clinical information. Availability and implementation: CaPSSA (Cancer Patient Stratification and Survival Analysis) runs on a web-browser and is freely available without restrictions at http://www.kobic.re.kr/capssa/. The source code is available on https://github.com/yjjang/capssa. Supplementary information: Supplementary data are available at Bioinformatics online.
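
The two-step workflow in this abstract (split patients into two groups by a query gene, then compare survival) can be sketched as follows. The median expression cut-off, the function names, and the toy numbers are assumptions made for illustration; the abstract does not state which thresholds or survival estimators CaPSSA uses. The survival curve here is the standard Kaplan-Meier product-limit estimate.

```python
import numpy as np

def split_by_expression(expression):
    """Boolean mask: True for patients above the median expression (assumed cut-off)."""
    expression = np.asarray(expression, dtype=float)
    return expression > np.median(expression)

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability) steps.

    times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):           # distinct observed event times
        at_risk = np.sum(times >= t)             # patients still under observation
        deaths = np.sum((times == t) & events)   # events occurring exactly at t
        survival *= 1.0 - deaths / at_risk
        curve.append((float(t), float(survival)))
    return curve

# Toy cohort, not real data.
high = split_by_expression([1.0, 5.0, 2.0, 8.0])
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In practice one curve would be computed per group, for example `kaplan_meier(times[high], events[high])` and `kaplan_meier(times[~high], events[~high])` with NumPy arrays, and the two curves compared with a log-rank test.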


HotSpot3D web server: an integrated resource for mutation analysis in protein 3D structures

Bioinformatics ◽

10.1093/bioinformatics/btaa258 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3944-3946 ◽

Cited By ~ 2

Author(s):

Shanyu Chen ◽

Xiaoyu He ◽

Ruilin Li ◽

Xiaohong Duan ◽

Beifang Niu

Keyword(s):

Mutation Analysis ◽

Web Server ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Supplementary Data ◽

3D Structures ◽

One Stop ◽

Cancer Genome Atlas ◽

Genome Atlas

Motivation: HotSpot3D is a widely used software tool for identifying mutation hotspots on the 3D structures of proteins. To further assist users, we developed a new HotSpot3D web server to make this software more versatile, convenient and interactive. Results: The HotSpot3D web server performs data pre-processing, clustering, visualization and log-viewing in one stop. Users can interactively explore each cluster and easily re-visualize the mutational clusters within browsers. We also provide a database that allows users to search and visualize proximal mutations from 33 cancers in the Cancer Genome Atlas. Availability and implementation: http://niulab.scgrid.cn/HotSpot3D/. Supplementary information: Supplementary data are available at Bioinformatics online.


Modelling cancer progression using Mutual Hazard Networks

Bioinformatics ◽

10.1093/bioinformatics/btz513 ◽

2019 ◽

Vol 36 (1) ◽

pp. 241-249 ◽

Cited By ~ 2

Author(s):

Rudolf Schill ◽

Stefan Solbrig ◽

Tilo Wettig ◽

Rainer Spang

Keyword(s):

Cancer Progression ◽

Learning Algorithm ◽

Directed Acyclic Graphs ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Cross Sectional ◽

Acyclic Graphs ◽

Cancer Genome Atlas ◽

Occurrence State ◽

Occurrence Patterns

Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability and implementation: Implementation and data are available at https://github.com/RudiSchill/MHN. Supplementary information: Supplementary data are available at Bioinformatics online.
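
The rate model summarized in this abstract, a spontaneous rate per event modified by multiplicative effects from events that have already occurred, can be illustrated with a small Python sketch. It assumes a parameter matrix `theta` in which `theta[i, i]` is the spontaneous rate of event `i` and `theta[i, j]` is the factor that an already present event `j` applies to the rate of event `i` (values above 1 promote, values below 1 inhibit). This layout, the function name, and the numbers are assumptions for illustration and are not taken from the MHN repository.

```python
import numpy as np

def event_rate(theta, state, i):
    """Rate at which event i occurs next, given which events are already present.

    theta: (n, n) matrix of positive parameters (assumed layout, see lead-in).
    state: length-n sequence of 0/1 flags marking events that have occurred.
    """
    theta = np.asarray(theta, dtype=float)
    state = np.asarray(state, dtype=bool)
    if state[i]:
        return 0.0                   # an event that already occurred cannot recur
    effects = theta[i, state]        # factors from events already present
    return float(theta[i, i] * np.prod(effects))  # empty product equals 1

# Illustrative parameters, not fitted values.
theta = np.array([
    [0.5, 2.0, 1.0],   # event 0: promoted by event 1
    [1.0, 0.1, 1.0],   # event 1: unaffected by the others
    [0.2, 1.0, 0.4],   # event 2: inhibited by event 0
])
rate_alone = event_rate(theta, [0, 0, 0], 0)
rate_promoted = event_rate(theta, [0, 1, 0], 0)
rate_inhibited = event_rate(theta, [1, 0, 0], 2)
```

Because the matrix need not be triangular, event 0 can influence event 2 while event 1 influences event 0, which is how such a model can represent cyclic interactions that directed acyclic graphs exclude.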
