Evaluation of integrative clustering methods for the analysis of multi-omics data

Cécile Chauvel; Alexei Novoloaca; Pierre Veyre; Frédéric Reynier; Jérémie Becker

doi:10.1093/bib/bbz015

Briefings in Bioinformatics, 2019, Vol 21 (2), pp. 541-552

Cited By ~ 9

Keyword(s): Matrix Factorization; Large Scale; The Cancer Genome Atlas; Added Value; Joint Analysis; Omics Data; Clustering Methods; Data Set; Cancer Data; Opposite Behavior

Abstract: Recent advances in sequencing, mass spectrometry and cytometry technologies have enabled researchers to collect large-scale omics data from the same set of biological samples. The joint analysis of multiple omics offers the opportunity to uncover coordinated cellular processes acting across different omic layers. In this work, we present a thorough comparison of a selection of recent integrative clustering approaches, including Bayesian (BCC and MDI) and matrix factorization approaches (iCluster, moCluster, JIVE and iNMF). Based on simulations, the methods were evaluated on their sensitivity and their ability to recover both the correct number of clusters and the simulated clustering at the common and data-specific levels. Standard non-integrative approaches were also included to quantify the added value of integrative methods. For most matrix factorization methods and one Bayesian approach (BCC), the shared and specific structures were successfully recovered with high and moderate accuracy, respectively. The opposite behavior was observed for non-integrative approaches, i.e. high performance on specific structures only. Finally, we applied the methods to the Cancer Genome Atlas breast cancer data set to check whether results based on experimental data were consistent with those obtained in the simulations.
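
One common way to score how well a method recovers a simulated clustering is the adjusted Rand index (ARI). The abstract does not say which agreement measure the authors used, so the choice of ARI here is an assumption, and the function name and toy labels are hypothetical. A minimal sketch in plain Python:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two flat clusterings of the same samples."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two samples")

    # Contingency counts: how many samples share each (true, predicted) pair.
    pair_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of sample pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # agreement of a perfect match
    if maximum == expected:
        # Degenerate case (e.g. both partitions trivial): treat as full agreement.
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Hypothetical toy labels: the recovered clustering matches the simulated
# one up to a renaming of the clusters, so the agreement is perfect.
simulated = [0, 0, 0, 1, 1, 1]
recovered = [1, 1, 1, 0, 0, 0]
print(adjusted_rand_index(simulated, recovered))
```

Because the index depends only on which samples are grouped together, cluster names do not need to match between the simulated and recovered labelings.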

Challenges in the Integration of Omics and Non-Omics Data

Genes, 2019, Vol 10 (3), pp. 238

doi:10.3390/genes10030238

Cited By ~ 22

Author(s): Evangelina López de Maturana; Lola Alonso; Pablo Alarcón; Isabel Adoración Martín-Antoniano; Silvia Pineda; ...

Keyword(s): Data Integration; Large Scale; Predictive Ability; Epidemiological Data; Joint Modeling; Epidemiological Studies; Omics Data; Data Set; Health Domains; Analytical Strategies

Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented into clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and its joint modeling with omics data is crucial to increase the algorithm’s predictive ability. Only a small number of published studies performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration regard the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts of OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: Independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.

Combining Genetic Mutation and Expression Profiles Identifies Novel Prognostic Biomarkers of Lung Adenocarcinoma

Clinical Medicine Insights Oncology, 2020, Vol 14, pp. 117955492096626

doi:10.1177/1179554920966260

Author(s): Yun Liu; Fu Liu; Xintong Hu; Jiaxue He; Yanfang Jiang

Keyword(s): High Risk; Lung Adenocarcinoma; Expression Profiles; Genetic Mutation; High Risk Group; The Cancer Genome Atlas; Support Vector; Omics Data; Differential Analysis; Data Set

Motivation: Although several prognostic signatures for lung adenocarcinoma (LUAD) have been developed, they are mainly based on a single-omics data set. This article aims to develop a novel set of prognostic signatures by combining genetic mutation and expression profiles of LUAD patients. Methods: The genetic mutation and expression profiles, together with the clinical profiles of a cohort of LUAD patients from The Cancer Genome Atlas (TCGA), were downloaded. Patients were separated into 2 groups, namely, the high-risk and low-risk groups, according to their overall survivals. Then, differential analysis was performed to determine differentially expressed genes (DEGs) and mutated genes (DMGs) in the expression and mutation profiles, respectively, between the 2 groups. Finally, a prognostic model based on the support vector machine (SVM) algorithm was developed by combining the expression values of the DEGs and the mutation times of the DMGs. Results: A total of 13 DEGs and 7 DMGs were recognized between the 2 groups. Their prognostic values were validated using independent cohorts. Compared with several existing signatures, the proposed prognostic signatures exhibited better prediction performance in the testing set. In addition, it is found that 1 of the 7 DMGs, GRIN2B, is mutated much more frequently in the high-risk group, showing a potential value as a therapy target. Conclusions: Combining multi-omics data sets is an applicable manner to identify novel prognostic signatures and to improve the prognostic prediction for LUAD, which will be heuristic to other types of cancers.

A hierarchical clustering and data fusion approach for disease subtype discovery

doi:10.1101/2020.01.16.909382

2020

Author(s): Bastian Pfeifer; Michael G. Schimek

Keyword(s): Data Fusion; Cancer Patients; Hierarchical Clustering; Cancer Progression; The Cancer Genome Atlas; Superior Performance; Clustering Methods; Cancer Data; Disease Subtype; Fusion Approach

Abstract: Recent advances in multi-omics clustering methods enable a more fine-tuned separation of cancer patients into clinically relevant clusters. These advancements have the potential to provide a deeper understanding of cancer progression and may facilitate the treatment of cancer patients. Here, we present a simple hierarchical clustering and data fusion approach, named HC-fused, for the detection of disease subtypes. Unlike other methods, the proposed approach naturally reports on the individual contribution of each single-omic to the data fusion process. We perform multi-view simulations with disjoint and disjunct cluster elements across the views to highlight fundamentally different data integration behaviour of various state-of-the-art methods. HC-fused combines the strengths of some recently published methods and shows superior performance on real world cancer data from the TCGA (The Cancer Genome Atlas) database. An R implementation of our method is available on GitHub (pievos101/HC-fused).

Multi-omics Data Integration by Generative Adversarial Network

doi:10.1101/2021.03.13.435251

2021

Author(s): Khandakar Tanvir Ahmed; Jiao Sun; Jeongsik Yong; Wei Zhang

Keyword(s): Large Scale; Synthetic Data; Interaction Network; Vital Role; The Cancer Genome Atlas; Survival Prediction; Omics Data; Generative Adversarial Network; Adversarial Network; Cancer Outcome

Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high-throughput technologies, more comprehensive multi-omics data is now available that can effectively link the genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover the coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network (GAN) model to integrate two omics datasets and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. Large-scale experiments on The Cancer Genome Atlas (TCGA) breast cancer and ovarian cancer datasets validate two points. (1) The model can effectively integrate two omics datasets (i.e., mRNA and microRNA expression data) and their interaction network (i.e., microRNA-mRNA interaction network). The synthetic omics data generated by the proposed model performs better on cancer outcome classification and patient survival prediction than the original omics datasets. (2) The integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals.

A learned embedding for efficient joint analysis of millions of mass spectra

doi:10.1101/483263

2018

Cited By ~ 4

Author(s): Damon H. May; Jeffrey Bilmes; William S. Noble

Keyword(s): Mass Spectra; Large Scale; Dimensional Space; Software Implementation; Mass Spectrometry Data; Joint Analysis; Clustering Methods; Peptide Mass; Public Repositories; Low Dimensional

Abstract: Despite an explosion of data in public repositories, peptide mass spectra are usually analyzed by each laboratory in isolation, treating each experiment as if it has no relationship to any others. This approach fails to exploit the wealth of existing, previously analyzed mass spectrometry data. Others have jointly analyzed many mass spectra, often using clustering. However, mass spectra are not necessarily best summarized as clusters, and although new spectra can be added to existing clusters, clustering methods previously applied to mass spectra do not allow new clusters to be defined without completely re-clustering. As an alternative, we propose to train a deep neural network, called “GLEAMS,” to learn an embedding of spectra into a low-dimensional space in which spectra generated by the same peptide are close to one another. We demonstrate empirically the utility of this learned embedding by propagating annotations from labeled to unlabeled spectra. We further use GLEAMS to detect groups of unidentified, proximal spectra representing the same peptide, and we show how to use these spectral communities to reveal misidentified spectra and to characterize frequently observed but consistently unidentified molecular species. We provide a software implementation of our approach, along with a tool to quickly embed additional spectra using a pre-trained model, to facilitate large-scale analyses.
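
The idea of propagating annotations in an embedding space can be illustrated with a nearest-neighbor lookup. The abstract does not describe how GLEAMS performs this step, so everything below is an assumption made for illustration: the Euclidean distance, the `max_distance` cutoff, the brute-force search, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def propagate_annotations(labeled, unlabeled, max_distance):
    """Give each unlabeled embedding the annotation of its nearest labeled neighbor.

    labeled:      list of (embedding, annotation) pairs
    unlabeled:    list of embeddings
    max_distance: hypothetical cutoff; beyond it no annotation is assigned

    Returns a list with one annotation (or None) per unlabeled embedding.
    """
    results = []
    for query in unlabeled:
        best_label, best_dist = None, math.inf
        # Brute-force scan over all labeled embeddings (fine for a sketch,
        # too slow for millions of spectra).
        for embedding, annotation in labeled:
            dist = math.dist(query, embedding)  # Euclidean distance
            if dist < best_dist:
                best_label, best_dist = annotation, dist
        results.append(best_label if best_dist <= max_distance else None)
    return results


# Hypothetical 2-D embeddings; real embeddings would have more dimensions.
labeled = [((0.0, 0.0), "PEPTIDEA"), ((5.0, 5.0), "PEPTIDEB")]
unlabeled = [(0.1, 0.2), (4.8, 5.1), (20.0, 20.0)]
print(propagate_annotations(labeled, unlabeled, max_distance=1.0))
```

The cutoff keeps a spectrum that lies far from every labeled spectrum unannotated instead of forcing a label onto it.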

Active vitamin D induces gene-specific hypomethylation in prostate cancer cells developing vitamin D resistance

AJP Cell Physiology, 2020, Vol 318 (5), pp. C836-C847

doi:10.1152/ajpcell.00522.2019

Cited By ~ 2

Author(s): Guan-Rong Lai; Yi-Fen Lee; Shian-Jang Yan; Huei-Ju Ting

Keyword(s): Prostate Cancer; Vitamin D; Cancer Progression; Transcriptional Activation; Cell Model; In Silico Analysis; DNA Methyltransferases; The Cancer Genome Atlas; Data Set; Cancer Data

Prostate cancer (PCa) is a leading cause of cancer death in men. Despite the antiproliferative effects of 1α,25-dihydroxyvitamin D3 [1,25(OH)2D3] on PCa, accumulating evidence indicates that 1,25(OH)2D3 promotes cancer progression by increasing genome plasticity. Our investigation of epigenetic changes associated with vitamin D insensitivity found that 1,25(OH)2D3 treatment reduced the expression levels and activities of DNA methyltransferases 1 and 3B (DNMT1 and DNMT3B, respectively). In silico analysis and reporter assay confirmed that 1,25(OH)2D3 downregulated transcriptional activation of the DNMT3B promoter and upregulated microRNAs targeting the 3′-untranslated regions of DNMT3B. We then profiled DNA methylation in the vitamin D-resistant PC-3 cells and a resistant PCa cell model generated by long-term 1,25(OH)2D3 exposure. Several candidate genes were found to be hypomethylated and overexpressed in vitamin D-resistant PCa cells compared with vitamin D-sensitive cells. Most of the identified genes were associated with mammalian target of rapamycin (mTOR) signaling activation, which is known to promote cancer progression. Among them, we found that inhibition of ribosomal protein S6 kinase A1 (RPS6KA1) promoted vitamin D sensitivity in PC-3 cells. Furthermore, The Cancer Genome Atlas (TCGA) prostate cancer data set demonstrated that midline 1 (MID1) expression is positively correlated with tumor stage. Overall, our study reveals an inhibitory mechanism of 1,25(OH)2D3 on DNMT3B, which may contribute to vitamin D resistance in PCa.

Centrally concentrated molecular gas driving galactic-scale ionized gas outflows in star-forming galaxies

Monthly Notices of the Royal Astronomical Society, 2020, Vol 500 (3), pp. 3802-3820

doi:10.1093/mnras/staa3512

Author(s): L M Hogarth; A Saintonge; L Cortese; T A Davis; S M Croom; ...

Keyword(s): Star Formation; Spatial Resolution; Large Scale; Formation Rate; Control Sample; Joint Analysis; Molecular Gas; Ionized Gas; Data Set; Star Forming

ABSTRACT We perform a joint analysis of high spatial resolution molecular gas and star-formation rate (SFR) maps in main-sequence star-forming galaxies experiencing galactic-scale outflows of ionized gas. Our aim is to understand the mechanism that determines which galaxies are able to launch these intense winds. We observed CO(1→0) at 1-arcsec resolution with ALMA in 16 edge-on galaxies, which also have 2-arcsec spatial-resolution optical integral field observations from the SAMI Galaxy Survey. Half the galaxies in the sample were previously identified as harbouring intense and large-scale outflows of ionized gas (‘outflow types’) and the rest serve as control galaxies. The data set is complemented by integrated CO(1→0) observations from the IRAM 30-m telescope to probe the total molecular gas reservoirs. We find that the galaxies powering outflows do not possess significantly different global gas fractions or star-formation efficiencies when compared with a control sample. However, the ALMA maps reveal that the molecular gas in the outflow-type galaxies is distributed more centrally than in the control galaxies. For our outflow-type objects, molecular gas and star-formation are largely confined within their inner effective radius (reff), whereas in the control sample, the distribution is more diffuse, extending far beyond reff. We infer that outflows in normal star-forming galaxies may be caused by dynamical mechanisms that drive molecular gas into their central regions, which can result in locally enhanced gas surface density and star-formation.

A novel network controllability algorithm to target personalized driver genes for discovering combinational drugs of individual cancer patient

doi:10.1101/571620

2019

Author(s): Wei-Feng Guo; Shao-Wu Zhang; Tao Zeng; Luonan Chen

Keyword(s): Cancer Patient; Side Effect; Target Genes; Driver Gene; Omics Data; Driver Genes; Data Set; Cancer Data; Personalized Risk; Network Controllability

Abstract: In treating cancer with precision medicine, it is important to identify personalized combinational drugs that account for individual heterogeneity. Many bioinformatics tools for personalized driver gene identification have provided promising clues for determining candidate personalized drug targets for personalized drug discovery. However, how to bridge the gap between personalized driver gene identification and personalized combinational drug discovery has not been studied. In this work, we developed a novel structure network Controllability-based Personalized driver Genes and combinational Drug identification algorithm (CPGD), aiming to mine the personalized driver genes and identify the combinational drugs of an individual cancer patient. On two benchmark cancer datasets, the performance of CPGD in predicting clinically efficacious combinational drugs is superior to that of other state-of-the-art driver gene-focused algorithms in terms of precision accuracy. In particular, by quantifying and referring the relationships between target genes of pairwise combinatorial drugs and disease module genes on a breast cancer data set, CPGD can significantly divide patients into discriminative high-risk and low-risk groups for risk assessment in combination therapy. In addition, CPGD can further enhance cancer subtyping by providing computationally personalized side effect signatures for individual patients. Collectively, CPGD provides a new and efficient bioinformatics tool, from a structure network controllability perspective, for discovering personalized combinational drugs with personalized side effect consideration, so as to effectively support personalized risk assessment and disease subtyping.

Significance: It is quite challenging to predict personalized combinational drugs rather than patient-cohort drugs based on cancer omics data. In this work, a novel structure network Controllability-based algorithm (CPGD), built from a feedback vertex set control perspective, was developed for discovering efficacious combinational drugs for an individual cancer patient by targeting the personalized driver genes. CPGD contains three methodological advances obtained by exploring more precise mathematical models on high-throughput personalized multi-omics data. The first is that a proper network structure is constructed to characterize the gene regulatory mechanism of an individual patient. The second is that considering the weight information of network edges/relations improves the performance in predicting clinically efficacious combinational drugs compared with other driver-focused methods. The third is that proper evaluation metrics for personalized combinational drug prioritization, personalized risk assessment and disease subtyping are designed for evaluating the performance of CPGD.

Tumor microenvironment evaluation promotes precise checkpoint immunotherapy of advanced gastric cancer

Journal for ImmunoTherapy of Cancer, 2021, Vol 9 (8), pp. e002467

doi:10.1136/jitc-2021-002467

Author(s): Dongqiang Zeng; Jiani Wu; Huiyan Luo; Yong Li; Jian Xiao; ...

Keyword(s): Gastric Cancer; Tumor Microenvironment; Advanced Gastric Cancer; Predictive Value; Immune Checkpoint Blockade; The Cancer Genome Atlas; Omics Data; Predictive Capacity; Cancer Data; Number Of Patients

Background: Durable efficacy of immune checkpoint blockade (ICB) occurred in a small number of patients with metastatic gastric cancer (mGC), and the determinant biomarker of response to ICB remains unclear. Methods: We developed an open-source TMEscore R package to quantify the tumor microenvironment (TME) to aid in addressing this dilemma. Two advanced gastric cancer cohorts (RNAseq, N=45 and NanoString, N=48) and other advanced cancer (N=534) treated with ICB were leveraged to investigate the predictive value of TMEscore. Simultaneously, multi-omics data from The Cancer Genome Atlas of Stomach Adenocarcinoma (TCGA-STAD) and Asian Cancer Research Group (ACRG) were interrogated for underlying mechanisms. Results: The predictive capacity of TMEscore was corroborated in patients with mGC treated with pembrolizumab in a prospective phase 2 clinical trial (NCT02589496, N=45, area under the curve (AUC)=0.891). Notably, TMEscore, which has a larger AUC than programmed death-ligand 1 combined positive score, tumor mutation burden, microsatellite instability, and Epstein-Barr virus, was also validated in the multicenter advanced gastric cancer cohort using NanoString technology (N=48, AUC=0.877). Exploration of the intrinsic mechanisms of TMEscore with TCGA and ACRG multi-omics data identified TME-pertinent mechanisms including mutations, metabolism pathways, and epigenetic features. Conclusions: The current study highlights the promising predictive value of TMEscore for patients with mGC. Exploration of TME in multi-omics gastric cancer data may provide the impetus for precision immunotherapy.
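
The area under the ROC curve (AUC) reported above can be read as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder. The sketch below computes it that way in plain Python. It is not taken from the TMEscore package; the function name and the toy scores are hypothetical and serve only to illustrate the metric.

```python
def roc_auc(scores, responded):
    """AUC as the fraction of (responder, non-responder) pairs ranked correctly.

    scores:    one numeric score per patient
    responded: one truthy/falsy flag per patient (True = responder)
    Ties between a responder and a non-responder count as half a win.
    """
    pos = [s for s, r in zip(scores, responded) if r]
    neg = [s for s, r in zip(scores, responded) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three responders and three non-responders.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
responded = [True, True, True, False, False, False]
print(roc_auc(scores, responded))  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders.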

Identification of Deregulated Transcription Factors Involved in Specific Bladder Cancer Subtypes

doi:10.29007/v7qj

2020

Author(s): Magali Champion; Julien Chiquet; Pierre Neuvial; Mohamed Elati; François Radvanyi; ...

Keyword(s): Gene Expression; Bladder Cancer; Transcription Factor; Transcription Factors; Target Genes; The Cancer Genome Atlas; Reference Network; Data Set; Cancer Subtypes; Cancer Data

Comparison between tumoral and healthy cells may reveal abnormal regulation behaviors between a transcription factor and the genes it regulates, without exhibiting differential expression of the former genes. We propose a methodology for the identification of transcription factors involved in the deregulation of genes in tumoral cells. This strategy is based on the inference of a reference gene regulatory network that connects transcription factors to their downstream targets using gene expression data. Gene expression levels in tumor samples are then carefully compared to this reference network to detect deregulated target genes. A linear model is finally used to measure the ability of each transcription factor to explain these deregulations. We assess the performance of our method by numerical experiments on a public bladder cancer data set derived from the Cancer Genome Atlas project. We identify genes known for their implication in the development of specific bladder cancer subtypes as well as new potential biomarkers.
