Survey and comparative assessments of computational multi-omics integrative methods with multiple regulatory networks identifying distinct tumor compositions across pan-cancer data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbaa102 ◽

2020 ◽

Cited By ~ 1

Author(s):

Zhuohui Wei ◽

Yue Zhang ◽

Wanlin Weng ◽

Jiazhou Chen ◽

Hongmin Cai

Keyword(s):

Molecular Mechanisms ◽

Genomic Data ◽

Low Rank ◽

Integrated Analysis ◽

Data Sets ◽

Omics Data ◽

Data Types ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract The significance of pan-cancer categories has recently been recognized as widespread in cancer research. Pan-cancer categorizes a cancer based on its molecular pathology rather than an organ. The molecular similarities among multi-omics data found in different cancer types can play several roles in both biological processes and therapeutic developments. Therefore, an integrated analysis for various genomic data is frequently used to reveal novel genetic and molecular mechanisms. However, a variety of algorithms for multi-omics clustering have been proposed in different fields. The comparison of different computational clustering methods in pan-cancer analysis performance remains unclear. To increase the utilization of current integrative methods in pan-cancer analysis, we first provide an overview of five popular computational integrative tools: similarity network fusion, integrative clustering of multiple genomic data types (iCluster), cancer integration via multi-kernel learning (CIMLR), perturbation clustering for data integration and disease subtyping (PINS) and low-rank clustering (LRACluster). Then, a priori interactions in multi-omics data were incorporated to detect prominent molecular patterns in pan-cancer data sets. Finally, we present comparative assessments of these methods, with discussion over key issues in applying these algorithms. We found that all five methods can identify distinct tumor compositions. The pan-cancer samples can be reclassified into several groups by different proportions. Interestingly, each method can classify the tumors into categories that are different from original cancer types or subtypes, especially for ovarian serous cystadenocarcinoma (OV) and breast invasive carcinoma (BRCA) tumors. In addition, all clusters of the five computational methods show notable prognostic values. Furthermore, both the 9 recurrent differential genes and the 15 common pathway characteristics were identified across all the methods. The results and discussion can help the community select appropriate integrative tools according to different research tasks or aims in pan-cancer analysis.

Multiomic Integration of Public Oncology Databases in Bioconductor

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00119 ◽

2020 ◽

pp. 958-971

Author(s):

Marcel Ramos ◽

Ludwig Geistlinger ◽

Sehyun Oh ◽

Lucas Schiffer ◽

Rimsha Azhar ◽

...

Keyword(s):

Web Application ◽

Cancer Genomics ◽

Application Programming Interface ◽

Data Representation ◽

The Cancer Genome Atlas ◽

Data Sets ◽

Data Types ◽

Data Infrastructure ◽

Integrative Framework ◽

Pan Cancer

PURPOSE Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.

Identification of pan-cancer Ras pathway activation with deep learning

Briefings in Bioinformatics ◽

10.1093/bib/bbaa258 ◽

2020 ◽

Author(s):

Xiangtao Li ◽

Shaochuan Li ◽

Yunhe Wang ◽

Shixiong Zhang ◽

Ka-Chun Wong

Keyword(s):

Deep Learning ◽

Superior Performance ◽

Recent Attempt ◽

Precision Oncology ◽

Pathway Activity ◽

Ras Pathway ◽

Cancer Data ◽

Pathway Activation ◽

Cancer Types ◽

Pan Cancer

Abstract The identification of hidden responders is often an essential challenge in precision oncology. A recent attempt based on machine learning has been proposed for classifying aberrant pathway activity from multiomic cancer data. However, we note several critical limitations there, such as high-dimensionality, data sparsity and model performance. Given the central importance and broad impact of precision oncology, we propose nature-inspired deep Ras activation pan-cancer (NatDRAP), a deep neural network (DNN) model, to address those restrictions for the identification of hidden responders. In this study, we develop the nature-inspired deep learning model that integrates bulk RNA sequencing, copy number and mutation data from PanCanAtlas to detect pan-cancer Ras pathway activation. In NatDRAP, we propose to synergize the nature-inspired artificial bee colony algorithm with different gradient-based optimizers in one framework for optimizing DNNs in a collaborative manner. Multiple experiments were conducted on 33 different cancer types across PanCanAtlas. The experimental results demonstrate that the proposed NatDRAP can provide superior performance over other benchmark methods with strong robustness towards diagnosing RAS aberrant pathway activity across different cancer types. In addition, gene ontology enrichment and pathological analysis are conducted to reveal novel insights into the RAS aberrant pathway activity identification and characterization. NatDRAP is written in Python and available at https://github.com/lixt314/NatDRAP1.

Large-Scale Analysis of Genetic and Clinical Patient Data

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-080917-013508 ◽

2018 ◽

Vol 1 (1) ◽

pp. 263-274 ◽

Cited By ~ 6

Author(s):

Marylyn D. Ritchie

Keyword(s):

Clinical Data ◽

Large Scale ◽

Data Science ◽

Genomic Analysis ◽

Genomic Data ◽

Data Sets ◽

Biomedical Data ◽

Data Types ◽

Phenotypic Data ◽

Clinical Patient

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available on comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.

Multi-omic tumor data reveal diversity of molecular mechanisms that correlate with survival

10.1101/267245 ◽

2018 ◽

Cited By ~ 2

Author(s):

Daniele Ramazzotti ◽

Avantika Lal ◽

Bo Wang ◽

Serafim Batzoglou ◽

Arend Sidow

Keyword(s):

Molecular Mechanisms ◽

Molecular Subtypes ◽

Point Mutations ◽

Integrated Analysis ◽

Tumor Type ◽

Cancer Subtypes ◽

Cancer Types ◽

Copy Number Changes ◽

Omic Data

Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and personalized treatment. This promise has motivated recent efforts to produce large amounts of multidimensional genomic (‘multi-omic’) data, but current algorithms still face challenges in the integrated analysis of such data. Here we present Cancer Integration via Multikernel Learning (CIMLR), a new cancer subtyping method that integrates multi-omic data to reveal molecular subtypes of cancer. We apply CIMLR to multi-omic data from 36 cancer types and show significant improvements in both computational efficiency and ability to extract biologically meaningful cancer subtypes. The discovered subtypes exhibit significant differences in patient survival for 27 of 36 cancer types. Our analysis reveals integrated patterns of gene expression, methylation, point mutations and copy number changes in multiple cancers and highlights patterns specifically associated with poor patient outcomes.

Interpretable per Case Weighted Ensemble Method for Cancer Associations

10.1101/008185 ◽

2014 ◽

Author(s):

Adrin Jalali ◽

Nico Pfeifer

Keyword(s):

Gene Expression ◽

Dna Methylation ◽

Ribosomal Proteins ◽

Data Sets ◽

Data Set ◽

Cancer Data ◽

Combination Of Classifiers ◽

Cancer Types ◽

Or Gene ◽

Leukemia Data

Motivation: Molecular measurements from cancer patients such as gene expression and DNA methylation are usually very noisy. Furthermore, cancer types can be very heterogeneous. Therefore, one of the main assumptions for machine learning, that the underlying unknown distribution is the same for all samples, might not be completely fulfilled. We introduce a method that can estimate this bias on a per-feature level and incorporate calculated feature confidences into a weighted combination of classifiers with disjoint feature sets. Results: The new method achieves state-of-the-art performance on many different cancer data sets with measured DNA methylation or gene expression. Moreover, we show how to visualize the learned classifiers to find interesting associations with the target label. Applied to a leukemia data set we find several ribosomal proteins associated with leukemia's risk group that might be interesting targets for follow-up studies and support the hypothesis that the ribosomes are a new frontier in gene regulation. Availability: The method is available under GPLv3+ License at https://github.com/adrinjalali/Network-Classifier.

The Cancer Genomic Atlas – “TO CONQUER CANCER”

International Journal of Molecular and Immuno Oncology ◽

10.25259/ijmio_28_2020 ◽

2020 ◽

Vol 0 ◽

pp. 1-6

Author(s):

Sai Sri Kavya Kadali ◽

Rachna Gowlikar ◽

Syeda Nooreen Fatima

Keyword(s):

Genetic Basis ◽

Data Repository ◽

Rare Cancer ◽

Cancer Data ◽

Data Collection Process ◽

Genomics And Proteomics ◽

Prevention Studies ◽

Cancer Types ◽

Pan Cancer

The Cancer Genomic Atlas (TCGA) is a publicly accessible cancer data repository and tool that allows us to understand the molecular basis of cancer through the application of genomics and proteomics. So far, researchers have been able to diagnose 33 cancer types including 10 rare cancer types. The key features of TCGA are to make the data collection process publicly accessible for the better understanding of the molecular and genetic basis of cancer and its mechanism of action along with its prevention. Studies on different cancer types along with comprehensive pan cancer analysis have expanded the understanding and purpose of TCGA. Ever since its conceptualization, its high-throughput approach has provided a platform for the identification of genes and pathways involved in cancers and accurate classification of cancers.

Integrated Analysis Reveals hsa-miR-142 as a Representative of a Lymphocyte-Specific Gene Expression and Methylation Signature

Cancer Informatics ◽

10.4137/cin.s9037 ◽

2012 ◽

Vol 11 ◽

pp. CIN.S9037 ◽

Cited By ~ 14

Author(s):

Bill Andreopoulos ◽

Dimitris Anastassiou

Keyword(s):

Gene Expression ◽

Cell Adhesion ◽

The Cancer Genome Atlas ◽

Integrated Analysis ◽

Specific Gene ◽

Data Types ◽

Multiple Cancer ◽

Specific Expression ◽

Cancer Types ◽

Cell Cell

Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M–) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectum cancer. There is 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer. There is 60% overlap of genes in R between GBM and ovarian (P = 1.3e−11). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142 associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ and hypermethylated M–, while the mesenchymal samples have the opposite profile.

A generic multivariate framework for the integration of microbiome longitudinal studies with other data types

10.1101/585802 ◽

2019 ◽

Cited By ~ 2

Author(s):

Antoine Bodein ◽

Olivier Chapleur ◽

Arnaud Droit ◽

Kim-Anh Lê Cao

Keyword(s):

Microbial Communities ◽

Time Course ◽

Molecular Mechanisms ◽

Smoothing Splines ◽

Individual Variability ◽

Integrated Analysis ◽

Data Types ◽

Phenotypic Data ◽

Different Types ◽

Derived Data

Abstract Simultaneous profiling of biospecimens using different technological platforms enables the study of many data types, encompassing microbial communities, omics and meta-omics as well as clinical or chemistry variables. Reduction in costs now enables longitudinal or time course studies on the same biological material or system. The overall aim of such studies is to investigate relationships between these longitudinal measures in a holistic manner to further decipher the link between molecular mechanisms and microbial community structures, or host-microbiota interactions. However, analytical frameworks enabling an integrated analysis between microbial communities and other types of biological, clinical or phenotypic data are still in their infancy. The challenges include few time points that may be unevenly spaced and unmatched between different data types, a small number of unique individual biospecimens and high individual variability. Those challenges are further exacerbated by the inherent characteristics of microbial communities-derived data (e.g. sparsity, compositional). We propose a generic data-driven framework to integrate different types of longitudinal data measured on the same biological specimens with microbial communities data, and select key temporal features with strong associations within the same sample group. The framework ranges from filtering and modelling, to integration using smoothing splines and multivariate dimension reduction methods to address some of the analytical challenges of microbiome-derived data. We illustrate our framework on different types of multi-omics case studies in bioreactor experiments as well as human studies.

Glutamine Metabolism Regulators Associated with Cancer Development and the Tumor Microenvironment: A Pan-Cancer Multi-Omics Analysis

Genes ◽

10.3390/genes12091305 ◽

2021 ◽

Vol 12 (9) ◽

pp. 1305

Author(s):

Jingwen Zou ◽

Kunpeng Du ◽

Shaohua Li ◽

Lianghe Lu ◽

Jie Mei ◽

...

Keyword(s):

Gene Expression ◽

Tumor Microenvironment ◽

Drug Sensitivity ◽

Molecular Mechanisms ◽

Immune Cell ◽

Metabolic Reprogramming ◽

Glutamine Metabolism ◽

Molecular Networks ◽

Cancer Types ◽

Pan Cancer

Background: In recent years, metabolic reprogramming has been identified as a hallmark of cancer. Accumulating evidence suggests that glutamine metabolism plays a crucial role in oncogenesis and the tumor microenvironment. In this study, we aimed to perform a systematic and comprehensive analysis of six key metabolic node genes involved in the dynamic regulation of glutamine metabolism (referred to as GLNM regulators) across 33 types of cancer. Methods: We analyzed the gene expression, epigenetic regulation, and genomic alterations of six key GLNM regulators, including SLC1A5, SLC7A5, SLC3A2, SLC7A11, GLS, and GLS2, in pan-cancer using several open-source platforms and databases. Additionally, we investigated the impacts of these gene expression changes on clinical outcomes, drug sensitivity, and the tumor microenvironment. We also attempted to investigate the upstream microRNA–mRNA molecular networks and the downstream signaling pathways involved in order to uncover the potential molecular mechanisms behind metabolic reprogramming. Results: We found that the expression levels of GLNM regulators varied across cancer types and were related to several genomic and immunological characteristics. While the immune scores were generally lower in the tumors with higher gene expression, the types of immune cell infiltration showed significantly different correlations among cancer types, dividing them into two clusters. Furthermore, we showed that elevated GLNM regulators expression was associated with poor overall survival in the majority of cancer types. Lastly, the expression of GLNM regulators was significantly associated with PD-L1 expression and drug sensitivity. Conclusions: The elevated expression of GLNM regulators was associated with poorer cancer prognoses and a cold tumor microenvironment, providing novel insights into cancer treatment and possibly offering alternative options for the treatment of clinically refractory cancers.

A comprehensive database for integrated analysis of omics data in autoimmune diseases

BMC Bioinformatics ◽

10.1186/s12859-021-04268-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jordi Martorell-Marugán ◽

Raúl López-Domínguez ◽

Adrián García-Moreno ◽

Daniel Toro-Domínguez ◽

Juan Antonio Villatoro-García ◽

...

Keyword(s):

Data Analysis ◽

Autoimmune Diseases ◽

Pathway Analysis ◽

Molecular Mechanisms ◽

Meta Analysis ◽

Integrated Analysis ◽

Omics Data ◽

Difficult Diagnosis ◽

Integrative Analyses ◽

Public Repositories

Abstract Background Autoimmune diseases are heterogeneous pathologies with difficult diagnosis and few therapeutic options. In the last decade, several omics studies have provided significant insights into the molecular mechanisms of these diseases. Nevertheless, data from different cohorts and pathologies are stored independently in public repositories and a unified resource is imperative to assist researchers in this field. Results Here, we present Autoimmune Diseases Explorer (https://adex.genyo.es), a database that integrates 82 curated transcriptomics and methylation studies covering 5609 samples for some of the most common autoimmune diseases. The database provides, in an easy-to-use environment, advanced data analysis and statistical methods for exploring omics datasets, including meta-analysis, differential expression or pathway analysis. Conclusions This is the first omics database focused on autoimmune diseases. This resource incorporates homogeneously processed data to facilitate integrative analyses among studies.
