Software for the integration of multi-omics experiments in Bioconductor

Mapping Intimacies ◽

10.1101/144774 ◽

2017 ◽

Cited By ~ 2

Author(s):

Marcel Ramos ◽

Lucas Schiffer ◽

Angela Re ◽

Rimsha Azhar ◽

Azfar Basunia ◽

...

Keyword(s):

Statistical Analysis ◽

Data Science ◽

Data Representation ◽

The Cancer Genome Atlas ◽

Cancer Tissue ◽

Omics Data ◽

Data Types ◽

Design Data ◽

High Throughput Data ◽

Cancer Genome Atlas

Multi-omics experiments are increasingly commonplace in biomedical research, and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multi-omics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide all of the multiple ‘omics data for each cancer tissue in The Cancer Genome Atlas (TCGA) as ready-to-analyze MultiAssayExperiment objects, and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable and reproducible statistical analysis of multi-omics data and enhances data science applications of multiple omics datasets.
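
The idea of a coordinated representation can be illustrated with a toy Python analog. This is only a sketch under stated assumptions: the class name `MultiAssayContainer`, its fields, and its methods are hypothetical and are not the MultiAssayExperiment R API. It shows one plausible way a sample map can tie assay-specific samples back to patients so that subsetting stays consistent across assays.

```python
from dataclasses import dataclass


@dataclass
class MultiAssayContainer:
    """Toy analog of a coordinated multi-assay container (hypothetical, not the R API)."""
    assays: dict      # assay name -> {sample_id: {feature: value}}
    patients: dict    # patient_id -> clinical annotations
    sample_map: dict  # (assay name, sample_id) -> patient_id

    def subset_patients(self, keep):
        """Return a new container restricted to the given patient IDs."""
        keep = set(keep)
        new_map = {k: p for k, p in self.sample_map.items() if p in keep}
        new_assays = {
            name: {s: v for s, v in samples.items() if (name, s) in new_map}
            for name, samples in self.assays.items()
        }
        new_patients = {p: a for p, a in self.patients.items() if p in keep}
        return MultiAssayContainer(new_assays, new_patients, new_map)

    def complete_cases(self):
        """Patient IDs that have at least one sample in every assay."""
        seen = {}
        for (assay, _sample), patient in self.sample_map.items():
            seen.setdefault(patient, set()).add(assay)
        return sorted(p for p, a in seen.items() if a == set(self.assays))


# Made-up example data: two assays, two patients, one patient missing methylation.
mae = MultiAssayContainer(
    assays={
        "rnaseq": {"r1": {"TP53": 5.2}, "r2": {"TP53": 7.9}},
        "methylation": {"m1": {"cg0001": 0.31}},
    },
    patients={"P1": {"stage": "II"}, "P2": {"stage": "III"}},
    sample_map={
        ("rnaseq", "r1"): "P1",
        ("rnaseq", "r2"): "P2",
        ("methylation", "m1"): "P1",
    },
)
complete = mae.complete_cases()
subset = mae.subset_patients(complete)
```

Keeping the patient-to-sample mapping in one place means a single subsetting call filters every assay and the clinical table together, which is the kind of bookkeeping the abstract describes the package as handling.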

Deep Learning for Automatic Subclassification of Gastric Carcinoma Using Whole-Slide Histopathology Images

Cancers ◽

10.3390/cancers13153811 ◽

2021 ◽

Vol 13 (15) ◽

pp. 3811

Author(s):

Hyun-Jong Jang ◽

In-Hye Song ◽

Sung-Hak Lee

Keyword(s):

Deep Learning ◽

Relative Proportion ◽

The Cancer Genome Atlas ◽

Cancer Tissue ◽

Mucinous Tumor ◽

Receiver Operating Characteristic Curves ◽

Tumor Tissues ◽

Cancer Genome Atlas ◽

Stomach Adenocarcinoma ◽

Tumor Types

Histomorphologic types of gastric cancer (GC) have significant prognostic values that should be considered during treatment planning. Because the thorough quantitative review of a tissue slide is a laborious task for pathologists, deep learning (DL) can be a useful tool to support pathologic workflow. In the present study, a fully automated approach was applied to distinguish differentiated/undifferentiated and non-mucinous/mucinous tumor types in GC tissue whole-slide images from The Cancer Genome Atlas (TCGA) stomach adenocarcinoma dataset (TCGA-STAD). By classifying small patches of tissue images into differentiated/undifferentiated and non-mucinous/mucinous tumor tissues, the relative proportion of GC tissue subtypes can be easily quantified. Furthermore, the distribution of different tissue subtypes can be clearly visualized. The patch-level areas under the curves for the receiver operating characteristic curves for the differentiated/undifferentiated and non-mucinous/mucinous classifiers were 0.932 and 0.979, respectively. We also validated the classifiers on our own GC datasets and confirmed that the generalizability of the classifiers is excellent. The results indicate that the DL-based tissue classifier could be a useful tool for the quantitative analysis of cancer tissue slides. By combining DL-based classifiers for various molecular and morphologic variations in tissue slides, the heterogeneity of tumor tissues can be unveiled more efficiently.
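
As a rough illustration of how patch-level predictions could be turned into the quantities mentioned above, the following sketch computes subtype proportions and a patch-level ROC AUC. The function names, the 0.5 threshold, and the example scores are assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter


def subtype_proportions(patch_probs, threshold=0.5):
    """Label each patch from its classifier probability and return label fractions."""
    if not patch_probs:
        raise ValueError("at least one patch is required")
    labels = [
        "undifferentiated" if p >= threshold else "differentiated"
        for p in patch_probs
    ]
    counts = Counter(labels)
    total = len(labels)
    return {
        label: counts[label] / total
        for label in ("differentiated", "undifferentiated")
    }


def roc_auc(scores, truths):
    """AUC as the probability that a positive patch outscores a negative one."""
    pos = [s for s, t in zip(scores, truths) if t == 1]
    neg = [s for s, t in zip(scores, truths) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative patch count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up patch probabilities for the "undifferentiated" class.
probs = [0.1, 0.2, 0.8, 0.9, 0.4]
truths = [0, 0, 1, 1, 0]
props = subtype_proportions(probs)
auc = roc_auc(probs, truths)
```

The pairwise AUC is quadratic in the number of patches, so a rank-based implementation would be preferable for whole-slide images with many thousands of patches.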

Pan-cancer analysis of transcripts encoding novel open-reading frames (nORFs) and their potential biological functions

npj Genomic Medicine ◽

10.1038/s41525-020-00167-4 ◽

2021 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Chaitanya Erady ◽

Adam Boxall ◽

Shraddha Puntambekar ◽

N. Suhas Jagannathan ◽

Ruchi Chauhan ◽

...

Keyword(s):

Transcript Level ◽

Open Reading Frames ◽

The Cancer Genome Atlas ◽

Small Subset ◽

Cancer Tissue ◽

Post Translational Modifications ◽

Cancer Genome Atlas ◽

Systematic Identification ◽

Reading Frames

Uncharacterized and unannotated open-reading frames, which we refer to as novel open reading frames (nORFs), may sometimes encode peptides that remain unexplored for novel therapeutic opportunities. To our knowledge, no systematic identification and characterization of transcripts encoding nORFs or their translation products in cancer, or in any other physiological process, has been performed. We use our curated nORFs database (nORFs.org), together with RNA-Seq data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) consortia, to identify transcripts containing nORFs that are expressed frequently in cancer or matched normal tissue across 22 cancer types. We show nORFs are subject to extensive dysregulation at the transcript level in cancer tissue and that a small subset of nORFs are associated with overall patient survival, suggesting that nORFs may have prognostic value. We also show that nORF products can form protein-like structures with post-translational modifications. Finally, we perform in silico screening for inhibitors against nORF-encoded proteins that are disrupted in stomach and esophageal cancer, showing that they can potentially be targeted by inhibitors. We hope this work will guide and motivate future studies that perform in-depth characterization of nORF functions in cancer and other diseases.

An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.
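
The univariate digitization described above can be sketched in a few lines of Python. This is a simplified illustration, not the divergence package's algorithm: it assumes the baseline range of each feature is a plain quantile interval (5th to 95th percentile), and all function names and example values are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def baseline_interval(baseline, lower_q=0.05, upper_q=0.95):
    """Assumed baseline range: the interval between two quantiles."""
    vals = sorted(baseline)
    return quantile(vals, lower_q), quantile(vals, upper_q)


def ternary_code(value, interval):
    """-1 below the baseline range, 0 inside it, +1 above it."""
    lo, hi = interval
    if value < lo:
        return -1
    if value > hi:
        return 1
    return 0


# Made-up baseline population (five samples per feature) and one test profile.
baseline = {
    "geneA": [1.0, 1.2, 0.9, 1.1, 1.0],
    "geneB": [5.0, 5.5, 4.8, 5.2, 5.1],
    "geneC": [2.0, 2.1, 1.9, 2.0, 2.2],
}
profile = {"geneA": 3.0, "geneB": 5.0, "geneC": 0.5}

intervals = {g: baseline_interval(v) for g, v in baseline.items()}
ternary = {g: ternary_code(profile[g], intervals[g]) for g in profile}
# The binary code only records whether a feature left the baseline range.
binary = {g: abs(code) for g, code in ternary.items()}
```

Each sample is digitized independently once the baseline intervals are fixed, which is what makes sample-level analysis straightforward in this framing.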

Novel cancer subtyping method based on patient-specific gene regulatory network

10.1101/2021.03.24.436731 ◽

2021 ◽

Author(s):

Mai Adachi Nakazawa ◽

Yoshinori Tamada ◽

Yoshihisa Tanaka ◽

Marie Ikeguchi ◽

Kako Higashihara ◽

...

Keyword(s):

Gene Networks ◽

Regulatory Networks ◽

The Cancer Genome Atlas ◽

Patient Specific ◽

Specific Gene ◽

Omics Data ◽

Cancer Subtypes ◽

Molecular Systems ◽

Molecular Features ◽

Cancer Genome Atlas

The identification of cancer subtypes is important for the understanding of tumor heterogeneity. In recent years, numerous computational methods have been proposed for this problem based on the multi-omics data of patients. It is widely accepted that different cancer subtypes are induced by different molecular regulatory networks. However, only a few of these methods incorporate the differences between molecular systems into the classification process. In this study, we present a novel method to classify cancer subtypes based on patient-specific molecular systems. Our method quantifies patient-specific gene networks, which are estimated from the patients' transcriptome data. By clustering the quantified networks, our method allows for cancer subtyping that takes into consideration the differences in the molecular systems of patients. Comprehensive analyses applying our method to The Cancer Genome Atlas (TCGA) datasets confirmed that it identified more clinically meaningful cancer subtypes than the existing subtypes and that the identified subtypes comprised different molecular features. Our findings show that the proposed method, based on a simple classification using the patient-specific molecular systems, can identify cancer subtypes even with single omics data, which cannot otherwise be captured by existing methods using multi-omics data.

An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.

Multiomic Integration of Public Oncology Databases in Bioconductor

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00119 ◽

2020 ◽

pp. 958-971

Author(s):

Marcel Ramos ◽

Ludwig Geistlinger ◽

Sehyun Oh ◽

Lucas Schiffer ◽

Rimsha Azhar ◽

...

Keyword(s):

Web Application ◽

Cancer Genomics ◽

Application Programming Interface ◽

Data Representation ◽

The Cancer Genome Atlas ◽

Data Sets ◽

Data Types ◽

Data Infrastructure ◽

Integrative Framework ◽

Pan Cancer

Purpose: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. Methods: We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. Results: We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. Conclusion: These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.

The LIM Protein Ajuba Augments Tumor Metastasis in Colon Cancer

Cancers ◽

10.3390/cancers12071913 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1913

Author(s):

Noëlle Dommann ◽

Daniel Sánchez-Taltavull ◽

Linda Eggs ◽

Fabienne Birrer ◽

Tess Brodie ◽

...

Keyword(s):

Colon Cancer ◽

Cell Lines ◽

Human Colon ◽

Tumor Burden ◽

The Cancer Genome Atlas ◽

Cancer Tissue ◽

Potential Candidate ◽

Sequencing Data ◽

Human Colon Cancer ◽

Cancer Genome Atlas

Colorectal cancer, along with its high potential for recurrence and metastasis, is a major health burden. Uncovering proteins and pathways required for tumor cell growth is necessary for the development of novel targeted therapies. Ajuba is a member of the LIM domain family of proteins whose expression is positively associated with numerous cancers. Our data show that Ajuba is highly expressed in human colon cancer tissue and cell lines. Publicly available data from The Cancer Genome Atlas show a negative correlation between survival and Ajuba expression in patients with colon cancer. To investigate its function, we transduced SW480 human colon cancer cells with lentiviral constructs to knock down or overexpress Ajuba protein. The transcriptome of the modified cell lines was analyzed by RNA sequencing. Among the pathways enriched in the differentially expressed genes were cell proliferation, migration, and differentiation. We confirmed our sequencing data with biological assays: cells depleted of Ajuba were less proliferative, more sensitive to irradiation, migrated less, and were less efficient in colony formation. In addition, loss of Ajuba expression decreased the tumor burden in a murine model of colorectal metastasis to the liver. Taken together, our data support the conclusion that Ajuba promotes colon cancer growth, migration, and metastasis and is therefore a potential candidate for targeted therapy.

Integrative Analysis of Multi-Omics Data Based on Blockwise Sparse Principal Components

International Journal of Molecular Sciences ◽

10.3390/ijms21218202 ◽

2020 ◽

Vol 21 (21) ◽

pp. 8202

Author(s):

Mira Park ◽

Doyoen Kim ◽

Kwanyoung Moon ◽

Taesung Park

Keyword(s):

Model Fitting ◽

Principal Component ◽

The Cancer Genome Atlas ◽

Integrated Analysis ◽

Omics Data ◽

Second Stage ◽

Variable Clustering ◽

Novel Approach ◽

Cancer Genome Atlas ◽

Two Stages

The recent development of high-throughput technology has allowed us to accumulate vast amounts of multi-omics data. Because even single omics data have a large number of variables, integrated analysis of multi-omics data suffers from problems such as computational instability and variable redundancy. Most multi-omics data analyses repeatedly apply a single supervised analysis for dimension reduction and variable selection. However, these approaches cannot avoid the problems of redundancy and collinearity of variables. In this study, we propose a novel approach using blockwise component analysis. This approach addresses the limitations of current methods by applying variable clustering and sparse principal component (sPC) analysis. Our approach consists of two stages. The first stage identifies homogeneous variable blocks and then extracts sPCs for each omics dataset. The second stage merges the sPCs from each omics dataset and then constructs a prediction model. We also propose a graphical method that shows the results of sparse PCA and model fitting simultaneously. We applied the proposed methodology to glioblastoma multiforme data from The Cancer Genome Atlas. The comparison with other existing approaches showed that our proposed methodology is more easily interpretable than other approaches and has comparable predictive power with a much smaller number of variables.
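
The data flow of the two stages can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: greedy correlation-based grouping stands in for their variable clustering, a soft-thresholded power iteration stands in for their sparse PCA, and the prediction model of the second stage is left out. All function names, thresholds, and example values are hypothetical.

```python
import math


def column(X, j):
    return [row[j] for row in X]


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def corr(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    a, b = center(a), center(b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def variable_blocks(X, threshold=0.7):
    """Stage 1a (assumed): a variable joins the first block whose seed it correlates with."""
    blocks = []
    for j in range(len(X[0])):
        for block in blocks:
            if abs(corr(column(X, j), column(X, block[0]))) >= threshold:
                block.append(j)
                break
        else:
            blocks.append([j])
    return blocks


def unit(w):
    norm = math.sqrt(sum(x * x for x in w)) or 1.0
    return [x / norm for x in w]


def project(Z, w):
    """Component score of each sample for loadings w (Z is variables x samples)."""
    return [sum(w[k] * Z[k][i] for k in range(len(w))) for i in range(len(Z[0]))]


def sparse_pc(X, cols, penalty=0.1, iters=50):
    """Stage 1b (assumed): first sparse component of one block."""
    Z = [center(column(X, j)) for j in cols]
    w = [1.0] + [0.0] * (len(cols) - 1)
    for _ in range(iters):
        scores = project(Z, w)
        w = unit([sum(z * s for z, s in zip(Z[k], scores)) for k in range(len(cols))])
        # Soft-thresholding shrinks small loadings to exactly zero.
        w = unit([math.copysign(max(abs(x) - penalty, 0.0), x) for x in w])
    return w, project(Z, w)


def merged_scores(datasets, threshold=0.7, penalty=0.1):
    """Stage 2 input: one score column per block, concatenated across omics datasets."""
    columns = []
    for X in datasets:
        for cols in variable_blocks(X, threshold):
            _, scores = sparse_pc(X, cols, penalty)
            columns.append(scores)
    return [list(row) for row in zip(*columns)]


# Made-up data: 4 samples; 3 expression variables and 2 methylation variables.
expr = [[1.0, 2.1, 5.0], [2.0, 3.9, 1.0], [3.0, 6.2, 4.0], [4.0, 8.0, 2.0]]
meth = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
expr_blocks = variable_blocks(expr)
features = merged_scores([expr, meth])
```

The rows of `features` would then feed whatever prediction model is chosen for the second stage. Because each block contributes a single score column here, the number of model inputs equals the number of blocks rather than the number of original variables.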

The human intermediate prolactin receptor is a mammary proto-oncogene

npj Breast Cancer ◽

10.1038/s41523-021-00243-7 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Jacqueline M. Grible ◽

Patricija Zot ◽

Amy L. Olex ◽

Shannon E. Hedrick ◽

J. Chuck Harrell ◽

...

Keyword(s):

Breast Cancer ◽

Prolactin Receptor ◽

The Cancer Genome Atlas ◽

Cancer Tissue ◽

Mcf7 Cells ◽

Cancer Pathogenesis ◽

Cancer Genome Atlas ◽

Alternatively Spliced

The hormone prolactin (PRL) and its receptor (hPRLr) are significantly involved in breast cancer pathogenesis. The intermediate hPRLr (hPRLrI) is an alternatively-spliced isoform, capable of stimulating cellular viability and proliferation. An analogous truncated mouse PRLr (mPRLr) was recently found to be oncogenic when co-expressed with wild-type mPRLr. The goal of this study was to determine if a similar transforming event occurs with the hPRLr in human breast epithelial cells and to better understand the mechanism behind such transformation. hPRLrL+I co-expression in MCF10AT cells resulted in robust in vivo and in vitro transformation, while hPRLrI knock-down in MCF7 cells significantly decreased in vitro malignant potential. hPRLrL+I heterodimers displayed greater stability than hPRLrL homodimers, and while being capable of activating Jak2, Ras, and MAPK, they were unable to induce Stat5a tyrosine phosphorylation. Both immunohistochemical breast cancer tissue microarray data and RNA sequencing analyses using The Cancer Genome Atlas (TCGA) identified that higher hPRLrI expression associates with triple-negative breast cancer. These studies indicate the hPRLrI, when expressed alongside hPRLrL, participates in mammary transformation, and represents a novel oncogenic mechanism.

Deep Subspace Mutual Learning For Cancer Subtypes Prediction

Bioinformatics ◽

10.1093/bioinformatics/btab625 ◽

2021 ◽

Author(s):

Bo Yang ◽

Ting-Ting Xin ◽

Shan-Min Pang ◽

Meng Wang ◽

Yi-Jie Wang

Keyword(s):

The Cancer Genome Atlas ◽

Supplementary Information ◽

Omics Data ◽

Computational Framework ◽

Disease Etiology ◽

Cancer Subtypes ◽

Mutual Learning ◽

Cancer Genome Atlas ◽

Precise Prediction ◽

Multi Level

Motivation: Precise prediction of cancer subtypes is of significant importance in cancer diagnosis and treatment. Disease etiology is complex and exists at different omics levels; hence, integrative analysis provides a very effective way to improve our understanding of cancer. Results: We propose a novel computational framework, named Deep Subspace Mutual Learning (DSML). DSML can simultaneously learn the subspace structures in each available omics dataset and in the overall multi-omics data by adopting deep neural networks, which thereby facilitates subtype prediction via clustering on multi-level, single-level, and partial-level omics data. Extensive experiments are performed on five different cancers using three levels of omics data from The Cancer Genome Atlas. The experimental analysis demonstrates that DSML delivers comparable or even better results than many state-of-the-art integrative methods. Availability: An implementation and documentation of DSML are publicly available at https://github.com/polytechnicXTT/Deep-Subspace-Mutual-Learning.git. Supplementary information: Supplementary data are available at Bioinformatics online.
