Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Melissa S. Cline; Brian Craft; Teresa Swatloski; Mary Goldman; Singer Ma; David Haussler; Jingchun Zhu

doi:10.1038/srep02652

Discovery of Latent Drivers from Double Mutations in Pan-Cancer Data Reveal their Clinical Impact

10.1101/2021.04.02.438239 ◽

2021 ◽

Author(s):

Bengi Ruken Yavuz ◽

Chung-Jung Tsai ◽

Ruth Nussinov ◽

Nurcan Tuncbag

Keyword(s):

Cell Lines ◽

Drug Response ◽

Cancer Genomics ◽

Single Gene ◽

Clinical Information ◽

Driver Mutations ◽

Patient Specific ◽

Cancer Data ◽

Activated State ◽

Pan Cancer

Transforming patient-specific molecular data into clinical decisions is fundamental to personalized medicine. Despite massive advancements in cancer genomics, to date driver mutations whose frequencies are low, and their observable transformation potential is minor have escaped identification. Yet, when paired with other mutations in cis, such 'latent driver' mutations can drive cancer. Here, we discover potential 'latent driver' double mutations. We applied a statistical approach to identify significantly co-occurring mutations in the pan-cancer data of mutation profiles of ~80,000 tumor sequences from the TCGA and AACR GENIE databases. The components of same gene doublets were assessed as potential latent drivers. We merged the analysis of the significant double mutations with drug response data of cell lines and patient derived xenografts (PDXs). This allowed us to link the potential impact of double mutations to clinical information and discover signatures for some cancer types. Our comprehensive statistical analysis identified 228 same gene double mutations of which 113 mutations are cataloged as latent drivers. Oncogenic activation of a protein can be through either single or multiple independent mechanisms of action. Combinations of a driver mutation with either a driver, a weak driver, or a strong latent driver have the potential of a single gene leading to a fully activated state and high drug response rate. Tumor suppressors require higher mutational load to coincide with double mutations compared to oncogenes which implies their relative robustness to losing their functions. Evaluation of the response of cell lines and patient-derived xenograft data to drug treatment indicate that in certain genes double
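The abstract above does not name the statistical test used to find significantly co-occurring mutations. As a rough illustration only, the sketch below assumes a one-sided Fisher's exact test on a 2x2 table of tumor counts; the function name `cooccurrence_pvalue` and its arguments are hypothetical and are not taken from the paper.

```python
from math import comb


def cooccurrence_pvalue(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for co-occurrence of mutations A and B.

    Assumption: this is a stand-in for the unspecified "statistical approach"
    in the abstract, not the authors' method.
    Arguments are tumor counts: carrying both mutations, only A, only B, neither.
    Returns P(observing >= `both` double-mutant tumors) under independence.
    """
    n = both + only_a + only_b + neither
    with_a = both + only_a  # tumors carrying mutation A
    with_b = both + only_b  # tumors carrying mutation B
    total = comb(n, with_b)  # ways to choose which tumors carry B
    upper = min(with_a, with_b)
    # Hypergeometric upper tail: k double mutants among the B carriers.
    tail = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(both, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(cooccurrence_pvalue(both=10, only_a=0, only_b=0, neither=10))
```

A small p-value suggests the two mutations appear together more often than independence would predict; in a pan-cancer screen over many mutation pairs, the p-values would also need a multiple-testing correction.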


A novel unsupervised learning model for detecting driver genes from pan-cancer data through matrix tri-factorization framework with pairwise similarities constraints

Neurocomputing ◽

10.1016/j.neucom.2018.03.026 ◽

2018 ◽

Vol 296 ◽

pp. 64-73 ◽

Cited By ~ 7

Author(s):

Jianing Xi ◽

Ao Li ◽

Minghui Wang

Keyword(s):

Unsupervised Learning ◽

Learning Model ◽

Driver Genes ◽

Cancer Data ◽

Pan Cancer


Pan-cancer landscape of homologous recombination deficiency

10.1101/2020.01.13.905026 ◽

2020 ◽

Cited By ~ 5

Author(s):

Luan Nguyen ◽

John Martens ◽

Arne Van Hoeck ◽

Edwin Cuppen

Keyword(s):

Prostate Cancer ◽

Homologous Recombination ◽

Cancer Genomics ◽

Strand Break ◽

Diagnostic Value ◽

Cancer Type ◽

Brca1 And Brca2 ◽

Homologous Recombination Deficiency ◽

A Genome ◽

Pan Cancer

Abstract Homologous recombination deficiency (HRD) results in impaired double strand break repair and is a frequent driver of tumorigenesis. Here, we developed a genome-wide mutational scar-based pan-cancer Classifier of HOmologous Recombination Deficiency (CHORD) that can discriminate BRCA1- and BRCA2-subtypes. Analysis of a metastatic (n=3,504) and primary (n=1,854) pan-cancer cohort revealed HRD was most frequent in ovarian and breast cancer, followed by pancreatic and prostate cancer. Biallelic inactivation of BRCA1, BRCA2, RAD51C or PALB2 was the most common genetic cause of HRD, with RAD51C and PALB2 inactivation resulting in BRCA2-type HRD. While the specific genetic cause of HRD was cancer type specific, biallelic inactivation was predominantly associated with loss-of-heterozygosity (LOH), with increased contribution of deep deletions in prostate cancer. Our results demonstrate the value of pan-cancer genomics-based HRD testing and its potential diagnostic value for patient stratification towards treatment with e.g. poly ADP-ribose polymerase inhibitors (PARPi).


Pan-Cancer Analysis Reveals that the SARS-CoV-2 Receptor ACE2 is a Protective Factor for Cancer Progression

10.21203/rs.3.rs-42534/v1 ◽

2020 ◽

Author(s):

Zhilan Zhang ◽

Lin Li ◽

Mengyuan Li ◽

Xiaosheng Wang

Keyword(s):

Tumor Progression ◽

Clinical Outcomes ◽

Cancer Progression ◽

Antitumor Immunity ◽

Protective Factor ◽

Cancer Genomics ◽

Mesenchymal Transition ◽

Expression Levels ◽

Oncogenic Pathways ◽

Pan Cancer

Abstract Background: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected more than 13 million people and has caused more than 570,000 deaths worldwide as of July 13, 2020. The SARS-CoV-2 human cell receptor ACE2 has recently received extensive attention for its role in SARS-CoV-2 infection. Many studies have also explored the association between ACE2 and cancer. However, a systemic investigation into associations between ACE2 and oncogenic pathways, tumor progression, and clinical outcomes in pan-cancer remains lacking. Methods: Using cancer genomics datasets from the Cancer Genome Atlas (TCGA) program, we performed computational analyses of associations between ACE2 expression and antitumor immunity, immunotherapy response, oncogenic pathways, tumor progression phenotypes, and clinical outcomes in 12 cancer cohorts. We also identified co-expression networks of ACE2 in cancer. Results: ACE2 upregulation was associated with increased antitumor immune signatures and PD-L1 expression, and favorable anti-PD-1/PD-L1/CTLA-4 immunotherapy response. ACE2 expression levels inversely correlated with the activity of cell cycle, mismatch repair, TGF-β, Wnt, VEGF, and Notch signaling pathways. Moreover, ACE2 expression levels had significant inverse correlations with tumor proliferation, stemness, and epithelial-mesenchymal transition (EMT). ACE2 upregulation was associated with favorable survival in pan-cancer and in multiple individual cancer types. Conclusions: ACE2 upregulation was associated with increased antitumor immunity and immunotherapy response, reduced tumor malignancy, and favorable survival in cancer, suggesting that ACE2 is a protective factor for cancer progression. Our data may provide potential clinical implications for treating cancer patients infected with SARS-CoV-2.


Multiomic Integration of Public Oncology Databases in Bioconductor

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00119 ◽

2020 ◽

pp. 958-971

Author(s):

Marcel Ramos ◽

Ludwig Geistlinger ◽

Sehyun Oh ◽

Lucas Schiffer ◽

Rimsha Azhar ◽

...

Keyword(s):

Web Application ◽

Cancer Genomics ◽

Application Programming Interface ◽

Data Representation ◽

The Cancer Genome Atlas ◽

Data Sets ◽

Data Types ◽

Data Infrastructure ◽

Integrative Framework ◽

Pan Cancer

PURPOSE Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.


Identification of pan-cancer Ras pathway activation with deep learning

Briefings in Bioinformatics ◽

10.1093/bib/bbaa258 ◽

2020 ◽

Author(s):

Xiangtao Li ◽

Shaochuan Li ◽

Yunhe Wang ◽

Shixiong Zhang ◽

Ka-Chun Wong

Keyword(s):

Deep Learning ◽

Superior Performance ◽

Recent Attempt ◽

Precision Oncology ◽

Pathway Activity ◽

Ras Pathway ◽

Cancer Data ◽

Pathway Activation ◽

Cancer Types ◽

Pan Cancer

Abstract The identification of hidden responders is often an essential challenge in precision oncology. A recent attempt based on machine learning has been proposed for classifying aberrant pathway activity from multiomic cancer data. However, we note several critical limitations there, such as high-dimensionality, data sparsity and model performance. Given the central importance and broad impact of precision oncology, we propose nature-inspired deep Ras activation pan-cancer (NatDRAP), a deep neural network (DNN) model, to address those restrictions for the identification of hidden responders. In this study, we develop the nature-inspired deep learning model that integrates bulk RNA sequencing, copy number and mutation data from PanCanAtlas to detect pan-cancer Ras pathway activation. In NatDRAP, we propose to synergize the nature-inspired artificial bee colony algorithm with different gradient-based optimizers in one framework for optimizing DNNs in a collaborative manner. Multiple experiments were conducted on 33 different cancer types across PanCanAtlas. The experimental results demonstrate that the proposed NatDRAP can provide superior performance over other benchmark methods with strong robustness towards diagnosing RAS aberrant pathway activity across different cancer types. In addition, gene ontology enrichment and pathological analysis are conducted to reveal novel insights into the RAS aberrant pathway activity identification and characterization. NatDRAP is written in Python and available at https://github.com/lixt314/NatDRAP1.


Pan-cancer landscape of homologous recombination deficiency

Nature Communications ◽

10.1038/s41467-020-19406-4 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Luan Nguyen ◽

John W. M. Martens ◽

Arne Van Hoeck ◽

Edwin Cuppen

Keyword(s):

Prostate Cancer ◽

Homologous Recombination ◽

Cancer Genomics ◽

Strand Break ◽

Diagnostic Value ◽

Cancer Type ◽

Homologous Recombination Deficiency ◽

Genome Wide ◽

A Genome ◽

Pan Cancer

Abstract Homologous recombination deficiency (HRD) results in impaired double strand break repair and is a frequent driver of tumorigenesis. Here, we develop a genome-wide mutational scar-based pan-cancer Classifier of HOmologous Recombination Deficiency (CHORD) that can discriminate BRCA1- and BRCA2-subtypes. Analysis of a metastatic (n = 3,504) and primary (n = 1,854) pan-cancer cohort reveals that HRD is most frequent in ovarian and breast cancer, followed by pancreatic and prostate cancer. We identify biallelic inactivation of BRCA1, BRCA2, RAD51C or PALB2 as the most common genetic cause of HRD, with RAD51C and PALB2 inactivation resulting in BRCA2-type HRD. We find that while the specific genetic cause of HRD is cancer type specific, biallelic inactivation is predominantly associated with loss-of-heterozygosity (LOH), with increased contribution of deep deletions in prostate cancer. Our results demonstrate the value of pan-cancer genomics-based HRD testing and its potential diagnostic value for patient stratification towards treatment with e.g. poly ADP-ribose polymerase inhibitors (PARPi).


The Biological Function Delineated Across Pan-Cancer Levels Through lncRNA-Based Prognostic Risk Assessment Factors for Pancreatic Cancer

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.694652 ◽

2021 ◽

Vol 9 ◽

Author(s):

Xudong Tang ◽

Mengyan Zhang ◽

Liang Sun ◽

Fengyan Xu ◽

Xin Peng ◽

...

Keyword(s):

Pancreatic Cancer ◽

Prognostic Marker ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Research Network ◽

Molecular Characteristics ◽

Cancer Data ◽

Cancer Genome Atlas ◽

Pan Cancer ◽

Genome Atlas

Long non-coding RNAs (lncRNAs) play key roles in tumors and function not only as important molecular markers for cancer prognosis, but also as molecular characteristics at the pan-cancer level. Because of the poor prognosis of pancreatic cancer, accurate assessment of prognosis is a key issue in the development of treatment plans for pancreatic cancer. Here we analyzed pancreatic cancer data from The Cancer Genome Atlas and The Genotype-Tissue Expression database using Cox regression and lasso regression in analyses using a combination of the two databases as well as only The Cancer Genome Atlas database (Cancer Genome Atlas Research Network et al., 2013). A prognostic risk score model with significant correlation with pancreatic cancer survival was constructed, and two lncRNAs were investigated. Additional analysis of 33 cancers using the two lncRNAs showed that lncRNA TSPOAP1-AS1 was a prognostic marker of seven cancers, among which pancreatic cancer was the most significant, and lncRNA MIR600HG was a prognostic marker of ovarian cancer and pancreatic cancer. LncRNA TSPOAP1-AS1 is associated with clinical stage and tumor mutation burden of some cancers as well as a strong degree of immune infiltration in many cancers, while a strong correlation between lncRNA MIR600HG and microsatellite instability was observed in several cancers. The results of this study help further our understanding of the different functions of lncRNAs in cancer and may aid in the clinical application of lncRNAs as prognostic factors for cancer.


The Cancer Genomic Atlas – “TO CONQUER CANCER”

International Journal of Molecular and Immuno Oncology ◽

10.25259/ijmio_28_2020 ◽

2020 ◽

Vol 0 ◽

pp. 1-6

Author(s):

Sai Sri Kavya Kadali ◽

Rachna Gowlikar ◽

Syeda Nooreen Fatima

Keyword(s):

Genetic Basis ◽

Data Repository ◽

Rare Cancer ◽

Cancer Data ◽

Data Collection Process ◽

Genomics And Proteomics ◽

Prevention Studies ◽

Cancer Types ◽

Pan Cancer

The Cancer Genomic Atlas (TCGA) is a publicly accessible cancer data repository and tool that allows us to understand the molecular basis of cancer through the application of genomics and proteomics. So far, researchers have been able to diagnose 33 cancer types including 10 rare cancer types. The key features of TCGA are to make the data collection process publicly accessible for the better understanding of the molecular and genetic basis of cancer and its mechanism of action along with its prevention. Studies on different cancer types along with comprehensive pan cancer analysis have expanded the understanding and purpose of TCGA. Ever since its conceptualization, its high-throughput approach has provided a platform for the identification of genes and pathways involved in cancers and accurate classification of cancers.


Comprehensive Analysis to Identify SPP1 as a Prognostic Biomarker in Cervical Cancer

Frontiers in Genetics ◽

10.3389/fgene.2021.732822 ◽

2022 ◽

Vol 12 ◽

Author(s):

Kaidi Zhao ◽

Zhou Ma ◽

Wei Zhang

Keyword(s):

Cervical Cancer ◽

Roc Curve ◽

Immune Cell ◽

Cox Regression ◽

Prognostic Biomarker ◽

R Package ◽

Cancer Data ◽

Immune Infiltration ◽

Pan Cancer ◽

Geo Database

Background: SPP1, secreted phosphoprotein 1, is a member of the small integrin-binding ligand N-linked glycoprotein (SIBLING) family. Previous studies have proven SPP1 overexpressed in a variety of cancers and can be identified as a prognostic factor, while no study has explored the function and carcinogenic mechanism of SPP1 in cervical cancer. Methods: We aimed to demonstrate the relationship between SPP1 expression and pan-cancer using The Cancer Genome Atlas (TCGA) database. Next, we validated SPP1 expression of cervical cancer in the Gene Expression Omnibus (GEO) database, including GSE7803, GSE63514, and GSE9750. The receiver operating characteristic (ROC) curve was used to evaluate the feasibility of SPP1 as a differentiating factor by the area under curve (AUC) score. Cox regression and logistic regression were performed to evaluate factors associated with prognosis. The SPP1-binding protein network was built by the STRING tool. Enrichment analysis by the R package clusterProfiler was used to explore potential function of SPP1. The single-sample GSEA (ssGSEA) method from the R package GSVA and TIMER database were used to investigate the association between the immune infiltration level and SPP1 expression in cervical cancer. Results: Pan-cancer data analysis showed that SPP1 expression was higher in most cancer types, including cervical cancer, and we got the same result in the GEO database. The ROC curve suggested that SPP1 could be a potential diagnostic biomarker (AUC = 0.877). High SPP1 expression was associated with poorer overall survival (OS) (P = 0.032). Further enrichment and immune infiltration analysis revealed that high SPP1 expression was correlated with regulating the infiltration level of neutrophil cells and some immune cell types, including macrophage and DC. Conclusion: SPP1 expression was higher in cervical cancer tissues than in normal cervical epithelial tissues. It was significantly associated with poor prognosis and immune cell infiltration. Thus, SPP1 may become a promising prognostic biomarker for cervical cancer patients.
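The AUC score mentioned in the abstract above can be read as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample. The sketch below computes it that way from scratch; the function name `roc_auc` and the toy values are hypothetical, and the study itself reports using R tooling rather than this code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: expression values (or any marker score), one per sample.
    labels: 1 for the positive class (e.g. tumor), 0 for the negative class.
    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expression values, made up for illustration only.
    expression = [0.9, 0.2, 0.8, 0.1]
    is_tumor = [1, 1, 0, 0]
    print(roc_auc(expression, is_tumor))
```

The pairwise form is quadratic in the number of samples, which is fine for cohort sizes typical of a single GEO series; a rank-based formulation would scale better for larger inputs.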
