TumorFusions: an integrative resource for reporting cancer-associated transcript fusions in 33 tumor types

Mapping Intimacies ◽

10.1101/162180 ◽

2017 ◽

Author(s):

Xin Hu ◽

Qianghu Wang ◽

Floris Barthel ◽

Ming Tang ◽

Samirkumar Amin ◽

...

Keyword(s):

The Cancer Genome Atlas ◽

Whole Genome Sequencing Data ◽

Fusion Genes ◽

Sequencing Data ◽

Structural Variations ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Types ◽

Fusion Detection ◽

Genome Atlas

Fusion genes, particularly those involving kinases, have been demonstrated as drivers and are frequent therapeutic targets in cancer [1]. Here, we describe our results on detecting transcript fusions across 33 cancer types from The Cancer Genome Atlas (TCGA), totaling 9,966 cancer samples and 648 normal samples [2]. Preprocessing, including read alignment to both genome and transcriptome, and fusion detection were carried out using a uniform pipeline [3]. To validate the resulting fusions, we also called somatic structural variations for 561 cancers from whole genome sequencing data. A summary of the data used in this study is provided in Table S1. Our results can be accessed through our portal at http://www.tumorfusions.org.

Epstein-Barr Virus-Positive Cancers Show Altered B-Cell Clonality

mSystems ◽

10.1128/msystems.00081-18 ◽

2018 ◽

Vol 3 (5) ◽

Cited By ~ 11

Author(s):

Sara R. Selitsky ◽

David Marron ◽

Lisle E. Mose ◽

Joel S. Parker ◽

Dirk P. Dittmer

Keyword(s):

B Cells ◽

B Cell ◽

Epstein Barr Virus ◽

The Cancer Genome Atlas ◽

Barr Virus ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Epstein Barr ◽

Tumor Types ◽

Genome Atlas

ABSTRACT Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that there are additional cancer types with high prevalence of EBV, we determined EBV viral expression in all the Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in >5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. In order to determine if there was a difference in the B-cell populations, we assembled B-cell receptors for each sample and found B-cell receptor abundance (P ≤ 1.4 × 10⁻²⁰) and diversity (P ≤ 8.3 × 10⁻²⁷) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population. IMPORTANCE Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells and in B cells causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects where EBV was detected in the tumors strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. 
This result is not evidence of causality, but a potential novel biomarker for tumor immune status.
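
The ">5% of samples" criterion in this abstract can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a tumor-type label and a yes/no EBV call; how the authors made that call from expression data is not stated here. The function name, the toy tumor-type labels, and the counts are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def ebv_prevalence(samples, threshold=0.05):
    """Return per-tumor-type EBV prevalence and the types above `threshold`.

    `samples` is an iterable of (tumor_type, ebv_positive) pairs.
    The boolean EBV call is an assumption of this sketch.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for tumor_type, ebv_positive in samples:
        totals[tumor_type] += 1
        if ebv_positive:
            positives[tumor_type] += 1
    # Fraction of EBV-positive samples within each tumor type.
    prevalence = {t: positives[t] / totals[t] for t in totals}
    # Tumor types whose prevalence strictly exceeds the threshold.
    flagged = sorted(t for t, p in prevalence.items() if p > threshold)
    return prevalence, flagged


# Hypothetical toy cohort: 3 of 30 positive in one type, 1 of 40 in another.
toy_samples = (
    [("TYPE_A", True)] * 3 + [("TYPE_A", False)] * 27
    + [("TYPE_B", True)] * 1 + [("TYPE_B", False)] * 39
)
prevalence, flagged = ebv_prevalence(toy_samples)
print(prevalence, flagged)
```

In this toy cohort only the first type exceeds the 5% cutoff (10% versus 2.5%).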

Improved detection of gene fusions by applying statistical methods reveals new oncogenic RNA cancer drivers

10.1101/659078 ◽

2019 ◽

Author(s):

Roozbeh Dehghannasiri ◽

Donald Eric Freeman ◽

Milos Jordanski ◽

Gillian L. Hsieh ◽

Ana Damljanovic ◽

...

Keyword(s):

False Positive ◽

Cancer Biology ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Gene Fusions ◽

Cancer Genome Atlas ◽

Selection For ◽

Tumor Types ◽

Fusion Detection ◽

Genome Atlas

Short Abstract: The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce a new algorithm, DEEPEST, that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling ten-fold fewer false-positive fusions in non-transformed human tissues. We leverage the increased precision of DEEPEST to discover new cancer biology. For example, 888 new candidate oncogenes are identified based on over-representation in DEEPEST-Fusion calls, and 1,078 previously unreported fusions involving long intergenic noncoding RNA partners are detected, demonstrating a previously unappreciated prevalence and potential for function. Specific protein domains are enriched in DEEPEST calls, demonstrating a global selection for fusion functionality: kinase domains are nearly 2-fold more enriched in DEEPEST calls than expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. DEEPEST also reveals a high enrichment for fusions involving known and novel oncogenes in diseases including ovarian cancer, which has had minimal treatment advances in recent decades, finding that more than 50% of tumors harbor gene fusions predicted to be oncogenic. The statistical algorithms, population-level analytic framework, and the biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy. Significance: Gene fusions are tumor-specific genomic aberrations and are among the most powerful biomarkers and drug targets in translational cancer biology. 
The advent of RNA-Seq technologies over the past decade has provided a unique opportunity for detecting novel fusions by deploying computational algorithms on public sequencing databases. Yet, precise fusion detection algorithms are still out of reach. We develop DEEPEST, a highly specific and efficient statistical pipeline specially designed for mining massive sequencing databases, and apply it to all 33 tumor types and 10,500 samples in The Cancer Genome Atlas database. We systematically profile the landscape of detected fusions by employing classic statistical models and identify several signatures of selection for fusions in tumors. Software availability: The DEEPEST-Fusion workflow with a detailed readme file is available as a GitHub repository: https://github.com/salzmanlab/DEEPEST-Fusion. In addition to the main workflow, which is based on CWL, the repository provides example input and batch scripts (for job submission on local clusters) and code for building and querying the SBT files. All custom scripts used for systematic analysis of fusions are also available in the same repository.

Diagnostic and prognostic roles of IRAK1 in hepatocellular carcinoma tissues: an analysis of immunohistochemistry and RNA-sequencing data from the cancer genome atlas

OncoTargets and Therapy ◽

10.2147/ott.s132120 ◽

2017 ◽

Vol 10 ◽

pp. 1711-1723 ◽

Cited By ~ 11

Author(s):

Zhi-hua Ye ◽

Li Gao ◽

Dong-yue Wen ◽

Yun He ◽

Yu-yan Pang ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Rna Sequencing ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Cancer Genome Atlas ◽

Genome Atlas

Genomic Common Data Model for Seamless Interoperation of Biomedical Data in Clinical Practice: Retrospective Study (Preprint)

10.2196/preprints.13249 ◽

2018 ◽

Author(s):

Seo Jeong Shin ◽

Seng Chan You ◽

Yu Rang Park ◽

Jin Roh ◽

Jang-Hee Kim ◽

...

Keyword(s):

Clinical Practice ◽

Human Genome ◽

Genomic Data ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Common Data Model ◽

Sequencing Data ◽

School Of Medicine ◽

Cancer Genome Atlas ◽

Genome Atlas

BACKGROUND Clinical sequencing data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends of precision medicine. OBJECTIVE The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice. METHODS Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, which were transformed to a format appropriate for the G-CDM. The data were compared with respect to gene name, variant type, and actionable mutations. RESULTS The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with The Cancer Genome Atlas data, a clinically actionable mutation, p.Leu858Arg, in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM, which provides the proportions of genes, variant types, and actionable mutations. 
GeneProfiler also allows for querying the specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation. CONCLUSIONS We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.
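
The "times more frequent" comparisons reported in this abstract are ratios of per-cohort mutation proportions. The sketch below shows that arithmetic only. It is not GeneProfiler's code or interface, and the function names, cohort sizes, and carrier counts are hypothetical placeholders rather than values from the study.

```python
def mutation_frequency(carriers, cohort_size):
    """Proportion of patients in a cohort carrying a given mutation."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    if not 0 <= carriers <= cohort_size:
        raise ValueError("carriers must lie between 0 and cohort_size")
    return carriers / cohort_size


def fold_difference(freq_a, freq_b):
    """How many times more frequent the mutation is in cohort A than in B."""
    if freq_b == 0:
        raise ValueError("mutation absent from cohort B; ratio undefined")
    return freq_a / freq_b


# Hypothetical counts: 12 carriers among 100 patients vs 40 among 1000.
freq_a = mutation_frequency(12, 100)
freq_b = mutation_frequency(40, 1000)
ratio = fold_difference(freq_a, freq_b)
print(round(ratio, 2))
```

Comparing proportions rather than raw counts matters here because the two cohorts in the abstract differ in size by roughly an order of magnitude.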

Novel Insights into Epigenetic Regulation of IL6 Pathway: In Silico Perspective on Inflammation and Cancer Relationship

International Journal of Molecular Sciences ◽

10.3390/ijms221810172 ◽

2021 ◽

Vol 22 (18) ◽

pp. 10172

Author(s):

Saverio Candido ◽

Barbara Maria Rita Tomasello ◽

Alessandro Lavoro ◽

Luca Falzone ◽

Giuseppe Gattuso ◽

...

Keyword(s):

Dna Methylation ◽

Epigenetic Regulation ◽

Computational Analysis ◽

The Cancer Genome Atlas ◽

Post Translational Modifications ◽

Pathway Gene ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Types

The IL-6 pathway is abnormally hyperactivated in several cancers, triggering tumor cell growth and immune system inhibition. Along with genomic mutation, the IL6 pathway gene expression can be affected by DNA methylation, microRNAs, and post-translational modifications. Computational analysis was performed on the Cancer Genome Atlas (TCGA) datasets to explore the role of IL6, IL6R, IL6ST, and IL6R transmembrane isoform expression and their epigenetic regulation in different cancer types. IL6 was significantly modulated in 70% of tumor types, revealing either up- or down-regulation in an approximately equal number of tumors. Furthermore, IL6R and IL6ST were downregulated in more than 10 tumors. Interestingly, the correlation analysis demonstrated that only the IL6R expression was negatively affected by the DNA methylation within the promoter region in most tumors. Meanwhile, only the IL6ST expression was extensively modulated by miRNAs including miR-182-5p, which also directly targeted all three genes. In addition, IL6 upregulated miR-181a-3p, miR-214-3p, miR-18a-5p, and miR-938, which in turn inhibited the expression of IL6 receptors. Finally, the patients’ survival rate was significantly affected by the analyzed targets in some tumors. Our results suggest the relevance of epigenetic regulation of IL6 signaling and pave the way for further studies to validate these findings and to assess the prognostic and therapeutic predictive value of these epigenetic markers on the clinical outcome and survival of cancer patients.

Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network

Genes ◽

10.3390/genes10090702 ◽

2019 ◽

Vol 10 (9) ◽

pp. 702 ◽

Cited By ~ 6

Author(s):

Dai ◽

Ding ◽

Liu ◽

Xu ◽

Jiang ◽

...

Keyword(s):

Messenger Rna ◽

The Cancer Genome Atlas ◽

Biological Functions ◽

Significant Cluster ◽

Scoring Method ◽

Nonalcoholic Fatty Liver ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Liver Hepatocellular Carcinoma ◽

Tumor Types

Existing methods often fail to recognize changes in the biological roles of gene and microRNA (miRNA) pairs between tumor and normal samples. We have developed a novel cluster scoring method to identify messenger RNA (mRNA) and miRNA interaction pairs and clusters while considering tumor and normal samples jointly. Our method has identified 54 significant clusters for 15 cancer types selected from The Cancer Genome Atlas project. We also determined the shared clusters across tumor types and/or subtypes. In addition, we compared gene and miRNA overlap between lists identified in our liver hepatocellular carcinoma (LIHC) study and regulatory relationships reported from human and rat nonalcoholic fatty liver disease (NAFLD) studies. Finally, we analyzed biological functions for the single significant cluster in LIHC and uncovered a significantly enriched pathway (phospholipase D signaling pathway) with six genes represented in the cluster: DGKQ, LPAR2, PDGFRB, PIK3R3, PTGFR and RAPGEF3.

Analysis of The Cancer Genome Atlas sequencing data reveals novel properties of the human papillomavirus 16 genome in head and neck squamous cell carcinoma

Oncotarget ◽

10.18632/oncotarget.15179 ◽

2017 ◽

Vol 8 (11) ◽

pp. 17684-17699 ◽

Cited By ~ 58

Author(s):

Tara J. Nulton ◽

Amy L. Olex ◽

Mikhail Dozmorov ◽

Iain M. Morgan ◽

Brad Windle

Keyword(s):

Squamous Cell Carcinoma ◽

Human Papillomavirus ◽

Cell Carcinoma ◽

Head And Neck ◽

Squamous Cell ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Human Papillomavirus 16 ◽

Cancer Genome Atlas ◽

Genome Atlas

Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas

Nature Genetics ◽

10.1038/ng.2761 ◽

2013 ◽

Vol 45 (10) ◽

pp. 1121-1126 ◽

Cited By ~ 77

Author(s):

Larsson Omberg ◽

Kyle Ellrott ◽

Yuan Yuan ◽

Cyriac Kandoth ◽

Chris Wong ◽

...

Keyword(s):

Computational Analysis ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Cancer Genome Atlas ◽

Tumor Types ◽

Genome Atlas

Characterizing Cancer-Specific Networks by Integrating TCGA Data

Cancer Informatics ◽

10.4137/cin.s13776 ◽

2014 ◽

Vol 13s2 ◽

pp. CIN.S13776

Author(s):

Yanxun Xu ◽

Yitan Zhu ◽

Peter Müller ◽

Riten Mitra ◽

Yuan Ji

Keyword(s):

Graphical Model ◽

Biological Network ◽

The Cancer Genome Atlas ◽

Genomic Features ◽

Conditional Dependence ◽

Copy Numbers ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Dna Copy Numbers ◽

Genome Atlas

The Cancer Genome Atlas (TCGA) generates comprehensive genomic data for thousands of patients over more than 20 cancer types. TCGA data are typically whole-genome measurements of multiple genomic features, such as DNA copy numbers, DNA methylation, and gene expression, providing unique opportunities for investigating cancer mechanisms from multiple molecular and regulatory layers. We propose a Bayesian graphical model to systematically integrate multi-platform TCGA data for inference of the interactions between different genomic features either within a gene or between multiple genes. The presence or absence of edges in the graph indicates the presence or absence of conditional dependence between genomic features. The inference is restricted to genes within a known biological network, but can be extended to any set of genes. Applying the model to the same genes using patient samples in two different cancer types, we identify network components that are common as well as different between cancer types. The examples and code are available at https://www.ma.utexas.edu/users/yxu/software.html.

Pan-cancer analysis of non-coding recurrent mutations and their possible involvement in cancer pathogenesis

NAR Cancer ◽

10.1093/narcan/zcab008 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Chie Kikutake ◽

Minako Yoshihara ◽

Mikita Suyama

Keyword(s):

The Cancer Genome Atlas ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Huge Number ◽

Protein Coding ◽

Coding Regions ◽

Cancer Pathogenesis ◽

Recurrent Mutations ◽

Cancer Genome Atlas ◽

Pan Cancer

Cancer-related mutations have been mainly identified in protein-coding regions. Recent studies have demonstrated that mutations in non-coding regions of the genome could also be a risk factor for cancer. However, the non-coding regions comprise 98% of the total length of the human genome and contain a huge number of mutations, making it difficult to interpret their impacts on the pathogenesis of cancer. To comprehensively identify cancer-related non-coding mutations, we focused on recurrent mutations in non-coding regions using somatic mutation data from COSMIC and whole-genome sequencing data from The Cancer Genome Atlas (TCGA). We identified 21,574 recurrent mutations in non-coding regions that were shared by at least two different samples from both COSMIC and TCGA databases. Among them, 580 candidate cancer-related non-coding recurrent mutations were identified based on epigenomic and chromatin structure datasets. One such mutation was located in an RREB1 binding site that is thought to interact with the TEAD1 promoter. Our results suggest that mutations may disrupt the binding of RREB1 to the candidate enhancer region and increase TEAD1 expression levels. Our findings demonstrate that non-coding recurrent mutations and coding mutations may contribute to the pathogenesis of cancer.
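
The recurrence filter described in this abstract can be sketched as a set operation. The sketch assumes one reading of the criterion: a mutation counts as recurrent if it occurs in at least two distinct samples within each database, and is kept only if that holds in both. The tuple layout, function names, and toy records are hypothetical, and real COSMIC and TCGA files would need parsing and coordinate harmonization that is not shown.

```python
from collections import defaultdict


def recurrent_mutations(calls, min_samples=2):
    """Return mutations seen in at least `min_samples` distinct samples.

    `calls` is an iterable of (sample_id, chrom, pos, ref, alt) tuples.
    """
    samples_by_mutation = defaultdict(set)
    for sample_id, chrom, pos, ref, alt in calls:
        # A set ignores duplicate calls from the same sample.
        samples_by_mutation[(chrom, pos, ref, alt)].add(sample_id)
    return {
        mutation
        for mutation, samples in samples_by_mutation.items()
        if len(samples) >= min_samples
    }


def shared_recurrent(calls_a, calls_b, min_samples=2):
    """Mutations that are recurrent in both call sets (assumed criterion)."""
    return recurrent_mutations(calls_a, min_samples) & recurrent_mutations(
        calls_b, min_samples
    )


# Hypothetical toy records, not real variants.
db_a = [
    ("a1", "chr6", 100, "A", "G"),
    ("a2", "chr6", 100, "A", "G"),
    ("a1", "chr1", 5, "C", "T"),
]
db_b = [
    ("b1", "chr6", 100, "A", "G"),
    ("b2", "chr6", 100, "A", "G"),
    ("b3", "chr1", 5, "C", "T"),
    ("b3", "chr1", 5, "C", "T"),  # duplicate call from the same sample
]
shared = shared_recurrent(db_a, db_b)
print(shared)
```

Keying on distinct sample identifiers, rather than counting rows, keeps a mutation reported twice for one sample from being mistaken for a recurrent one.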
