Ennet: construction of potential cancer-driving networks based on somatic enhancer mutations only

Mapping Intimacies ◽

10.1101/216226 ◽

2017 ◽

Author(s):

Ya Cui ◽

Yiwei Niu ◽

Xueyi Teng ◽

Dan Wang ◽

Huaxia Luo ◽

...

Keyword(s):

Neural Development ◽

Somatic Mutations ◽

Gene Interactions ◽

Whole Genome ◽

Sequencing Technology ◽

Cancer Types ◽

Cancer Drivers ◽

New Perspective ◽

Pan Cancer ◽

Potential Cancer

Abstract Whole genome sequencing technology has facilitated the discovery of a large number of somatic mutations in enhancers (SMEs), but the role of SMEs in tumorigenesis has not been fully explored. Here we present Ennet, a method to comprehensively investigate SME-enriched networks (SME-networks) in cancer by integrating SMEs, enhancer-gene interactions and gene-gene interactions. Using Ennet, we performed a pan-cancer analysis of 2004 samples from 8 cancer types and found that many well-known cancer drivers were involved in the SME-networks, including ESR1, SMAD3, MYC, EGFR, BCL2 and PAX5. Ennet also identified many new, less well-characterized networks with potentially important roles in cancer, including a large SME-network in medulloblastoma (MB) that contains genes enriched in the glutamate receptor and neural development pathways. Interestingly, SME-networks are specific to individual cancer types, and the vast majority of the genes identified by Ennet have few mutations in gene bodies. Collectively, our work suggests that using enhancer-only somatic mutations can be an effective way to discover potential cancer-driving networks. Ennet provides a new perspective for exploring mechanisms of tumor progression from SMEs.
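The integration step described in this abstract (SMEs, enhancer-gene links, gene-gene links) can be illustrated with a toy sketch. This is not Ennet's actual algorithm, which the abstract does not specify. The function name `sme_networks`, the `min_mutations` threshold, the use of connected components, and all identifiers in the example data are assumptions made purely for illustration.

```python
from collections import defaultdict


def sme_networks(sme_enhancers, enhancer_to_genes, gene_edges, min_mutations=2):
    """Toy integration of SMEs with enhancer-gene and gene-gene links.

    sme_enhancers: iterable of enhancer IDs, one entry per observed SME.
    enhancer_to_genes: dict mapping enhancer ID -> list of target genes.
    gene_edges: iterable of (gene, gene) interaction pairs.
    Returns connected components (size > 1) among genes that receive at
    least `min_mutations` SMEs through their enhancers.
    """
    # Step 1: propagate each SME to the genes its enhancer is linked to.
    gene_counts = defaultdict(int)
    for enhancer in sme_enhancers:
        for gene in enhancer_to_genes.get(enhancer, ()):
            gene_counts[gene] += 1
    hit = {g for g, c in gene_counts.items() if c >= min_mutations}

    # Step 2: keep only gene-gene edges between SME-hit genes.
    adjacency = defaultdict(set)
    for a, b in gene_edges:
        if a in hit and b in hit:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Step 3: collect connected components with a depth-first search.
    seen, components = set(), []
    for gene in sorted(hit):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        if len(component) > 1:
            components.append(sorted(component))
    return components


# Hypothetical data: geneC receives only one SME and is filtered out.
example = sme_networks(
    ["enh1", "enh1", "enh2", "enh2", "enh3"],
    {"enh1": ["geneA"], "enh2": ["geneB"], "enh3": ["geneC"]},
    [("geneA", "geneB"), ("geneB", "geneC")],
)
```

A real analysis would replace the fixed threshold with a statistical enrichment test against a background mutation rate.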

Reconstructing and characterizing focal amplifications in cancer using AmpliconArchitect

10.1101/457333 ◽

2018 ◽

Cited By ~ 1

Author(s):

Viraj Deshpande ◽

Jens Luebeck ◽

Mehrdad Bakhtiari ◽

Nam-Phuong D Nguyen ◽

Kristen M Turner ◽

...

Keyword(s):

Cervical Cancer ◽

Fine Structure ◽

Tumor Growth ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Multiple Cancer ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. We developed a tool, AmpliconArchitect (AA), which can robustly reconstruct the fine structure of focally amplified regions using whole genome sequencing. AA-reconstructed amplicons in pan-cancer data and in virus-driven cervical cancer samples revealed many novel insights about focal amplifications. Specifically, the findings lend support to extrachromosomally mediated mechanisms for copy number expansion, and oncoviral pathogenesis.

MutEx: a multifaceted gateway for exploring integrative pan-cancer genomic data

Briefings in Bioinformatics ◽

10.1093/bib/bbz084 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1479-1486 ◽

Cited By ~ 2

Author(s):

Jie Ping ◽

Olufunmilola Oyebamiji ◽

Hui Yu ◽

Scott Ness ◽

Jeremy Chien ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Somatic Mutation ◽

Survival Data ◽

Somatic Mutations ◽

Genomic Data ◽

Mismatch Repair Gene ◽

Survival Difference ◽

Cancer Types ◽

Pan Cancer

Abstract Somatic mutation and gene expression dysregulation are considered two major tumorigenesis factors. While independent investigations of either factor pervade, studies of associations between somatic mutations and gene expression changes have been sporadic and nonsystematic. Utilizing genomic data collected from 11 315 subjects of 33 distinct cancer types, we constructed MutEx, a pan-cancer integrative genomic database. This database records the relationships among gene expression, somatic mutation and survival data for cancer patients. MutEx can be used to swiftly explore the relationship between these genomic/clinical features within and across cancer types and, more importantly, search for corroborating evidence for hypothesis inception. Our database also incorporates Gene Ontology and several pathway databases to enhance functional annotation, and elastic net and a gene expression composite score to aid in survival analysis. To demonstrate the usability of MutEx, we provide several application examples, including top somatic mutations associated with the most extensive expression dysregulation in breast cancer, differential mutational burden downstream of DNA mismatch repair gene mutations and composite gene expression score-based survival difference in breast cancer. MutEx can be accessed at http://www.innovebioinfo.com/Databases/Mutationdb_About.php.

CNCDatabase: a database of non-coding cancer drivers

10.1101/2020.04.29.069047 ◽

2020 ◽

Author(s):

Eric Minwei Liu ◽

Alexander Martinez-Fundichely ◽

Rajesh Bollapragada ◽

Maurice Spiewack ◽

Ekta Khurana

Keyword(s):

Gene Expression ◽

Luciferase Reporter ◽

Whole Genome ◽

Gene Promoters ◽

Genome Sequences ◽

Coding Regions ◽

Cancer Types ◽

Cancer Drivers ◽

Non Coding Rnas ◽

Experimental Validations

Abstract Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies and thus it is difficult to bridge the understanding of non-coding alterations, the genes they impact and the supporting evidence for their role in tumorigenesis across multiple cancer types. To address this gap, we have developed CNCDatabase, Cornell Non-Coding Cancer driver Database (https://cncdatabase.med.cornell.edu/) that contains detailed information about predicted non-coding drivers at gene promoters, 5′ and 3′ UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1,111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection in whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations and their associations with gene expression in human cancers.

Chromoanagenesis Landscape in 10,000 TCGA Patients

Cancers ◽

10.3390/cancers13164197 ◽

2021 ◽

Vol 13 (16) ◽

pp. 4197

Author(s):

Roni Rasnic ◽

Michal Linial

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Learning Algorithm ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Whole Genome ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Biopsies ◽

Pan Cancer

During the past decade, whole-genome sequencing of tumor biopsies and individuals with congenital disorders highlighted the phenomenon of chromoanagenesis, a single chaotic event of chromosomal rearrangement. Chromoanagenesis was shown to be frequent in many types of cancers, to occur in early stages of cancer development, and to significantly impact the tumor’s nature. However, an in-depth, cancer-type dependent analysis has been somewhat incomplete due to the shortage of whole genome sequencing of cancerous samples. In this study, we extracted data from The Pan-Cancer Analysis of Whole Genomes (PCAWG) and The Cancer Genome Atlas (TCGA) to construct and test a machine learning algorithm that can detect chromoanagenesis with high accuracy (86%). The algorithm was applied to ~10,000 unlabeled TCGA cancer patients. We utilize the chromoanagenesis assignment results to analyze cancer-type specific chromoanagenesis characteristics in 20 TCGA cancer types. Our results unveil prominent genes affected in either chromoanagenesis or non-chromoanagenesis tumorigenesis. The analysis reveals a mutual exclusivity relationship between the genes impaired in chromoanagenesis versus non-chromoanagenesis cases. We offer the discovered characteristics as possible targets for cancer diagnostic and therapeutic purposes.

Computing MicroRNA-Gene Interaction Networks in Pan-cancer Using MiRDriver

10.21203/rs.3.rs-1101651/v1 ◽

2021 ◽

Author(s):

Banabithi Bose ◽

Matthew Moravec ◽

Serdar Bozdag

Keyword(s):

Copy Number ◽

Target Genes ◽

The Cancer Genome Atlas ◽

Gene Interactions ◽

Microrna Target ◽

Multiple Cancer ◽

Driver Genes ◽

Microrna Gene ◽

Cancer Types ◽

Pan Cancer

Abstract DNA copy number aberrated regions in cancer are known to harbor cancer driver genes and short non-coding RNA molecules, i.e., microRNAs. In this study, we integrated multi-omics datasets, including copy number aberration, DNA methylation, and gene and microRNA expression, to identify signature microRNA-gene associations from frequently aberrated DNA regions across cancer types using a LASSO-based regression approach. We studied 7,294 patient samples associated with eighteen different cancer types from The Cancer Genome Atlas (TCGA) database and identified several cancer-specific microRNA-gene interactions enriched in experimentally validated microRNA-target databases. We highlighted several oncogenic and tumor suppressor microRNAs and genes that were common to several cancer types. Our method substantially outperformed five state-of-the-art methods in selecting significantly known microRNA-gene interactions in multiple cancer types. Several microRNAs and genes were found to be associated with tumor survival and progression. Selected target genes were found to be significantly enriched in cancer-related pathways, cancer Hallmark and Gene Ontology (GO) terms. Furthermore, subtype-specific potential gene signatures were discovered in multiple cancer types.

Pan-cancer scale landscape of simple somatic mutations

10.1101/112367 ◽

2017 ◽

Cited By ~ 1

Author(s):

Nan Zhou ◽

Blanca Gallego ◽

Jinku Bao ◽

Guy Tsafnat

Keyword(s):

Myeloid Leukemia ◽

Somatic Mutations ◽

Whole Genome ◽

Single Base ◽

Mutational Spectra ◽

Base Substitutions ◽

Whole Exome ◽

High Prevalence ◽

Pan Cancer ◽

Acute Myeloid

Abstract The genome is the carrier of somatic mutations during the development of cancer. Simple somatic mutations (SSMs) are a subgroup of somatic mutations that includes single base substitutions, small deletions and insertions of <= 200 bp, and multiple base substitutions of <= 200 bp. The comprehensive landscape of SSMs has not been studied. After analysing 46,692,922 SSMs from 10,878 samples, we propose a pan-cancer scale landscape of SSMs for 60 cancer projects in ICGC. In addition, whole genome sequencing (WGS) and whole exome sequencing (WXS) were compared according to the landscape of SSMs. The results indicate that the number of SSMs varies dramatically between cancers. WGS can detect 10 times more single base substitutions and insertions than WXS. WXS called 10 times more deletions than insertions. Multiple base substitutions have not been well studied, so they were observed in only a few cancer projects. Cancers generally show a high prevalence of C > T substitutions at NpCpG trinucleotide contexts. Skin cancer showed a distinct mutational spectrum. Breast cancer, bladder cancer, and cervical cancer were found to have similar mutational spectra. Acute myeloid leukemia and lung cancer from South Korea, and colorectal cancer from China, show a high density of single base substitutions per megabase on chromosome Y. In sum, our findings should be thought-provoking for the study of SSMs in cancer.
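The SSM categories defined in this abstract can be expressed as a small classifier. The sketch below is an illustration only, not the authors' pipeline. It assumes each mutation is given as a reference/alternate allele pair, with `-` standing for an empty allele; the function name `classify_ssm` and that allele convention are assumptions.

```python
def classify_ssm(ref, alt, max_len=200):
    """Classify a mutation into the SSM categories named in the abstract.

    ref, alt: allele strings; "-" denotes an empty allele (assumed convention).
    Returns a category name, or None if the event is not a simple somatic
    mutation under the <= 200 bp definition.
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt

    # Identical alleles (including two empty alleles) are not mutations.
    if ref == alt:
        return None
    # Events longer than the size limit fall outside the SSM definition.
    if len(ref) > max_len or len(alt) > max_len:
        return None
    if not alt:
        return "deletion"
    if not ref:
        return "insertion"
    if len(ref) == 1 and len(alt) == 1:
        return "single base substitution"
    if len(ref) == len(alt):
        return "multiple base substitution"
    # Unequal non-empty alleles (complex indels) are not covered here.
    return None
```

For example, `classify_ssm("C", "T")` falls in the single base substitution category, while a 201 bp deletion returns `None` because it exceeds the size limit.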

Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types

Scientific Reports ◽

10.1038/s41598-018-24379-y ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 11

Author(s):

Hyeongmin Kim ◽

Yong-Min Kim

Keyword(s):

Somatic Mutations ◽

Gene Clusters ◽

Functional Gene ◽

Multiple Cancer ◽

Cancer Types ◽

Pan Cancer

Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples

Briefings in Bioinformatics ◽

10.1093/bib/bbaa203 ◽

2020 ◽

Author(s):

J P Torcivia ◽

R Mazumder

Keyword(s):

Colon Adenocarcinoma ◽

Whole Genome Sequence ◽

Whole Genome ◽

Coding Region ◽

Coding Regions ◽

Functional Aspects ◽

Scanning Window ◽

Cancer Types ◽

Carcinoma Colon ◽

Pan Cancer

Abstract Genomics has benefited from an explosion in affordable high-throughput technology for whole-genome sequencing. The regulatory and functional aspects in non-coding regions may be an important contributor to oncogenesis. Whole-genome tumor-normal paired alignments were used to examine the non-coding regions in five cancer types and two races. Both a sliding window and a binning strategy were introduced to uncover areas of higher than expected variation for additional study. We show that the majority of cancer associated mutations in 154 whole-genome sequences covering breast invasive carcinoma, colon adenocarcinoma, kidney renal papillary cell carcinoma, lung adenocarcinoma and uterine corpus endometrial carcinoma cancers and two races are found outside of the coding region (4 432 885 in non-gene regions versus 1 412 731 in gene regions). A pan-cancer analysis found significantly mutated windows (292 to 3881 in count) demonstrating that there are significant numbers of large mutated regions in the non-coding genome. The 59 significantly mutated windows were found in all studied races and cancers. These offer 16 regions ripe for additional study within 12 different chromosomes—2, 4, 5, 7, 10, 11, 16, 18, 20, 21 and X. Many of these regions were found in centromeric locations. The X chromosome had the largest set of universal windows that cluster almost exclusively in Xq11.1—an area linked to chromosomal instability and oncogenesis. Large consecutive clusters (super windows) were found (19 to 114 in count) providing further evidence that large mutated regions in the genome are influencing cancer development. We show remarkable similarity in highly mutated non-coding regions across both cancer and race.
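The sliding-window idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' method: the window size, step, the fold-over-mean cutoff, and the function names `sliding_window_counts` and `enriched_windows` are all assumptions, and the paper's actual significance test is not described in the abstract.

```python
from bisect import bisect_left


def sliding_window_counts(positions, region_length, window=1000, step=500):
    """Count mutations in overlapping half-open windows [start, start + window)."""
    positions = sorted(positions)
    counts = []
    last_start = max(region_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        counts.append((start, hi - lo))
    return counts


def enriched_windows(counts, fold=2.0):
    """Return windows whose count is at least `fold` times the mean count.

    A crude stand-in for a statistical test of higher-than-expected variation.
    """
    if not counts:
        return []
    mean = sum(c for _, c in counts) / len(counts)
    if mean == 0:
        return []
    return [(start, c) for start, c in counts if c >= fold * mean]


# Hypothetical mutation positions on a 2,000 bp region.
toy_positions = [10, 20, 30, 1500, 1510, 1520, 1530, 1540, 1550]
toy_counts = sliding_window_counts(toy_positions, region_length=2000)
toy_hits = enriched_windows(toy_counts)
```

A binning strategy, which the abstract also mentions, corresponds to setting `step` equal to `window` so that windows do not overlap.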

CNCDatabase: a database of non-coding cancer drivers

Nucleic Acids Research ◽

10.1093/nar/gkaa915 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1094-D1101

Author(s):

Eric Minwei Liu ◽

Alexander Martinez-Fundichely ◽

Rajesh Bollapragada ◽

Maurice Spiewack ◽

Ekta Khurana

Keyword(s):

Tumor Development ◽

Luciferase Reporter ◽

Whole Genome ◽

Gene Promoters ◽

Genome Sequences ◽

Coding Regions ◽

Cancer Types ◽

Cancer Drivers ◽

Non Coding Rnas ◽

Experimental Validations

Abstract Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies making it difficult to understand their roles in tumorigenesis of different cancer types. We have developed CNCDatabase, Cornell Non-coding Cancer driver Database (https://cncdatabase.med.cornell.edu/) that contains detailed information about predicted non-coding drivers at gene promoters, 5′ and 3′ UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection using whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations in human cancers.

Chromoanagenesis landscape in 10,000 TCGA patients

10.1101/2021.04.29.441937 ◽

2021 ◽

Author(s):

Roni Rasnic ◽

Michal Linial

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Learning Algorithm ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Whole Genome ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Biopsies ◽

Pan Cancer

During the past decade, whole-genome sequencing of tumor biopsies and individuals with congenital disorders highlighted the phenomenon of chromoanagenesis, a single chaotic event of chromosomal rearrangement. Chromoanagenesis was shown to be frequent in many types of cancers, to occur in early stages of cancer development, and to significantly impact the tumor's nature. However, an in-depth, cancer-type dependent analysis has been somewhat incomplete due to the shortage of whole genome sequencing of cancerous samples. In this study, we extracted data from The Pan-Cancer Analysis of Whole Genomes (PCAWG) and The Cancer Genome Atlas (TCGA) to construct a machine learning algorithm that can detect chromoanagenesis with high accuracy (86%). The algorithm was applied to ~10,000 TCGA cancer patients. We utilize the chromoanagenesis assignment results to analyze cancer-type specific chromoanagenesis characteristics in 20 TCGA cancer types. Our results unveil prominent genes affected in either chromoanagenesis or non-chromoanagenesis tumorigenesis. The analysis reveals a mutual exclusivity relationship between the genes impaired in chromoanagenesis versus non-chromoanagenesis cases. We offer the discovered characteristics as possible targets for cancer diagnostic and therapeutic purposes.
