Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples

Briefings in Bioinformatics ◽

10.1093/bib/bbaa203 ◽

2020 ◽

Author(s):

J P Torcivia ◽

R Mazumder

Keyword(s):

Colon Adenocarcinoma ◽

Whole Genome Sequence ◽

Whole Genome ◽

Coding Region ◽

Coding Regions ◽

Functional Aspects ◽

Scanning Window ◽

Cancer Types ◽

Carcinoma Colon ◽

Pan Cancer

Abstract Genomics has benefited from an explosion in affordable high-throughput technology for whole-genome sequencing. The regulatory and functional aspects of non-coding regions may be an important contributor to oncogenesis. Whole-genome tumor-normal paired alignments were used to examine the non-coding regions in five cancer types and two races. Both a sliding window and a binning strategy were introduced to uncover areas of higher-than-expected variation for additional study. We show that the majority of cancer-associated mutations in 154 whole-genome sequences covering breast invasive carcinoma, colon adenocarcinoma, kidney renal papillary cell carcinoma, lung adenocarcinoma and uterine corpus endometrial carcinoma, and two races, are found outside of the coding region (4 432 885 in non-gene regions versus 1 412 731 in gene regions). A pan-cancer analysis found significantly mutated windows (292 to 3881 in count), demonstrating that there are significant numbers of large mutated regions in the non-coding genome. The 59 significantly mutated windows were found in all studied races and cancers. These offer 16 regions ripe for additional study within 12 different chromosomes, including 2, 4, 5, 7, 10, 11, 16, 18, 20, 21 and X. Many of these regions were found in centromeric locations. The X chromosome had the largest set of universal windows, which cluster almost exclusively in Xq11.1, an area linked to chromosomal instability and oncogenesis. Large consecutive clusters (super windows) were found (19 to 114 in count), providing further evidence that large mutated regions in the genome are influencing cancer development. We show remarkable similarity in highly mutated non-coding regions across both cancer and race.
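The abstract above names two scanning strategies, binning and a sliding window, without giving parameters. The following is a minimal sketch of how the two differ, not the authors' implementation. The function names, window size, step, and the "factor times the mean" flagging rule are all hypothetical choices for illustration; the paper's actual significance test is not stated in the abstract.

```python
from bisect import bisect_left


def bin_counts(positions, bin_size):
    """Count variants in fixed, non-overlapping bins (bin index = pos // bin_size)."""
    counts = {}
    for pos in positions:
        idx = pos // bin_size
        counts[idx] = counts.get(idx, 0) + 1
    return counts


def sliding_window_counts(positions, window, step, chrom_length):
    """Count variants in overlapping windows [start, start + window)."""
    ordered = sorted(positions)
    results = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(ordered, start)           # first variant at or after start
        hi = bisect_left(ordered, start + window)  # first variant past the window
        results.append((start, hi - lo))
    return results


def flag_windows(window_counts, factor):
    """Hypothetical rule: flag windows whose count exceeds factor * mean count."""
    mean = sum(count for _, count in window_counts) / len(window_counts)
    return [start for start, count in window_counts if count > factor * mean]


# Toy variant positions on a 100 bp "chromosome" (made-up data).
positions = [5, 12, 14, 15, 18, 95]
print(bin_counts(positions, bin_size=10))
windows = sliding_window_counts(positions, window=10, step=5, chrom_length=100)
print(flag_windows(windows, factor=5.0))
```

Bins partition the sequence once, so a cluster that straddles a bin boundary is split between two counts. Overlapping windows avoid that at the cost of correlated neighboring counts, which is one plausible reason to use both.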


Ennet: construction of potential cancer-driving networks based on somatic enhancer mutations only

10.1101/216226 ◽

2017 ◽

Author(s):

Ya Cui ◽

Yiwei Niu ◽

Xueyi Teng ◽

Dan Wang ◽

Huaxia Luo ◽

...

Keyword(s):

Neural Development ◽

Somatic Mutations ◽

Gene Interactions ◽

Whole Genome ◽

Sequencing Technology ◽

Cancer Types ◽

Cancer Drivers ◽

New Perspective ◽

Pan Cancer ◽

Potential Cancer

Abstract Whole genome sequencing technology has facilitated the discovery of a large number of somatic mutations in enhancers (SMEs), whereas the utility of SMEs in tumorigenesis has not been fully explored. Here we present Ennet, a method to comprehensively investigate SME-enriched networks (SME-networks) in cancer by integrating SMEs, enhancer-gene interactions and gene-gene interactions. Using Ennet, we performed a pan-cancer analysis in 2004 samples from 8 cancer types and found that many well-known cancer drivers were involved in the SME-networks, including ESR1, SMAD3, MYC, EGFR, BCL2 and PAX5. Meanwhile, Ennet also identified many new networks that are less well characterized but have potentially important roles in cancer, including a large SME-network in medulloblastoma (MB), which contains genes enriched in the glutamate receptor and neural development pathways. Interestingly, SME-networks are specific across cancer types, and the vast majority of the genes identified by Ennet have few mutations in gene bodies. Collectively, our work suggests that using enhancer-only somatic mutations can be an effective way to discover potential cancer-driving networks. Ennet provides a new perspective to explore new mechanisms for tumor progression from SMEs.


Reconstructing and characterizing focal amplifications in cancer using AmpliconArchitect

10.1101/457333 ◽

2018 ◽

Cited By ~ 1

Author(s):

Viraj Deshpande ◽

Jens Luebeck ◽

Mehrdad Bakhtiari ◽

Nam-Phuong D Nguyen ◽

Kristen M Turner ◽

...

Keyword(s):

Cervical Cancer ◽

Fine Structure ◽

Tumor Growth ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Multiple Cancer ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. We developed a tool, AmpliconArchitect (AA), which can robustly reconstruct the fine structure of focally amplified regions using whole genome sequencing. AA-reconstructed amplicons in pan-cancer data and in virus-driven cervical cancer samples revealed many novel insights about focal amplifications. Specifically, the findings lend support to extrachromosomally mediated mechanisms for copy number expansion, and oncoviral pathogenesis.


CNCDatabase: a database of non-coding cancer drivers

10.1101/2020.04.29.069047 ◽

2020 ◽

Author(s):

Eric Minwei Liu ◽

Alexander Martinez-Fundichely ◽

Rajesh Bollapragada ◽

Maurice Spiewack ◽

Ekta Khurana

Keyword(s):

Gene Expression ◽

Luciferase Reporter ◽

Whole Genome ◽

Gene Promoters ◽

Genome Sequences ◽

Coding Regions ◽

Cancer Types ◽

Cancer Drivers ◽

Non Coding RNAs ◽

Experimental Validations

Abstract Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies, and thus it is difficult to bridge the understanding of non-coding alterations, the genes they impact and the supporting evidence for their role in tumorigenesis across multiple cancer types. To address this gap, we have developed CNCDatabase, Cornell Non-Coding Cancer driver Database (https://cncdatabase.med.cornell.edu/) that contains detailed information about predicted non-coding drivers at gene promoters, 5’ and 3’ UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1,111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection in whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations and their associations with gene expression in human cancers.


Chromoanagenesis Landscape in 10,000 TCGA Patients

Cancers ◽

10.3390/cancers13164197 ◽

2021 ◽

Vol 13 (16) ◽

pp. 4197

Author(s):

Roni Rasnic ◽

Michal Linial

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Learning Algorithm ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Whole Genome ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Biopsies ◽

Pan Cancer

During the past decade, whole-genome sequencing of tumor biopsies and individuals with congenital disorders highlighted the phenomenon of chromoanagenesis, a single chaotic event of chromosomal rearrangement. Chromoanagenesis was shown to be frequent in many types of cancers, to occur in early stages of cancer development, and to significantly impact the tumor’s nature. However, an in-depth, cancer-type dependent analysis has been somewhat incomplete due to the shortage of whole genome sequencing of cancerous samples. In this study, we extracted data from The Pan-Cancer Analysis of Whole Genomes (PCAWG) and The Cancer Genome Atlas (TCGA) to construct and test a machine learning algorithm that can detect chromoanagenesis with high accuracy (86%). The algorithm was applied to ~10,000 unlabeled TCGA cancer patients. We use the chromoanagenesis assignment results to analyze cancer-type specific chromoanagenesis characteristics in 20 TCGA cancer types. Our results unveil prominent genes affected in either chromoanagenesis or non-chromoanagenesis tumorigenesis. The analysis reveals a mutual exclusivity relationship between the genes impaired in chromoanagenesis versus non-chromoanagenesis cases. We offer the discovered characteristics as possible targets for cancer diagnostic and therapeutic purposes.


Near-Complete Genome of SARS-CoV-2 Delta (AY.3) Variant Identified in a Dog in Kansas, USA

Viruses ◽

10.3390/v13102104 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2104

Author(s):

Tyler Doerksen ◽

Andrea Lu ◽

Lance Noll ◽

Kelli Almes ◽

Jianfa Bai ◽

...

Keyword(s):

Genome Sequence ◽

Companion Animals ◽

Whole Genome Sequence ◽

Whole Genome ◽

Rapid Adaptation ◽

Coding Region ◽

Reference Case ◽

The Past ◽

Spike Coding

Descriptions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and transmission in companion animals have been increasing in the past year. Although canine susceptibility is generally considered low, the role of dogs in the COVID-19 disease cycle remains unknown. In this study, we detected and sequenced a delta variant (AY.3) from a 12-year-old Collie living with owners who had previously tested positive for SARS-CoV-2. It is unclear if the dog’s symptoms were related to SARS-CoV-2 infection or underlying conditions. The whole genome sequence obtained from the dog sample had several unique consensus level changes not previously identified in a SARS-CoV-2 genome that may play a role in the rapid adaptation from humans to dogs. Within the spike coding region, 5/7 of the subconsensus variants identified in the dog sequence were also identified in the closest in-house human reference case. Taken together, the whole genome sequence and the phylogenetic and subconsensus variant analyses indicate that the virus infecting the animal originated from a local outbreak cluster. The results of these analyses emphasize the importance of rapid detection and characterization of SARS-CoV-2 variants of concern in companion animals.


CNCDatabase: a database of non-coding cancer drivers

Nucleic Acids Research ◽

10.1093/nar/gkaa915 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1094-D1101

Author(s):

Eric Minwei Liu ◽

Alexander Martinez-Fundichely ◽

Rajesh Bollapragada ◽

Maurice Spiewack ◽

Ekta Khurana

Keyword(s):

Tumor Development ◽

Luciferase Reporter ◽

Whole Genome ◽

Gene Promoters ◽

Genome Sequences ◽

Coding Regions ◽

Cancer Types ◽

Cancer Drivers ◽

Non Coding RNAs ◽

Experimental Validations

Abstract Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies making it difficult to understand their roles in tumorigenesis of different cancer types. We have developed CNCDatabase, Cornell Non-coding Cancer driver Database (https://cncdatabase.med.cornell.edu/) that contains detailed information about predicted non-coding drivers at gene promoters, 5′ and 3′ UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection using whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations in human cancers.


Chromoanagenesis landscape in 10,000 TCGA patients

10.1101/2021.04.29.441937 ◽

2021 ◽

Author(s):

Roni Rasnic ◽

Michal Linial

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Learning Algorithm ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Whole Genome ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Tumor Biopsies ◽

Pan Cancer

During the past decade, whole-genome sequencing of tumor biopsies and individuals with congenital disorders highlighted the phenomenon of chromoanagenesis, a single chaotic event of chromosomal rearrangement. Chromoanagenesis was shown to be frequent in many types of cancers, to occur in early stages of cancer development, and to significantly impact the tumor's nature. However, an in-depth, cancer-type dependent analysis has been somewhat incomplete due to the shortage of whole genome sequencing of cancerous samples. In this study, we extracted data from The Pan-Cancer Analysis of Whole Genomes (PCAWG) and The Cancer Genome Atlas (TCGA) to construct a machine learning algorithm that can detect chromoanagenesis with high accuracy (86%). The algorithm was applied to ~10,000 TCGA cancer patients. We use the chromoanagenesis assignment results to analyze cancer-type specific chromoanagenesis characteristics in 20 TCGA cancer types. Our results unveil prominent genes affected in either chromoanagenesis or non-chromoanagenesis tumorigenesis. The analysis reveals a mutual exclusivity relationship between the genes impaired in chromoanagenesis versus non-chromoanagenesis cases. We offer the discovered characteristics as possible targets for cancer diagnostic and therapeutic purposes.


30-OR: Whole Genome Sequence Analysis of Type 2 Diabetes Risk in 44,713 Humans of Diverse Ancestry in the TOPMed Study

Diabetes ◽

10.2337/db19-30-or ◽

2019 ◽

Vol 68 (Supplement 1) ◽

pp. 30-OR

Author(s):

Heather M. Highland ◽

Jennifer Wessel ◽

Alisa Manning ◽

Keyword(s):

Type 2 Diabetes ◽

Sequence Analysis ◽

Genome Sequence ◽

Diabetes Risk ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Analysis


Whole genome sequence comparison of endemic multi-resistant Escherichia coli clones

10.26226/morressier.56d5ba26d462b80296c94b2c ◽

2016 ◽

Author(s):

Tove Havnhøj Frandsen

Keyword(s):

Escherichia Coli ◽

Genome Sequence ◽

Sequence Comparison ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Comparison


Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly
