Identifying Candidate Cancer Genes Based on Their Somatic Mutations Co-Occurring with Cancer Genes in Cancer Genome Profiling

Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations

NAR Cancer ◽

10.1093/narcan/zcaa034 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

HoJoon Lee ◽

Ahmed Shuaibi ◽

John M Bell ◽

Dmitri S Pavlichin ◽

Hanlee P Ji

Keyword(s):

Genome Sequencing ◽

Somatic Mutations ◽

Variant Calling ◽

Cancer Genome ◽

Cancer Genes ◽

Sequencing Data ◽

Deletion Mutations ◽

Insertion And Deletion ◽

Cancer Genome Sequencing ◽

Significant Difference

Abstract Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
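
The k-mer comparison described in this abstract can be illustrated with a minimal sketch. It is not KmerVC itself: the function names, the toy sequences, the restriction to a single substitution, and the choice of a one-sided Fisher exact test are all assumptions made for illustration. The abstract only states that a statistically significant difference in k-mer frequency between matched normal and tumor sequences is used.

```python
from math import comb

def spanning_kmers(ref, pos, alt, k):
    """Reference and mutant k-mers overlapping a substitution at 0-based `pos`."""
    mut = ref[:pos] + alt + ref[pos + 1:]
    first = max(0, pos - k + 1)          # leftmost window that still covers pos
    last = min(pos, len(ref) - k)        # rightmost window that still covers pos
    ref_kmers = {ref[s:s + k] for s in range(first, last + 1)}
    alt_kmers = {mut[s:s + k] for s in range(first, last + 1)}
    return ref_kmers, alt_kmers

def supporting_reads(reads, kmers, k):
    """Number of reads that contain at least one of the given k-mers."""
    return sum(
        any(read[i:i + k] in kmers for i in range(len(read) - k + 1))
        for read in reads
    )

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    Rows: tumor, normal. Columns: mutant-supporting, reference-supporting.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1)
    )
    return tail / comb(total, col1)

# Hypothetical toy data: a G>T substitution at position 7, k = 5.
ref = "GATTACAGGCTTAACG"
mut = ref[:7] + "T" + ref[8:]
ref_k, alt_k = spanning_kmers(ref, 7, "T", 5)
tumor_reads = [mut] * 6 + [ref] * 6      # half of the tumor reads carry the variant
normal_reads = [ref] * 12                # no normal read carries it

a = supporting_reads(tumor_reads, alt_k, 5)
b = supporting_reads(tumor_reads, ref_k, 5)
c = supporting_reads(normal_reads, alt_k, 5)
d = supporting_reads(normal_reads, ref_k, 5)
p_value = fisher_one_sided(a, b, c, d)
print(a, b, c, d, p_value)
```

A small p-value here means mutant k-mers are enriched in the tumor relative to the matched normal. The sketch counts reads rather than individual k-mer occurrences so that overlapping k-mers from one read are not treated as independent observations; a real tool would also need to handle insertions, deletions, reverse complements, and non-unique k-mers.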


HiCancer: accurate and complete cancer genome phasing with Hi-C reads

Scientific Reports ◽

10.1038/s41598-021-86104-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Weihua Pan ◽

Desheng Gong ◽

Da Sun ◽

Haohui Luo

Keyword(s):

Copy Number ◽

Somatic Mutations ◽

Copy Number Variations ◽

Cancer Genome ◽

Structural Variations ◽

Genome Map ◽

Linkage Information ◽

Suboptimal Solution ◽

Allelic Copy Number ◽

Very High

Abstract Due to the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Nevertheless, phasing each chromosome in the cancer genome into two haplotypes according to germline mutations provides a suboptimal solution for understanding the cancer genome. However, phasing the cancer genome is also a challenging problem, due to the limits of experimental and computational technologies. Hi-C data has been widely used for phasing in recent years because of its long-range linkage information, and it provides an opportunity to solve the problem of phasing the cancer genome. Existing Hi-C based phasing methods cannot be applied to the cancer genome directly, because somatic mutations in the cancer genome, such as somatic SNPs, copy number variations and structural variations, greatly reduce the correctness and completeness of phasing. Here, we propose a new Hi-C based pipeline for phasing the cancer genome called HiCancer. HiCancer handles different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. According to our experiments in the K562 and KBM-7 cell lines, HiCancer is able to generate very high-quality chromosome-level haplotypes for the cancer genome with only Hi-C data.


A Statistical Framework for Evolutionary Analysis of Recurrent Somatic Mutations in Cancers

10.1101/2020.04.10.036095 ◽

2020 ◽

Author(s):

Xun Gu

Keyword(s):

Somatic Mutations ◽

Cancer Genomics ◽

Computational Procedure ◽

The Cancer Genome Atlas ◽

Component Model ◽

Evolutionary Analysis ◽

Cancer Genes ◽

Component Mixture ◽

Two Component ◽

Empirical Bayesian Method

Abstract Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored, facilitating large-scale high-throughput analyses of the underlying mechanisms that may contribute to malignant initiation or progression. Because passenger mutations (unrelated to cancer) predominate, the challenge is to identify the somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model that enables the following analyses. (i) We formulated a quasi-likelihood approach to test whether the two-component model is significantly better than a single-component model, which can be used to predict new cancer genes. (ii) We implemented an empirical Bayesian method to calculate the posterior probability that a site is cancer-driving for all sites of a gene, which can be used to predict new driver sites. (iii) We developed a computational procedure to calculate the somatic selection intensity at driver sites and passenger sites, respectively, as well as site-specific profiles for all sites. Using these newly developed methods, we comprehensively analyzed 294 known cancer genes based on The Cancer Genome Atlas (TCGA) database.
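
The posterior calculation in step (ii) can be sketched for a generic two-component mixture. The abstract does not give the component distributions, so this example assumes Poisson-distributed per-site mutation counts. The function names and parameter values are hypothetical, and in an empirical Bayes setting the mixture weight and rates would first be estimated from the data rather than supplied by hand.

```python
from math import exp, lgamma, log

def poisson_pmf(count, rate):
    """Poisson probability mass, computed in log space for stability."""
    return exp(count * log(rate) - rate - lgamma(count + 1))

def posterior_driver(count, pi_driver, rate_driver, rate_passenger):
    """Posterior probability that a site is a driver, given its mutation count.

    Assumed model: count ~ pi_driver * Poisson(rate_driver)
                         + (1 - pi_driver) * Poisson(rate_passenger).
    """
    driver = pi_driver * poisson_pmf(count, rate_driver)
    passenger = (1.0 - pi_driver) * poisson_pmf(count, rate_passenger)
    return driver / (driver + passenger)

# Hypothetical plug-in estimates: 5% of sites are drivers, with a higher rate.
for count in (0, 2, 5, 12):
    print(count, posterior_driver(count, 0.05, 10.0, 0.5))
```

With these assumed parameters the posterior rises with the observed count, so rarely mutated sites are assigned to the passenger component and recurrently mutated sites to the driver component.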


Discovering the drivers of clonal hematopoiesis

10.1101/2020.10.22.350140 ◽

2020 ◽

Cited By ~ 1

Author(s):

Oriol Pich ◽

Iker Reyes-Salazar ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas

Keyword(s):

Positive Selection ◽

Molecular Mechanisms ◽

Somatic Mutations ◽

Cancer Genomics ◽

Variant Calling ◽

Selective Advantage ◽

Cancer Genes ◽

Driver Genes ◽

Hematopoietic Stem ◽

Clonal Hematopoiesis

Abstract Mutations in genes that confer a selective advantage to hematopoietic stem cells (HSCs) in certain conditions drive clonal hematopoiesis (CH). While some CH drivers have been identified experimentally or through epidemiological studies, the compendium of all genes able to drive CH upon mutations in HSCs is far from complete. We propose that identifying signals of positive selection in blood somatic mutations may be an effective way to identify CH driver genes, as is done to identify cancer genes. Using a reverse somatic variant calling approach, we repurposed whole-genome and whole-exome blood/tumor paired samples of more than 12,000 donors from two large cancer genomics cohorts to identify blood somatic mutations. The application of IntOGen, a robust driver discovery pipeline, to blood somatic mutations across both cohorts and more than 24,000 targeted sequenced samples yielded a list of close to 70 genes with signals of positive selection in CH, available at http://www.intogen.org/ch. This approach recovers all known CH genes and discovers novel candidates. Generating this compendium is an essential step to understand the molecular mechanisms of CH and to accurately detect individuals with CH to ascertain their risk of developing related diseases.


OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.


OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes

Bioinformatics ◽

10.1093/bioinformatics/btt395 ◽

2013 ◽

Vol 29 (18) ◽

pp. 2238-2244 ◽

Cited By ~ 230

Author(s):

David Tamborero ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas

Keyword(s):

Somatic Mutations ◽

Cancer Genes


Characterising Somatic Mutations in Cancer Genome by Means of Next-generation Sequencing

Encyclopedia of Life Sciences ◽

10.1002/9780470015902.a0023379 ◽

2012 ◽

Author(s):

Mei Ling Chong ◽

Chee Seng Ku ◽

Mengchu Wu ◽

Richie Soong

Keyword(s):

Next Generation Sequencing ◽

Somatic Mutations ◽

Cancer Genome ◽

Next Generation ◽

Generation Sequencing


Quantifying gene selection in cancer through protein functional alteration bias

Nucleic Acids Research ◽

10.1093/nar/gkz546 ◽

2019 ◽

Vol 47 (13) ◽

pp. 6642-6655 ◽

Cited By ~ 7

Author(s):

Nadav Brandes ◽

Nathan Linial ◽

Michal Linial

Keyword(s):

Somatic Mutations ◽

Gene Selection ◽

De Novo ◽

Cancer Genes ◽

Driver Genes ◽

Protein Coding ◽

Protein Coding Genes ◽

Machine Learning Model ◽

Implicit And Explicit ◽

False Discoveries

Abstract Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de-novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
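
The idea of comparing a gene's observed mutations against its own background can be sketched with a simple z-score. This is not FABRIC's actual statistic or interface: the function name, the "harm score" scale (higher meaning more damaging), the equal weighting of all possible background mutations, and the toy numbers are assumptions chosen only to illustrate the comparison.

```python
from math import sqrt
from statistics import fmean, pstdev

def harm_bias_z(observed_scores, background_scores):
    """Z-score of the observed mean harm score against a per-gene background.

    observed_scores:   hypothetical harm scores of mutations seen in tumors.
    background_scores: hypothetical harm scores of every possible
                       single-nucleotide mutation in the same gene,
                       here treated as equally likely.
    """
    mu = fmean(background_scores)
    sigma = pstdev(background_scores)
    n = len(observed_scores)
    return (fmean(observed_scores) - mu) / (sigma / sqrt(n))

# Toy gene: the background is spread evenly, observed mutations skew harmful.
background = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
observed = [0.8, 1.0, 1.0, 0.8]
z = harm_bias_z(observed, background)
print(round(z, 3))
```

A strongly positive value suggests the observed mutations are more damaging than expected by chance within that gene. Because the background is built per gene, no genome-wide mutation rate model is needed in this sketch, which mirrors the abstract's point about comparing each gene to its own background.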


Evaluation of Commercial Circulating Tumor DNA Test in Metastatic Prostate Cancer

JCO Precision Oncology ◽

10.1200/po.19.00014 ◽

2019 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Sinja Taavitsainen ◽

Matti Annala ◽

Elisa Ledet ◽

Kevin Beja ◽

Patrick J. Miller ◽

...

Keyword(s):

Prostate Cancer ◽

Metastatic Prostate Cancer ◽

Somatic Mutations ◽

Circulating Tumor Dna ◽

Research Approach ◽

Cancer Genes ◽

Dna Repair Gene ◽

Clinical Scenarios ◽

Tumor Dna ◽

Allele Fraction

PURPOSE Circulating tumor DNA (ctDNA) sequencing provides a minimally invasive method for tumor molecular stratification. Commercial ctDNA sequencing is increasingly used in the clinic, but its accuracy in metastatic prostate cancer is untested. We compared the commercial Guardant360 ctDNA test against an academic sequencing approach for profiling metastatic prostate cancer. PATIENTS AND METHODS Plasma cell-free DNA was collected between September 2016 and April 2018 from 24 patients with clinically progressive metastatic prostate cancer representing a range of clinical scenarios. Each sample was analyzed using Guardant360 and a research panel encompassing 73 prostate cancer genes. Concordance of somatic mutation and copy number calls was evaluated between the two approaches. RESULTS Targeted sequencing independently confirmed 94% of somatic mutations identified by Guardant360 at an allele fraction greater than 1%. AR amplifications and mutations were detected with high concordance in 14 patients, with only three discordant subclonal mutations at an allele fraction lower than 0.5%. Many somatic mutations identified by Guardant360 at an allele fraction lower than 1% seemed to represent subclonal passenger events or non–prostate-derived clones. Most of the non-AR gene amplifications reported by Guardant360 represented single copy gains. The research approach detected several clinically relevant DNA repair gene alterations not reported by Guardant360, including four germline truncating BRCA2/ATM mutations, two somatic ATM stop gain mutations, one BRCA2 biallelic deletion, 11 BRCA2 stop gain reversal mutations in a patient treated with olaparib, and a hypermutator phenotype in a patient sample with 42 mutations per megabase. CONCLUSION Guardant360 accurately identifies somatic ctDNA mutations in patients with metastatic prostate cancer, but low allele frequency mutations should be interpreted with caution. Test utility in metastatic prostate cancer is currently limited by the lack of reporting on actionable deletions, rearrangements, and germline mutations.


ENDOCRINE TUMOURS: Advances in the molecular pathogenesis of thyroid cancer: lessons from the cancer genome

Acta Endocrinologica ◽

10.1530/eje-16-0202 ◽

2016 ◽

Vol 175 (5) ◽

pp. R203-R217 ◽

Cited By ~ 25

Author(s):

Garcilaso Riesco-Eizaguirre ◽

Pilar Santisteban

Keyword(s):

Thyroid Cancer ◽

Cancer Genomics ◽

Treatment Strategies ◽

Genetic Alterations ◽

Cancer Genome ◽

Molecular Pathogenesis ◽

Cancer Genes ◽

New Information ◽

Novel Treatment Strategies ◽

And Behavior

Thyroid cancer is the most common endocrine malignancy giving rise to one of the most indolent solid cancers, but also one of the most lethal. In recent years, systematic studies of the cancer genome, most importantly those derived from The Cancer Genome Atlas (TCGA), have catalogued aberrations in the DNA, chromatin, and RNA of the genomes of thousands of tumors relative to matched normal cellular genomes and have analyzed their epigenetic and protein consequences. Cancer genomics is therefore providing new information on cancer development and behavior, as well as new insights into genetic alterations and molecular pathways. From this genomic perspective, we will review the main advances concerning some essential aspects of the molecular pathogenesis of thyroid cancer such as mutational mechanisms, new cancer genes implicated in tumor initiation and progression, the role of non-coding RNA, and the advent of new susceptibility genes in thyroid cancer predisposition. This look across these genomic and cellular alterations results in the reshaping of the multistep development of thyroid tumors and offers new tools and opportunities for further research and clinical development of novel treatment strategies.
