FERMI: A novel method for sensitive detection of rare mutations in somatic tissue

Mapping Intimacies ◽

10.1101/208066 ◽

2017 ◽

Cited By ~ 2

Author(s):

L. Alexander Liggett ◽

Anchal Sharma ◽

Subhajyoti De ◽

James DeGregori

Keyword(s):

Residual Disease ◽

Clonal Evolution ◽

Null Model ◽

Single Copy ◽

Error Rates ◽

Cancer Evolution ◽

Sequencing Data ◽

Bone Marrow Biopsies ◽

Rare Mutations ◽

Novel Method

Abstract With growing interest in monitoring mutational processes in normal tissues, tumor heterogeneity, and cancer evolution under therapy, the ability to accurately and economically detect ultra-rare mutations is becoming increasingly important. However, this capability has often been compromised by significant sequencing, PCR and DNA preparation error rates. Here, we describe FERMI (Fast Extremely Rare Mutation Identification), a novel method designed to eliminate the majority of these sequencing and library preparation errors in order to significantly improve rare somatic mutation detection. This method leverages barcoded targeting probes to capture and sequence DNA of interest with single-copy resolution. The variant calls from the barcoded sequencing data are then further filtered in a position-dependent fashion against an adaptive, context-aware null model in order to distinguish true variants. As a proof of principle, we employ FERMI to probe bone marrow biopsies from leukemia patients, and show that rare mutations and clonal evolution can be tracked throughout cancer treatment, including during historically intractable periods like minimal residual disease. Importantly, FERMI is able to accurately detect nascent clonal expansions within leukemias in a manner that may facilitate the early detection and characterization of cancer relapse.
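The barcoding idea in this abstract can be illustrated with a generic consensus-calling sketch. This is not FERMI's actual implementation: the function name `consensus_by_barcode`, the thresholds `min_family_size` and `min_agreement`, and the toy reads are all assumptions made for illustration. Reads sharing a barcode are treated as copies of one original molecule, so an error present in only a few copies can be out-voted.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads sharing a barcode into one consensus sequence each.

    reads: iterable of (barcode, sequence) pairs. Sequences within a
    barcode family are assumed to be aligned and of equal length.
    Both thresholds are hypothetical defaults, not values from the paper.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to out-vote a sequencing or PCR error
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Mask positions where the family disagrees too much.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


# Toy input: one family carries a single discordant read, one is a singleton.
reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACTTAC"),
    ("CCTA", "ACGAAC"), ("CCTA", "ACGAAC"), ("CCTA", "ACGAAC"),
    ("GGGA", "ACGTAC"),
]
result = consensus_by_barcode(reads, min_family_size=3, min_agreement=0.7)
```

In this toy input the discordant read in family `AAGT` is out-voted and the singleton family `GGGA` is dropped. The position-dependent null-model filtering described in the abstract is a separate step and is not sketched here.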


Opposing Evolutionary Pressures Drive Clonal Evolution and Health Outcomes in the Aging Blood System

Blood ◽

10.1182/blood-2020-142086 ◽

2020 ◽

Vol 136 (Supplement 1) ◽

pp. 37-37

Author(s):

Kimberly Skead ◽

Armande Ang Houle ◽

Sagi Abelson ◽

Marie-Julie Fave ◽

Boxi Lin ◽

...

Keyword(s):

Gene Disruption ◽

Large Scale ◽

Evolutionary Dynamics ◽

Clonal Evolution ◽

Neutral Evolution ◽

Driver Mutations ◽

Cancer Evolution ◽

Sequencing Data ◽

High Coverage ◽

Passenger Mutations

The age-associated accumulation of somatic mutations and large-scale structural variants (SVs) in the early hematopoietic hierarchy has been linked to premalignant stages for cancer and cardiovascular disease (CVD). However, only a small proportion of individuals harboring these mutations progress to disease, and the mechanisms driving the transformation to malignancy remain unclear. Hematopoietic evolution, and cancer evolution more broadly, has largely been studied through a lens of adaptive evolution, and the contribution of functionally neutral or mildly damaging mutations to early disease-associated clonal expansions has not been well characterised despite these mutations comprising the majority of the mutational burden in healthy or tumoural tissues. Through combining deep learning with population genetics, we interrogate the hematopoietic system to capture signatures of selection acting in healthy and pre-cancerous blood populations. Here, we leverage high-coverage sequencing data from healthy and pre-cancerous individuals from the European Prospective Investigation into Cancer and Nutrition Study (n=477) and dense genotyping from the Canadian Partnership for Tomorrow's Health (n=5,000) to show that blood rejects the paradigm of strictly adaptive or neutral evolution and is subject to pervasive negative selection. We observe clear age associations across hematopoietic populations and the dominant class of selection driving evolutionary dynamics acting at an individual level. We find that both the location and ratio of passenger to driver mutations are critical in determining if positive selection acting on driver mutations is able to overwhelm regulated hematopoiesis and allow clones harbouring disease-predisposing mutations to rise to dominance.
Certain genes are enriched for passenger mutations in healthy individuals fitting purifying models of evolution, suggesting that the presence of passenger mutations in a subset of genes might confer a protective role against disease-predisposing clonal expansions. Finally, we find that the density of gene disruption events with known pathogenic associations in somatic SVs impacts the frequency at which the SV segregates in the population, with variants displaying higher gene disruption density segregating at lower frequencies. Understanding how blood evolves towards malignancy will allow us to capture cancer in its earliest stages and identify events initiating departures from healthy blood evolution. Further, as the majority of mutations are passengers, studying their contribution to tumorigenesis will unveil novel therapeutic targets, thus enabling us to better understand patterns of clonal evolution in order to diagnose and treat disease in its infancy. Disclosures: Dick: Bristol-Myers Squibb/Celgene: Research Funding.


Tumor subclonal progression model for cancer hallmark acquisition

10.1101/149252 ◽

2017 ◽

Author(s):

Yusuke Matsui ◽

Satoru Miyano ◽

Teppei Shimamura

Keyword(s):

Large Scale ◽

Clear Cell ◽

Genomic Data ◽

Evolutionary Tree ◽

Cancer Evolution ◽

Sequencing Data ◽

Progression Model ◽

Important Challenge ◽

Evolutionary Trajectories ◽

Novel Method

Abstract Recent advances in methods for reconstructing cancer evolutionary trajectories have opened up the prospect of deciphering subclonal populations and their evolutionary architectures within cancer ecosystems. An important challenge in cancer evolution studies is how to connect genetic aberrations in subclones to a clinically interpretable and actionable target in the subclones for individual patients. In this study, our aim is to develop a novel method for constructing a model of tumor subclonal progression in terms of cancer hallmark acquisition using multiregional sequencing data. We prepare a subclonal evolutionary tree inferred from variant allele frequencies and estimate pathway alteration probabilities from large-scale cohort genomic data. We then construct an evolutionary tree of pathway alterations that takes into account the selectivity of pathway alterations via a selectivity score. We show the effectiveness of our method on a dataset of clear cell renal cell carcinomas.


MERIT: a Mutation Error Rate Identification Toolkit for Ultra-deep Sequencing Applications

10.1101/184291 ◽

2017 ◽

Cited By ~ 1

Author(s):

Mohammad Hadigol ◽

Hossein Khiabanian

Keyword(s):

Deep Sequencing ◽

High Throughput Sequencing ◽

Clonal Evolution ◽

False Negative ◽

Error Rates ◽

Sequencing Data ◽

Genomic Context ◽

Nucleotide Incorporation ◽

Double Base

Abstract Rapid progress in high-throughput sequencing (HTS) has enabled the molecular characterization of mutational landscapes in heterogeneous populations and has improved our understanding of clonal evolution processes. Analyzing the sensitivity of detecting genomic mutations in HTS requires comprehensive profiling of sequencing artifacts. To this end, we introduce MERIT, designed for in-depth quantification of erroneous substitutions and small insertions and deletions, specifically for ultra-deep applications. MERIT incorporates an all-inclusive variant caller and considers genomic context, including the nucleotides immediately 5′ and 3′ of each site, thereby establishing error rates for 96 possible substitutions as well as four single-base and 16 double-base indels. We apply MERIT to ultra-deep sequencing data (1,300,000×) and show a significant relationship between error rates and genomic contexts. We devise an in silico approach to determine the optimal sequencing depth, where errors occur at rates similar to those of true mutations. Finally, we assess nucleotide-incorporation fidelity of four high-fidelity DNA polymerases in clinically relevant loci, and demonstrate how fixed detection thresholds may result in substantial false positive as well as false negative calls.
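The category counts quoted in this abstract (96 substitutions, four single-base indels, 16 double-base indels) can be reproduced by simple enumeration. The sketch below assumes the common strand-folded convention, in which each substitution is written from the pyrimidine reference base, giving six substitution classes times 16 flanking contexts. That convention, the label format and the variable names are assumptions; the abstract does not say how MERIT encodes its categories.

```python
from itertools import product

BASES = "ACGT"

# Six strand-folded substitution classes (assumed convention: the reference
# base is always written as the pyrimidine, C or T).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Each class is combined with every (5' base, 3' base) pair: 6 * 4 * 4 = 96.
substitution_contexts = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product(BASES, repeat=2)
]

# Inserted or deleted sequences of length one and two.
single_base_indels = list(BASES)
double_base_indels = ["".join(pair) for pair in product(BASES, repeat=2)]

print(len(substitution_contexts), len(single_base_indels), len(double_base_indels))
```

A label such as `A[C>T]G` then denotes a C-to-T change flanked by A on the 5′ side and G on the 3′ side.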


Development of a User-Friendly Pipeline for Mutational Analyses of HIV Using Ultra-Accurate Maximum-Depth Sequencing

Viruses ◽

10.3390/v13071338 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1338

Author(s):

Morgan E. Meissner ◽

Emily J. Julik ◽

Jonathan P. Badalamenti ◽

William G. Arndt ◽

Lauren J. Mills ◽

...

Keyword(s):

Error Rates ◽

Maximum Depth ◽

Sequencing Data ◽

Background Error ◽

High Background ◽

Immunodeficiency Virus ◽

User Friendly ◽

Viral Mutagenesis ◽

Hiv 1

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present the development of a user-friendly Galaxy workflow for the bioinformatic analyses of sequencing data generated using the MDS technique, designed to improve replicability and accessibility to molecular virologists. This adapted MDS technique and analysis pipeline were validated by comparisons with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2 and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold over standard Illumina error rates, and 10-fold over traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.


Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
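The raw signal this abstract builds on, a Mendelian error, can be shown with a small sketch for biallelic autosomal sites in a mother-father-child trio. It only flags genotype combinations that violate Mendelian inheritance; it does not reproduce the paper's estimation of per-sample precision and recall. The function names and the 0/1/2 alternate-allele encoding are assumptions chosen for illustration.

```python
def possible_alleles(genotype):
    """Alleles a parent can transmit, given an alt-allele count of 0, 1 or 2."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(mother, father, child):
    """True if the child's genotype cannot arise from one allele per parent."""
    reachable = {
        m + f
        for m in possible_alleles(mother)
        for f in possible_alleles(father)
    }
    return child not in reachable


def mendelian_error_rate(trios):
    """Fraction of (mother, father, child) genotype triples that are errors."""
    trios = list(trios)
    errors = sum(is_mendelian_error(*trio) for trio in trios)
    return errors / len(trios)


# Toy sites: the second and fourth triples are inconsistent with inheritance.
sites = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]
rate = mendelian_error_rate(sites)
```

Only some sequencing errors surface as Mendelian errors (a miscalled child genotype of 1 from two heterozygous parents goes unnoticed), which is why the abstract describes observing "a fraction" of errors this way.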


Chromosome-Level Assembly of the Common Lizard (Zootoca vivipara) Genome

Genome Biology and Evolution ◽

10.1093/gbe/evaa161 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1953-1960

Author(s):

Andrey A Yurchenko ◽

Hans Recknagel ◽

Kathryn R Elmer

Keyword(s):

Linkage Map ◽

Single Copy ◽

Phenotypic Traits ◽

Sequencing Data ◽

High Coverage ◽

Squamate Reptiles ◽

Common Lizard ◽

Zootoca Vivipara ◽

The Common ◽

Chromosome Level

Abstract Squamate reptiles exhibit high variation in their phenotypic traits and geographical distributions and are therefore fascinating taxa for evolutionary and ecological research. However, genomic resources are very limited for this group of species, consequently inhibiting research efforts. To address this gap, we assembled a high-quality genome of the common lizard, Zootoca vivipara (Lacertidae), using a combination of high coverage Illumina (shotgun and mate-pair) and PacBio sequencing data, coupled with RNAseq data and genetic linkage map generation. The 1.46-Gb genome assembly has a scaffold N50 of 11.52 Mb with N50 contig size of 220.4 kb and only 2.96% gaps. A BUSCO analysis indicates that 97.7% of the single-copy Tetrapoda orthologs were recovered in the assembly. In total, 19,829 gene models were annotated to the genome using a combination of ab initio and homology-based methods. To improve the chromosome-level assembly, we generated a high-density linkage map from wild-caught families and developed a novel analytical pipeline to accommodate multiple paternity and unknown father genotypes. We successfully anchored and oriented almost 90% of the genome on 19 linkage groups. This annotated and oriented chromosome-level reference genome represents a valuable resource to facilitate evolutionary studies in squamate reptiles.
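For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below computes it under the standard definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The lengths used are toy values, not data from the lizard assembly, and the function name `n50` is chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy lengths (arbitrary units); total is 300, so the halfway mark is 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
```

Here the two longest sequences (80 and 70) already cover half of the total, so the N50 is 70. The same calculation applies to scaffolds or contigs, which is why the abstract reports one value for each.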


The Complete Chloroplast Genome of the Vulnerable Oreocharis esquirolii (Gesneriaceae): Structural Features, Comparative and Phylogenetic Analysis

Plants ◽

10.3390/plants9121692 ◽

2020 ◽

Vol 9 (12) ◽

pp. 1692

Author(s):

Li Gu ◽

Ting Su ◽

Ming-Tai An ◽

Guo-Xiong Hu

Keyword(s):

Phylogenetic Analysis ◽

Sequence Similarity ◽

Single Copy ◽

Structural Features ◽

Rrna Genes ◽

Trna Genes ◽

Sequencing Data ◽

High Sequence Similarity ◽

Plastid Genomes ◽

Cp Genome

Oreocharis esquirolii, a member of Gesneriaceae, is also known as Thamnocharis esquirolii, which has been regarded as a synonym of the former. The species is endemic to Guizhou, southwestern China, and is evaluated as vulnerable (VU) under the International Union for Conservation of Nature (IUCN) criteria. Until now, the sequence and genome information of O. esquirolii has remained unknown. In this study, we assembled and characterized the complete chloroplast (cp) genome of O. esquirolii using Illumina sequencing data for the first time. The total length of the cp genome was 154,069 bp with a typical quadripartite structure consisting of a pair of inverted repeats (IRs) of 25,392 bp separated by a large single copy region (LSC) of 85,156 bp and a small single copy region (SSC) of 18,129 bp. The genome comprised 114 unique genes with 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Thirty-one repeat sequences and 74 simple sequence repeats (SSRs) were identified. Genome alignment across five plastid genomes of Gesneriaceae indicated a high sequence similarity. Four highly variable sites (rps16-trnQ, trnS-trnG, ndhF-rpl32, and ycf1) were identified. Phylogenetic analysis indicated that O. esquirolii grouped together with O. mileensis, supporting resurrection of the name Oreocharis esquirolii from Thamnocharis esquirolii. The complete cp genome sequence will contribute to further studies in molecular identification, genetic diversity, and phylogeny.
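The figures in this abstract are internally consistent, which a short check makes explicit. A quadripartite plastid genome consists of the LSC, the SSC and two copies of the IR, so the region lengths should sum to the total. The variable names below are chosen for illustration; the numbers are taken from the abstract.

```python
# Region lengths in base pairs, as reported in the abstract.
LSC = 85_156          # large single copy region
SSC = 18_129          # small single copy region
IR = 25_392           # one inverted repeat; the genome contains two copies
REPORTED_TOTAL = 154_069

computed_total = LSC + SSC + 2 * IR

# Gene counts by category, as reported in the abstract.
protein_coding, trna, rrna = 80, 30, 4
computed_genes = protein_coding + trna + rrna

print(computed_total, computed_genes)
```

Both sums agree with the reported values of 154,069 bp and 114 unique genes.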


Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2020-0075 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Giulio Caravagna

Keyword(s):

Genome Sequencing ◽

Cancer Evolution ◽

Sequencing Data ◽

Evolutionary Forces ◽

Sequencing Technologies ◽

Cancer Genome Sequencing ◽

Multiple Resolutions ◽

Multiple Patients ◽

Single Tumour ◽

Generation Sequencing

Abstract Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of cancer evolution.


Next-generation antigen receptor sequencing of paired diagnosis and relapse samples of B-cell acute lymphoblastic leukemia: Clonal evolution and implications for minimal residual disease target selection

Leukemia Research ◽

10.1016/j.leukres.2018.10.009 ◽

2019 ◽

Vol 76 ◽

pp. 98-104 ◽

Cited By ~ 8

Author(s):

Prisca M.J. Theunissen ◽

Maaike de Bie ◽

David van Zessen ◽

Valerie de Haas ◽

Andrew P. Stubbs ◽

...

Keyword(s):

Acute Lymphoblastic Leukemia ◽

B Cell ◽

Minimal Residual Disease ◽

Residual Disease ◽

Lymphoblastic Leukemia ◽

Clonal Evolution ◽

Target Selection ◽

Next Generation ◽

Cell Acute Lymphoblastic Leukemia ◽

Minimal Residual


Multi-cancer analysis of clonality and the timing of systemic spread in paired primary tumors and metastases

10.1101/825240 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zheng Hu ◽

Zan Li ◽

Zhicheng Ma ◽

Christina Curtis

Keyword(s):

Clonal Evolution ◽

Tumor Development ◽

Distant Metastases ◽

Driver Mutations ◽

Sequencing Data ◽

Primary Tumors ◽

Lung Cancer Patients ◽

Systemic Spread ◽

Cancer Spread ◽

Early Tumor

Abstract Metastasis is the primary cause of cancer-related deaths, but the natural history, clonal evolution and impact of treatment are poorly understood. We analyzed exome sequencing data from 457 paired primary tumor and metastatic samples from 136 breast, colorectal and lung cancer patients, including untreated (n=99) and treated (n=100) metastatic tumors. Treated metastases often harbored private ‘driver’ mutations whereas untreated metastases did not, suggesting that treatment promotes clonal evolution. Polyclonal seeding was common in untreated lymph node metastases (n=17/29, 59%) and distant metastases (n=20/70, 29%), but less frequent in treated distant metastases (n=9/94, 10%). The low number of metastasis-private clonal mutations is consistent with early metastatic seeding, which we estimated commonly occurred 2-4 years prior to diagnosis across these cancers. Further, these data suggest that the natural course of metastasis is selectively relaxed relative to early tumor development and that metastasis-private mutations are not drivers of cancer spread but instead associated with drug resistance.
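The polyclonal seeding percentages in this abstract follow directly from the quoted counts, as the short check below shows. The dictionary keys and variable names are chosen for illustration; the counts are taken from the abstract, and rounding to the nearest whole percent is assumed.

```python
# (polyclonal cases, total cases) per group, as quoted in the abstract.
counts = {
    "untreated lymph node": (17, 29),
    "untreated distant": (20, 70),
    "treated distant": (9, 94),
}

# Percentage of polyclonally seeded metastases, rounded to a whole percent.
percentages = {
    group: round(100 * polyclonal / total)
    for group, (polyclonal, total) in counts.items()
}

# The two untreated groups together account for the untreated total (n=99).
untreated_total = counts["untreated lymph node"][1] + counts["untreated distant"][1]

print(percentages, untreated_total)
```

The rounded values match the reported 59%, 29% and 10%, and the untreated lymph node and distant groups sum to the stated 99 untreated metastases.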
