Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data

Mapping Intimacies

10.1101/496174

2018

Cited By ~ 11

Author(s):

Simone Zaccaria

Benjamin J. Raphael

Keyword(s):

DNA Sequencing

Primary Tumor

Copy Number

Tumor Evolution

Sequencing Data

Multiple Tumor

Whole Genome Duplications

Genome Duplications

Allele Specific

Accurate Quantification

Copy-number aberrations (CNAs) and whole-genome duplications (WGDs) are frequent somatic mutations in cancer. Accurate quantification of these mutations from DNA sequencing of bulk tumor samples is complicated by varying tumor purity, admixture of multiple tumor clones with distinct mutations, and high aneuploidy. Standard methods for CNA inference analyze tumor samples individually, but DNA sequencing of multiple samples from the same cancer patient (e.g. from multiple regions of a primary tumor, matched primary tumors and metastases, or multiple time points) has recently become common. We introduce a new algorithm, Holistic Allele-specific Tumor Copy-number Heterogeneity (HATCHet), that infers allele- and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient and leverages the relationships between clones in these samples. HATCHet provides a fresh perspective on CNA inference and includes several algorithmic innovations that overcome limitations of existing methods, resulting in a more robust approach even for single-sample analysis. We also develop MASCoTE (Multiple Allele-specific Simulation of Copy-number Tumor Evolution), a framework for generating realistic simulated multi-sample DNA sequencing data with appropriate corrections for the differences in genome length between the normal and tumor clone(s) present in mixed samples. HATCHet outperforms current state-of-the-art methods on 256 simulated tumor samples from 64 patients, half with WGD. HATCHet's analysis of 49 primary tumor and metastasis samples from 10 prostate cancer patients reveals subclonal CNAs in only 29 of these samples, in contrast to published reports of extensive subclonal CNAs in all samples. HATCHet's inferred CNAs are also more consistent with reports of polyclonal origin and limited heterogeneity of metastases in a subset of patients.
HATCHet's analysis of 35 primary tumor and metastasis samples from 4 pancreatic cancer patients reveals subclonal CNAs in 20 samples, WGDs in 3 patients, and tumor subclones shared across primary tumor and metastasis samples from the same patient, none of which were described in the published analysis of these data. HATCHet substantially improves the analysis of CNAs and WGDs, leading to more reliable studies of tumor evolution in primary tumors and metastases.
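
The correction for genome-length differences between normal and tumor clones can be made concrete with the standard mixture model for read-depth ratio (RDR) and B-allele frequency (BAF). The sketch below is a simplified illustration, not HATCHet's or MASCoTE's actual formulation, and it rests on several assumptions:

- Each clone i has proportion μ_i, allele-specific copy numbers (a_i, b_i) in a segment, and genome-wide ploidy P_i (its genome length relative to a haploid reference).
- The minor-allele copies b_i sit on the same haplotype in every clone.
- RDR is the tumor/normal depth ratio after both samples are normalized by their total read counts.

Under these assumptions, RDR = Σ μ_i (a_i + b_i) / Σ μ_i P_i and BAF = Σ μ_i b_i / Σ μ_i (a_i + b_i). The `Clone` type and `expected_rdr_baf` function are hypothetical names.

```python
from typing import NamedTuple, Sequence, Tuple


class Clone(NamedTuple):
    """One clone in a mixed bulk sample (hypothetical representation)."""
    proportion: float  # mu_i, fraction of cells in the sample
    major: int         # a_i, copies of the major allele in this segment
    minor: int         # b_i, copies of the minor allele in this segment
    ploidy: float      # P_i, genome-wide average total copy number


def expected_rdr_baf(clones: Sequence[Clone]) -> Tuple[float, float]:
    """Expected read-depth ratio and B-allele frequency of one segment.

    Both samples are normalized by library size, so the tumor denominator is
    the proportion-weighted genome length sum(mu_i * P_i); the factor of 2
    from the diploid matched normal cancels out.
    """
    total = sum(c.proportion for c in clones)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("clone proportions must sum to 1")
    segment_copies = sum(c.proportion * (c.major + c.minor) for c in clones)
    mean_ploidy = sum(c.proportion * c.ploidy for c in clones)
    rdr = segment_copies / mean_ploidy
    baf = sum(c.proportion * c.minor for c in clones) / segment_copies
    return rdr, baf


print(expected_rdr_baf([Clone(1.0, 1, 1, 2.0)]))  # (1.0, 0.5)

# 40% normal cells plus 60% tumor cells with copy-neutral LOH in this segment.
print(expected_rdr_baf([Clone(0.4, 1, 1, 2.0), Clone(0.6, 2, 0, 2.0)]))  # (1.0, 0.2)

# A pure tumor after WGD (ploidy 4) with a balanced 2+2 segment.
print(expected_rdr_baf([Clone(1.0, 2, 2, 4.0)]))  # (1.0, 0.5)
```

The last case shows why balanced segments alone cannot reveal a WGD. Under this model they are indistinguishable from a diploid genome, so segments with other allelic states are needed to tell the two apart.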

Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data

Nature Communications

10.1038/s41467-020-17967-y

2020

Vol 11 (1)

Author(s):

Simone Zaccaria

Benjamin J. Raphael

Keyword(s):

Copy Number

Whole Genome

Sequencing Data

Copy Number Aberrations

Whole Genome Duplications

Genome Duplications

Tumor Sequencing

Accurate Quantification

Remarkably stable copy-number profiles in osteosarcoma revealed using single-cell DNA sequencing

10.1101/2021.08.30.458268

2021

Author(s):

Sanjana Rajan

Simone Zaccaria

Matthew V. Cannon

Maren Cam

Amy C. Gross

...

Keyword(s):

DNA Sequencing

Single Cell

Genomic Instability

Copy Number

Fitness Landscape

Clonal Evolution

Response To Therapy

Tumor Evolution

Sequencing Data

Recurrent Mutations

Abstract Osteosarcoma is an aggressive malignancy characterized by high genomic complexity. The identification of few recurrent mutations in protein-coding genes suggests that somatic copy-number aberrations (SCNAs) are the genetic drivers of the disease. Models of genomic instability conflict: it is unclear whether osteosarcomas result from pervasive ongoing clonal evolution with continuous optimization of the fitness landscape, or from an early catastrophic event followed by stable maintenance of an abnormal genome. We address this question by investigating SCNAs in 12,019 tumor cells obtained from expanded patient tissues using single-cell DNA sequencing, an analysis that was previously impossible with bulk sequencing. Using the CHISEL algorithm, we inferred allele- and haplotype-specific SCNAs from whole-genome single-cell DNA sequencing data. Surprisingly, we found that, despite extensive genomic aberrations, cells within each tumor exhibit remarkably homogeneous SCNA profiles with little subclonal diversification. Longitudinal analysis of two pairs of patient samples obtained at distant time points (early detection, relapse) demonstrated remarkable conservation of SCNA profiles over tumor evolution. Phylogenetic analysis suggests that the bulk of SCNAs was acquired early in the oncogenic process, with few new events arising in response to therapy or during adaptation to growth in distant tissues. These data suggest that early catastrophic events, rather than sustained genomic instability, drive the formation of these extensively aberrant genomes. Overall, we demonstrate the power of combining single-cell DNA sequencing with an allele- and haplotype-specific SCNA inference algorithm to resolve longstanding questions about the genetics of tumor initiation and progression, challenging the assumptions about genomic instability inferred from bulk tumor data.

Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets

10.1101/119636

2017

Author(s):

Soroush Samadian

Jeff P. Bruce

Trevor J. Pugh

Keyword(s):

DNA Sequencing

Copy Number

Sequence Data

Copy Number Variants

Original Data

Sequencing Data

Data Types

Insert Size

Allele Specific

Proof Of Principle

Abstract Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms that computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground-truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python that introduces user-defined, haplotype-phased, allele-specific copy number events into an existing Binary Alignment Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for the introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel using queuing and parallel processing on a local machine or on a high-performance computing cluster. As a proof of principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate the copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20-100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments into a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be useful for the development and systematic benchmarking of CNV calling algorithms by users working with locally generated data for a variety of applications.
The source code is freely available at http://github.com/pughlab/bamgineer.
Author summary: We present Bamgineer, a software program that introduces user-defined, haplotype-specific copy number variants (CNVs) at any frequency into standard Binary Alignment Map (BAM) files. Copy number gains are simulated by introducing new DNA sequencing read pairs sampled from existing reads and modified to contain the SNP alleles of the haplotype of interest. This approach retains biases of the original data such as local coverage, strand bias, and insert size. Deletions are simulated by removing reads corresponding to one or both haplotypes. In our proof-of-principle study, we simulated copy number profiles from 10 cancer types at cellularity levels typically encountered in clinical samples. We also demonstrated the introduction of low-frequency CNVs into cell-free DNA sequencing data, retaining the bimodal fragment size distribution characteristic of these data. Bamgineer is flexible, enables users to simulate CNVs that reflect the characteristics of locally generated sequence files, and can be used for many applications, including the development and benchmarking of CNV inference tools for a variety of data types.
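
The author summary says gains are simulated by adding reads that carry the gained haplotype's alleles, and deletions by removing reads of the lost haplotype. How many reads to add or remove then follows from simple mixture arithmetic. This is a minimal sketch, not Bamgineer's code, and it assumes:

- the region's reads split evenly between haplotypes A and B;
- normal cells carry one copy of each haplotype;
- the library size is not renormalized.

Under these assumptions, a tumor fraction p carrying n copies of a haplotype scales that haplotype's expected read count by p·n + (1 − p). For a region with N reads, the required change is therefore (N/2)·p·(n − 1). The function name `haplotype_read_changes` is hypothetical.

```python
from typing import Tuple


def haplotype_read_changes(
    region_reads: int, copies_a: int, copies_b: int, cellularity: float
) -> Tuple[int, int]:
    """Reads to add (>0) or remove (<0) per haplotype for one CNV region.

    Assumes the region's reads split evenly between haplotypes A and B and
    that normal cells carry one copy of each haplotype (hypothetical model).
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must lie in [0, 1]")
    if copies_a < 0 or copies_b < 0:
        raise ValueError("copy numbers must be non-negative")
    per_haplotype = region_reads / 2
    # Tumor cells contribute copies_x copies, normal cells contribute 1.
    delta_a = per_haplotype * cellularity * (copies_a - 1)
    delta_b = per_haplotype * cellularity * (copies_b - 1)
    return round(delta_a), round(delta_b)


# One-copy gain of haplotype A in a pure tumor: add 500 A-reads.
print(haplotype_read_changes(1000, 2, 1, 1.0))   # (500, 0)
# Loss of haplotype A at 50% cellularity: remove 250 A-reads.
print(haplotype_read_changes(1000, 0, 1, 0.5))   # (-250, 0)
# Amplification to 10 copies of A at a ctDNA-like fraction of 1%.
print(haplotype_read_changes(1000, 10, 1, 0.01))  # (45, 0)
```

Removing every read of a haplotype is only required when cellularity is 1. Choosing which individual reads to duplicate or drop is a separate sampling step not shown here.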

Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data

BMC Bioinformatics

10.1186/s12859-020-03924-5

2021

Vol 22 (1)

Author(s):

Xinping Fan

Guanghao Luo

Yu S. Huang

Keyword(s):

Copy Number

Bayesian Learning

Kernel Smoothing

Gaussian Mixture

Copy Number Alterations

Sequencing Data

Copy Numbers

Allele Specific

Tumor Sequencing

Low Coverage

Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, are an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts several robust statistical techniques, such as kernel smoothing of coverage differentiation information to discern signal from noise, and combines ideas from time-series analysis and signal processing to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, is packaged in a Docker image, and supports non-human samples; more information is available at http://www.yfish.org/software/. Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage, low-purity tumor sequencing data. Through comparative analyses of both simulated and real sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.
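
Why would a coverage histogram have a period at all? Take the simple mixture model for a tumor of purity p and ploidy P. A segment with total copy number c has expected depth ratio r_c = (p·c + 2(1 − p)) / (p·P + 2(1 − p)). Successive copy-number states are therefore evenly spaced by s = p / (p·P + 2(1 − p)), and solving for purity gives p = 2s / (1 + 2s − s·P).

The sketch below illustrates the idea with a Gaussian kernel-smoothed histogram and a plain autocorrelation peak search. It stands in for, and does not reproduce, the time-series and signal-processing estimators Accucopy uses. All function names, the bin width, and the bandwidth are hypothetical choices.

```python
import math
from typing import List, Sequence


def smoothed_histogram(values: Sequence[float], lo: float, hi: float,
                       bin_width: float, bandwidth: float) -> List[float]:
    """Gaussian kernel density of `values` evaluated at bin centers."""
    n_bins = int(round((hi - lo) / bin_width))
    centers = [lo + (i + 0.5) * bin_width for i in range(n_bins)]
    density = [0.0] * n_bins
    for v in values:
        for i, c in enumerate(centers):
            z = (c - v) / bandwidth
            density[i] += math.exp(-0.5 * z * z)
    return density


def autocorrelation(h: Sequence[float]) -> List[float]:
    """Mean-centered autocorrelation for lags 0..len(h)-1 (lag 0 equals 1)."""
    n = len(h)
    mean = sum(h) / n
    d = [x - mean for x in h]
    denom = sum(x * x for x in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / denom for k in range(n)]


def estimate_period(ratios: Sequence[float], lo: float = 0.0, hi: float = 3.0,
                    bin_width: float = 0.01, bandwidth: float = 0.02) -> float:
    """Spacing between copy-number peaks in a histogram of depth ratios."""
    ac = autocorrelation(smoothed_histogram(ratios, lo, hi, bin_width, bandwidth))
    # Skip the central lobe around lag 0: start at the first negative lag.
    start = next((k for k in range(1, len(ac)) if ac[k] < 0), None)
    if start is None:
        raise ValueError("no periodic structure found")
    best_lag = max(range(start, len(ac) // 2), key=lambda k: ac[k])
    return best_lag * bin_width


def purity_from_period(period: float, ploidy: float) -> float:
    """Invert s = p / (p*P + 2*(1 - p)) for the purity p."""
    return 2 * period / (1 + 2 * period - period * ploidy)


# Synthetic depth ratios clustered at 0.5, 0.75, ..., 1.5 (p = 0.5, P = 2).
ratios = [center + 0.005 * ((j % 5) - 2)
          for center in (0.5, 0.75, 1.0, 1.25, 1.5) for j in range(40)]
period = estimate_period(ratios)
print(period, purity_from_period(period, ploidy=2.0))
```

Purity and ploidy are confounded through s, so the period alone fixes p only once P is known or assumed. This is one reason a method would combine the period estimate with further modeling rather than use it in isolation.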

Decomposing the subclonal structure of tumors with two-way mixture models on copy number aberrations

10.1101/278887

2018

Author(s):

An-Shun Tai

Chien-Hua Peng

Shih-Chi Peng

Wen-Ping Hsieh

Keyword(s):

Head And Neck Cancer

Head And Neck

Neck Cancer

Copy Number

Tumor Heterogeneity

Tumor Evolution

Depth Information

Sequencing Data

Single Nucleotide Variants

Copy Number Aberrations

Abstract Multistage tumorigenesis is a dynamic process characterized by the accumulation of mutations. Thus, a tumor mass is composed of genetically divergent cell subclones. With the advancement of next-generation sequencing (NGS), mathematical models have recently been developed to decompose tumor subclonal architecture from bulk genome sequencing data. Most of these methods focus on single-nucleotide variants (SNVs). However, somatic copy number aberrations (CNAs) also play critical roles in carcinogenesis. Therefore, modeling subclonal CNA composition holds the promise of improving the analysis of tumor heterogeneity and cancer evolution. To address this issue, we developed a two-way mixture Poisson model, named CloneDeMix, for the deconvolution of read-depth information. It can infer the subclonal copy number, mutational cellular prevalence (MCP), subclone composition, and the order in which mutations occurred in the evolutionary hierarchy. The performance of CloneDeMix was systematically assessed in simulations: the accuracy of CNA inference was nearly 93%, and the MCP was also accurately recovered. Furthermore, we demonstrated its applicability using head and neck cancer samples from TCGA. Our results reveal the extent of subclonal CNA diversity, and we also identified a group of candidate genes that may initiate lymph node metastasis during tumor evolution. Most importantly, these driver genes are located at 11q13.3, a region highly susceptible to copy number change in head and neck cancer genomes. This study estimates subclonal CNAs and reveals the evolutionary relationships of mutation events, allowing us to track tumor heterogeneity and identify crucial mutations during the evolutionary process. Hence, it facilitates not only understanding cancer development but also finding potential therapeutic targets.
Briefly, this framework has implications for improved modeling of tumor evolution and highlights the importance of including subclonal CNAs.
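
CloneDeMix deconvolves read depth with a Poisson mixture. The abstract does not spell out the full two-way model, so the sketch below shows only the simpler one-way building block. It fits a K-component Poisson mixture to per-bin read counts by expectation–maximization (EM), and each fitted rate can then be read as a depth level relative to a diploid baseline. The names, initialization, iteration count, and data are illustrative assumptions, not CloneDeMix's implementation.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(x: int, rate: float) -> float:
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def fit_poisson_mixture(counts: Sequence[int], init_rates: Sequence[float],
                        n_iter: int = 100) -> Tuple[List[float], List[float]]:
    """EM for a one-way Poisson mixture; returns (weights, rates)."""
    k = len(init_rates)
    rates = list(init_rates)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: per-bin component responsibilities (log-sum-exp for stability).
        resp = []
        for x in counts:
            logs = [math.log(w) + poisson_logpmf(x, r) for w, r in zip(weights, rates)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            z = sum(probs)
            resp.append([p / z for p in probs])
        # M-step: weights and rates are responsibility-weighted averages.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(counts)
            rates[j] = sum(r[j] * x for r, x in zip(resp, counts)) / nj
    return weights, rates


# Hypothetical per-bin read counts: diploid bins near 50, bins at twice that depth.
counts = [48, 50, 52, 49, 51] * 6 + [98, 100, 102, 99, 101] * 4
weights, rates = fit_poisson_mixture(counts, init_rates=[40.0, 120.0])
print([round(w, 2) for w in weights], [round(r, 1) for r in rates])
# [0.6, 0.4] [50.0, 100.0]
```

A rate ratio of 2 is itself ambiguous: it could come from a clonal doubling or from a larger subclonal gain at lower prevalence. Separating the two is what the extra structure in a two-way model is meant to address.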

Integrative DNA copy number detection and genotyping from sequencing and array-based platforms

10.1101/172700

2017

Cited By ~ 2

Author(s):

Zilu Zhou

Weixin Wang

Li-San Wang

Nancy Ruonan Zhang

Keyword(s):

Copy Number

Association Studies

SNP Array

Supplementary Information

Detection Accuracy

Sequencing Data

Array Data

Combining Data

Allele Specific

CNV Detection

Abstract Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies perform CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, SNP-array data from earlier analyses are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads. Results: We propose a statistical framework, the integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP-array and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing, and integrates matched NGS and SNP-array data through a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV to the naive intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data alone and show that using allele-specific reads improves CNV detection accuracy compared to existing methods. Availability: https://github.com/zhouzilu/ Supplementary information: Supplementary data are available at Bioinformatics online.
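
The integration step can be pictured as an HMM whose hidden states are copy-number classes and whose emissions combine evidence from each platform. The sketch below is a generic Viterbi decoder built on simplifying assumptions that are not iCNV's:

- three states (deletion, normal, duplication);
- one normalized score per position from each platform;
- Gaussian emissions with hypothetical means and a shared standard deviation;
- platforms treated as conditionally independent given the state, so their log-likelihoods add;
- a sticky transition matrix.

All names and parameter values are illustrative.

```python
import math
from typing import List, Sequence

STATES = ("deletion", "normal", "duplication")
MEANS = (-1.0, 0.0, 1.0)  # hypothetical per-state mean of a normalized score
SD = 0.5                  # hypothetical shared emission standard deviation
STAY = 0.98               # hypothetical probability of keeping the same state


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(tracks: Sequence[Sequence[float]]) -> List[int]:
    """Most likely state path given aligned score tracks from several platforms."""
    n_pos, k = len(tracks[0]), len(STATES)
    log_stay, log_move = math.log(STAY), math.log((1 - STAY) / (k - 1))

    def emission(pos: int, s: int) -> float:
        # Conditional independence across platforms: log-likelihoods add.
        return sum(gaussian_logpdf(t[pos], MEANS[s], SD) for t in tracks)

    score = [math.log(1 / k) + emission(0, s) for s in range(k)]
    back: List[List[int]] = []
    for pos in range(1, n_pos):
        new_score, pointers = [], []
        for s in range(k):
            cands = [score[p] + (log_stay if p == s else log_move) for p in range(k)]
            best = max(range(k), key=lambda p: cands[p])
            new_score.append(cands[best] + emission(pos, s))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(k), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


seq_scores = [0.0, 0.1, -0.1, 0.0, -1.1, -0.9, -1.0, -1.2, 0.1, 0.0, 0.0]
array_scores = [0.05, -0.05, 0.0, 0.1, -0.8, -1.0, -0.9, -1.1, 0.0, 0.1, -0.1]
print([STATES[s] for s in viterbi([seq_scores, array_scores])])
```

With these made-up parameters, either track alone decodes as entirely normal because its evidence does not pay for the two state switches. The two tracks together call the four-position deletion. This is a toy illustration of how combining platforms can raise sensitivity.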

Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data

PLoS Computational Biology

10.1371/journal.pcbi.1008012

2020

Vol 16 (7)

pp. e1008012

Cited By ~ 2

Author(s):

Xian F. Mallory

Mohammadamin Edrisi

Nicholas Navin

Luay Nakhleh

Keyword(s):

DNA Sequencing

Single Cell

Copy Number

Copy Number Aberration

Sequencing Data

Aberration Detection

Copy-number-aware differential analysis of quantitative DNA sequencing data

Genome Research

10.1101/gr.139055.112

2012

Vol 22 (12)

pp. 2489-2496

Cited By ~ 22

Author(s):

M. D. Robinson

D. Strbenac

C. Stirzaker

A. L. Statham

J. Song

...

Keyword(s):

DNA Sequencing

Copy Number

Differential Analysis

Sequencing Data

Allele-specific copy number profiling by next-generation DNA sequencing

Nucleic Acids Research

10.1093/nar/gku1252

2014

Vol 43 (4)

pp. e23-e23

Cited By ~ 24

Author(s):

Hao Chen

John M. Bell

Nicolas A. Zavala

Hanlee P. Ji

Nancy R. Zhang

Keyword(s):

DNA Sequencing

Copy Number

Next Generation

Next Generation DNA Sequencing

Allele Specific

Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data

10.1101/697318

2019

Cited By ~ 3

Author(s):

Sayaka Miura

Tracy Vu

Jiamin Deng

Tiffany Buturla

Jiyeong Choi

...

Keyword(s):

Computational Methods

Metastatic Tumor

Tumor Evolution

Deconvolution Method

Sequencing Data

Limited Ability

Multiple Tumor

Different Tissues

Over Time

Selection Of

Abstract Background: Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions of a cancer patient. Many computational methods produce clone phylogenies from bulk sequencing data collected from multiple tumor samples. These clone phylogenies are used to infer mutation order and clone origin times during tumor progression, which makes the selection of an appropriate clonal deconvolution method critical. Surprisingly, the absolute and relative accuracies of these methods in correctly inferring clone phylogenies have not been consistently assessed. Methods: We evaluated the performance of seven computational methods in producing clone phylogenies for simulated datasets in which clones were sampled from multiple sectors of a primary tumor (multi-region) or from primary and metastatic tumors in a patient (multi-site). We assessed the accuracy of the tested methods in determining the order of mutations and the branching pattern within the reconstructed clone phylogenies. Results: The accuracy of the reconstructed mutation order varied extensively among methods (9%–44% error). Methods also varied significantly in reconstructing the topologies of clone phylogenies, as 24%–58% of the inferred clone groupings were incorrect. All tested methods showed a limited ability to correctly identify ancestral clone sequences present in tumor samples. Multiple seeding events among tumor sites during metastatic tumor evolution hindered the deconvolution of clones for all tested methods. Conclusions: Overall, CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets and conditions.
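
A mutation-order error needs a concrete scoring rule, and the abstract does not give the paper's exact metric. The following is a hypothetical pairwise version. Each pair of mutations is classified by the relationship the clone tree implies: same clone, one before the other, or on different branches. The error is the fraction of pairs whose class differs between the true and inferred phylogenies. Trees are given as child-to-parent maps, and each mutation is assigned to the clone where it first appears; all names here are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional, Set


def ancestors(clone: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All strict ancestors of `clone` in a tree given as child -> parent."""
    found = set()
    while parent.get(clone) is not None:
        clone = parent[clone]
        found.add(clone)
    return found


def relation(u: str, v: str, clone_of: Dict[str, str],
             parent: Dict[str, Optional[str]]) -> str:
    """Order of mutations u and v implied by the clones that first carry them."""
    cu, cv = clone_of[u], clone_of[v]
    if cu == cv:
        return "same clone"
    if cu in ancestors(cv, parent):
        return "u before v"
    if cv in ancestors(cu, parent):
        return "v before u"
    return "different branches"


def pairwise_order_error(mutations: Iterable[str],
                         true_clone_of: Dict[str, str],
                         true_parent: Dict[str, Optional[str]],
                         inferred_clone_of: Dict[str, str],
                         inferred_parent: Dict[str, Optional[str]]) -> float:
    """Fraction of mutation pairs whose implied relationship is wrong."""
    pairs = list(combinations(sorted(mutations), 2))
    wrong = sum(
        relation(u, v, true_clone_of, true_parent)
        != relation(u, v, inferred_clone_of, inferred_parent)
        for u, v in pairs)
    return wrong / len(pairs)


# True tree: B and C branch from A. Inferred tree wrongly chains A -> B -> C.
assignment = {"m1": "A", "m2": "B", "m3": "C"}
true_tree = {"A": None, "B": "A", "C": "A"}
inferred_tree = {"A": None, "B": "A", "C": "B"}
print(pairwise_order_error(["m1", "m2", "m3"], assignment, true_tree,
                           assignment, inferred_tree))
```

In the example, only the pair (m2, m3) is misclassified, as ordered rather than branched, so one of three pairs is wrong. Topology errors of this kind are the branching-pattern mistakes the abstract reports.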
