tHapMix: simulating tumour samples through haplotype mixtures

Mapping Intimacies ◽

10.1101/057414 ◽

2016 ◽

Author(s):

Sergii Ivakhno ◽

Camilla Colombo ◽

Stephen Tanner ◽

Philip Tedder ◽

Stefano Berri ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Variant Calling ◽

Copy Number Variant ◽

Supplementary Information ◽

Genome Diversity ◽

Simulation Framework ◽

Somatic Genome ◽

Copy Number Changes ◽

Sequencing Platforms

Abstract Motivation: Large-scale rearrangements and copy number changes, combined with different modes of clonal evolution, create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. Results: We developed a new simulation framework, tHapMix, that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real read data preserves noise and biases present in sequencing platforms. We further demonstrate tHapMix utility by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools. Availability and implementation: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
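To illustrate why purity and ploidy matter when benchmarking copy number callers on mixed samples, the sketch below computes the expected tumour/normal read-depth ratio for one segment under the usual linear mixture model. This is a generic textbook relationship, not tHapMix's own code; the function name and its parameters are hypothetical and chosen only for this example.

```python
def expected_depth_ratio(purity, tumour_cn, tumour_ploidy, normal_cn=2):
    """Expected tumour/normal depth ratio for one segment of a mixed sample.

    Assumes a simple linear mixture: a fraction `purity` of cells are tumour
    cells carrying `tumour_cn` copies of the segment (average genome-wide
    ploidy `tumour_ploidy`), and the remainder are normal cells with
    `normal_cn` copies everywhere. All names here are illustrative.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    # Average copies of this segment per cell in the mixture.
    segment_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    # Average copies per cell across the whole genome (normalisation term).
    genome_copies = purity * tumour_ploidy + (1.0 - purity) * normal_cn
    return segment_copies / genome_copies


# A 4-copy gain in a diploid tumour at 50% purity.
ratio = expected_depth_ratio(purity=0.5, tumour_cn=4, tumour_ploidy=2)
print(ratio)
```

In this model, lower purity pulls every ratio toward 1, which is one reason a benchmark set spanning a range of purities and ploidies is useful.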


Quantification of aneuploidy in targeted sequencing data using ASCETS

Bioinformatics ◽

10.1093/bioinformatics/btaa980 ◽

2020 ◽

Author(s):

Liam F Spurr ◽

Mehdi Touat ◽

Alison M Taylor ◽

Adrian M Dubuc ◽

Juliann Shih ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Genomic Analysis ◽

Targeted Sequencing ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Copy Number Changes ◽

Panel Sequencing ◽

Chromosome Level

Abstract Summary The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation ASCETS is implemented in R and is freely available to non-commercial users on GitHub: https://github.com/beroukhim-lab/ascets, along with detailed documentation. Supplementary information Supplementary data are available at Bioinformatics online.


DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark

BMC Bioinformatics ◽

10.1186/s12859-019-3108-7 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

Michael D. Linderman ◽

Davin Chia ◽

Forrest Wallace ◽

Frank A. Nothaft

Keyword(s):

Copy Number ◽

Large Scale ◽

Variant Calling ◽

Copy Number Variant ◽

Read Depth ◽

Lessons Learned ◽

Apache Spark ◽

Sequencing Data ◽

Whole Exome Sequencing Data ◽

Genome Analyses

Abstract Background XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results DECA is a horizontally scalable implementation of the XHMM algorithm using the ADAM framework and Apache Spark that incorporates novel algorithmic optimizations to eliminate unneeded computation. DECA parallelizes XHMM on both multi-core shared memory computers and large shared-nothing Spark clusters. We performed CNV discovery from the read-depth matrix in 2535 exomes in 9.3 min on a 16-core workstation (35.3× speedup vs. XHMM), 12.7 min using 10 executor cores on a Spark cluster (18.8× speedup vs. XHMM), and 9.8 min using 32 executor cores on Amazon AWS’ Elastic MapReduce. We performed CNV discovery from the original BAM files in 292 min using 640 executor cores on a Spark cluster. Conclusions We describe DECA’s performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark’s configuration parameters.
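As a quick arithmetic check on the speedups quoted in the DECA abstract above, the sketch below multiplies each DECA runtime by its reported speedup to recover the implied XHMM runtime. These baselines are derived here and are not stated in the abstract; the helper name is hypothetical.

```python
def implied_baseline_minutes(deca_minutes, speedup):
    """Implied XHMM runtime, given a DECA runtime and its speedup factor."""
    return deca_minutes * speedup


# Figures taken from the abstract: (DECA runtime in minutes, speedup vs. XHMM).
workstation = implied_baseline_minutes(9.3, 35.3)  # 16-core workstation
cluster = implied_baseline_minutes(12.7, 18.8)     # 10 executor cores on Spark

print(f"workstation baseline: {workstation:.0f} min ({workstation / 60:.1f} h)")
print(f"cluster baseline:     {cluster:.0f} min ({cluster / 60:.1f} h)")
```

The two implied baselines differ (about 5.5 h and 4.0 h), which suggests each speedup was measured against XHMM in a different environment. That reading is an inference from the numbers, not a statement from the paper.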


Canvas: versatile and scalable detection of copy number variants

10.1101/036194 ◽

2016 ◽

Author(s):

Eric Roller ◽

Sergii Ivakhno ◽

Steve Lee ◽

Thomas Royce ◽

Stephen Tanner

Keyword(s):

Copy Number ◽

Large Scale ◽

Copy Number Variants ◽

Variant Calling ◽

Experimental Designs ◽

Genome Wide ◽

Whole Exome ◽

Sequencing Studies ◽

Copy Number Changes ◽

Robust Variant

Motivation: Increased throughput and diverse experimental designs of large-scale sequencing studies necessitate versatile, scalable and robust variant calling tools. In particular, identification of copy number changes remains a challenging task due to their complexity, susceptibility to sequencing biases, variation in coverage data and dependence on genome-wide sample properties, such as tumor polyploidy or polyclonality in cancer samples. Results: We have developed a new tool, Canvas, for identification of copy number changes from diverse sequencing experiments including whole-genome matched tumor-normal and single-sample normal re-sequencing, as well as whole-exome matched and unmatched tumor-normal studies. In addition to variant calling, Canvas infers genome-wide parameters such as cancer ploidy, purity and heterogeneity. It provides fast and simple to execute workflows that can scale to thousands of samples and can be easily incorporated into existing variant calling pipelines. Availability: Canvas is distributed under an open source license and can be downloaded from https://github.com/Illumina/canvas.


Copy number variant calling on a 177 gene expanded carrier screening panel reveals impact of HBB deletions

Fertility and Sterility ◽

10.1016/j.fertnstert.2017.07.836 ◽

2017 ◽

Vol 108 (3) ◽

pp. e282

Author(s):

K.A. Beauchamp ◽

P. Grauman ◽

G.J. Hogan ◽

K.R. Haas ◽

G.M. Gould ◽

...

Keyword(s):

Copy Number ◽

Variant Calling ◽

Copy Number Variant ◽

Carrier Screening ◽

Expanded Carrier Screening


Review of 12 months of copy number variant calling on a clinical next generation sequencing pipeline

Pathology ◽

10.1016/j.pathol.2020.01.370 ◽

2020 ◽

Vol 52 ◽

pp. S108

Author(s):

Dylan A. Mordaunt ◽

Julien Soubrier ◽

Song Gao ◽

Lesley Rawlings ◽

Jillian Nicholl ◽

...

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Variant Calling ◽

Copy Number Variant ◽

Next Generation ◽

Generation Sequencing


Canvas SPW: calling de novo copy number variants in pedigrees

10.1101/121939 ◽

2017 ◽

Author(s):

Sergii Ivakhno ◽

Eric Roller ◽

Camilla Colombo ◽

Philip Tedder ◽

Anthony J. Cox

Keyword(s):

Copy Number ◽

De Novo ◽

Late Onset ◽

Genetic Diseases ◽

Copy Number Variants ◽

Variant Calling ◽

Supplementary Information ◽

Sequencing Data ◽

Pedigree Structure ◽

Wide Range

Abstract Motivation: Whole genome sequencing is becoming a diagnostic of choice for the identification of rare inherited and de novo copy number variants in families with various pediatric and late-onset genetic diseases. However, joint variant calling in pedigrees is hampered by the complexity of consensus breakpoint alignment across samples within an arbitrary pedigree structure. Results: We have developed a new tool, Canvas SPW, for the identification of inherited and de novo copy number variants from pedigree sequencing data. Canvas SPW supports a number of family structures and provides a wide range of scoring and filtering options to automate and streamline identification of de novo variants. Availability: Canvas SPW is available for download from https://github.com/Illumina/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Critical evaluation of copy number variant calling methods using DNA methylation

Genetic Epidemiology ◽

10.1002/gepi.22269 ◽

2019 ◽

Vol 44 (2) ◽

pp. 148-158

Author(s):

Varun Kilaru ◽

Anna K. Knight ◽

Seyma Katrinli ◽

Dawayland Cobb ◽

Adriana Lori ◽

...

Keyword(s):

DNA Methylation ◽

Copy Number ◽

Critical Evaluation ◽

Variant Calling ◽

Copy Number Variant


259: Copy number variant calling on a 176-disease expanded carrier screening panel

American Journal of Obstetrics and Gynecology ◽

10.1016/j.ajog.2017.10.187 ◽

2018 ◽

Vol 218 (1) ◽

pp. S166

Author(s):

Dale Muzzey ◽

Kyle A. Beauchamp ◽

Peter Grauman ◽

Gregory J. Hogan ◽

Kevin R. Haas ◽

...

Keyword(s):

Copy Number ◽

Variant Calling ◽

Copy Number Variant ◽

Carrier Screening ◽

Expanded Carrier Screening


Large-scale correlations between eight key double strand break related data sets over the whole human genome

10.1101/581173 ◽

2019 ◽

Author(s):

Anders Brahme ◽

Maj Hultén ◽

Carin Bengtsson ◽

Andreas Hultgren ◽

Anders Zetterberg

Keyword(s):

Human Genome ◽

Copy Number ◽

High Probability ◽

Large Scale ◽

Fragile Sites ◽

DNA Lesions ◽

Data Sets ◽

Breast Cancers ◽

Cancer Induction ◽

Copy Number Changes

Abstract Eight different data sets covering the whole human genome are compared with regard to their genomic distribution. A close correlation was found between cytologically detected chiasma and MLH1 immunofluorescence sites and the recombination density distribution from the HapMap project. Sites with a high probability of chromatid breakage after exposure to low and high ionization density radiations are often located inside common and rare Fragile Sites (FSs), indicating that the common Radiation-Induced Breakpoint sites (RIBs) may be a new kind of more local fragility. Furthermore, oncogenes and other cancer-related genes are commonly located in regions with an increased probability of rearrangements during genomic recombination, or in regions with a high probability of copy number changes, possibly because these processes may be involved in oncogene activation and cancer induction. An increased CpG density is linked to regions of high gene density to secure high-fidelity reproduction and survival. To minimize cancer induction, these genes are often located in regions of decreased recombination density and/or higher than average CpG density. Interestingly, copy number changes occur predominantly at common RIBs and/or FSs, at least for breast cancers with poor prognosis, and they decrease weakly but significantly in regions with increasing recombination density and CpG density. It is compelling that all these data sets are influenced by the cell's handling of double strand breaks and, more generally, DNA damage on its genome. In fact, the DNA repair genes systematically avoid regions with a high recombination density. This may be a consequence of natural selection, as they need to be intact to accurately handle repairable DNA lesions.


Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny

10.1101/195230 ◽

2017 ◽

Cited By ~ 1

Author(s):

Eugene Urrutia ◽

Hao Chen ◽

Zilu Zhou ◽

Nancy R Zhang ◽

Yuchao Jiang

Keyword(s):

Copy Number ◽

Cancer Genomics ◽

Supplementary Information ◽

Single Nucleotide Variants ◽

Genetic Studies ◽

Number Variation ◽

Allele Specific ◽

Copy Number Changes ◽

Fine Resolution ◽

Tumor Phylogeny

Abstract Summary: Copy number variation is an important and abundant source of variation in the human genome, which has been associated with a number of diseases, especially cancer. Massively parallel next-generation sequencing allows copy number profiling with fine resolution. Such efforts, however, have met with mixed success, with setbacks arising partly from the lack of reliable analytical methods to meet the diverse and unique challenges arising from the myriad experimental designs and study goals in genetic studies. In cancer genomics, detection of somatic copy number changes and profiling of allele-specific copy number (ASCN) are complicated by experimental biases and artifacts as well as normal cell contamination and cancer subclone admixture. Furthermore, careful statistical modeling is warranted to reconstruct tumor phylogeny from both somatic ASCN changes and single nucleotide variants. Here we describe a flexible computational pipeline, MARATHON, which integrates multiple related statistical software packages for copy number profiling and downstream analyses in disease genetic studies. Availability and implementation: MARATHON is publicly available at https://github.com/yuchaojiang/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
