tHapMix: simulating tumour samples through haplotype mixtures

2016 ◽  
Author(s):  
Sergii Ivakhno ◽  
Camilla Colombo ◽  
Stephen Tanner ◽  
Philip Tedder ◽  
Stefano Berri ◽  
...  

Abstract Motivation: Large-scale rearrangements and copy number changes, combined with different modes of clonal evolution, create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. Results: We developed a new simulation framework, tHapMix, that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real read data preserves noise and biases present in sequencing platforms. We further demonstrate tHapMix utility by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools. Availability and implementation: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Author(s):  
Liam F Spurr ◽  
Mehdi Touat ◽  
Alison M Taylor ◽  
Adrian M Dubuc ◽  
Juliann Shih ◽  
...  

Abstract Summary: The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation: ASCETS is implemented in R and is freely available to non-commercial users on GitHub: https://github.com/beroukhim-lab/ascets, along with detailed documentation. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Michael D. Linderman ◽  
Davin Chia ◽  
Forrest Wallace ◽  
Frank A. Nothaft

Abstract Background: XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results: DECA is a horizontally scalable implementation of the XHMM algorithm using the ADAM framework and Apache Spark that incorporates novel algorithmic optimizations to eliminate unneeded computation. DECA parallelizes XHMM on both multi-core shared memory computers and large shared-nothing Spark clusters. We performed CNV discovery from the read-depth matrix in 2535 exomes in 9.3 min on a 16-core workstation (35.3× speedup vs. XHMM), 12.7 min using 10 executor cores on a Spark cluster (18.8× speedup vs. XHMM), and 9.8 min using 32 executor cores on Amazon AWS’ Elastic MapReduce. We performed CNV discovery from the original BAM files in 292 min using 640 executor cores on a Spark cluster. Conclusions: We describe DECA’s performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark’s configuration parameters.


2016 ◽  
Author(s):  
Eric Roller ◽  
Sergii Ivakhno ◽  
Steve Lee ◽  
Thomas Royce ◽  
Stephen Tanner

Motivation: Increased throughput and diverse experimental designs of large-scale sequencing studies necessitate versatile, scalable and robust variant calling tools. In particular, identification of copy number changes remains a challenging task due to their complexity, susceptibility to sequencing biases, variation in coverage data and dependence on genome-wide sample properties, such as tumor polyploidy or polyclonality in cancer samples. Results: We have developed a new tool, Canvas, for identification of copy number changes from diverse sequencing experiments including whole-genome matched tumor-normal and single-sample normal re-sequencing, as well as whole-exome matched and unmatched tumor-normal studies. In addition to variant calling, Canvas infers genome-wide parameters such as cancer ploidy, purity and heterogeneity. It provides fast and simple-to-execute workflows that can scale to thousands of samples and can be easily incorporated into existing variant calling pipelines. Availability: Canvas is distributed under an open source license and can be downloaded from https://github.com/Illumina/canvas.


2017 ◽  
Vol 108 (3) ◽  
pp. e282
Author(s):  
K.A. Beauchamp ◽  
P. Grauman ◽  
G.J. Hogan ◽  
K.R. Haas ◽  
G.M. Gould ◽  
...  

Pathology ◽  
2020 ◽  
Vol 52 ◽  
pp. S108
Author(s):  
Dylan A. Mordaunt ◽  
Julien Soubrier ◽  
Song Gao ◽  
Lesley Rawlings ◽  
Jillian Nicholl ◽  
...  

2017 ◽  
Author(s):  
Sergii Ivakhno ◽  
Eric Roller ◽  
Camilla Colombo ◽  
Philip Tedder ◽  
Anthony J. Cox

Abstract Motivation: Whole genome sequencing is becoming a diagnostic of choice for the identification of rare inherited and de novo copy number variants in families with various pediatric and late-onset genetic diseases. However, joint variant calling in pedigrees is hampered by the complexity of consensus breakpoint alignment across samples within an arbitrary pedigree structure. Results: We have developed a new tool, Canvas SPW, for the identification of inherited and de novo copy number variants from pedigree sequencing data. Canvas SPW supports a number of family structures and provides a wide range of scoring and filtering options to automate and streamline identification of de novo variants. Availability: Canvas SPW is available for download from https://github.com/Illumina/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 44 (2) ◽  
pp. 148-158
Author(s):  
Varun Kilaru ◽  
Anna K. Knight ◽  
Seyma Katrinli ◽  
Dawayland Cobb ◽  
Adriana Lori ◽  
...  

2018 ◽  
Vol 218 (1) ◽  
pp. S166
Author(s):  
Dale Muzzey ◽  
Kyle A. Beauchamp ◽  
Peter Grauman ◽  
Gregory J. Hogan ◽  
Kevin R. Haas ◽  
...  

2019 ◽  
Author(s):  
Anders Brahme ◽  
Maj Hultén ◽  
Carin Bengtsson ◽  
Andreas Hultgren ◽  
Anders Zetterberg

Abstract Eight different data sets covering the whole human genome are compared with regard to their genomic distribution. A close correlation was found between cytologically detected chiasmata and MLH1 immunofluorescence sites and the recombination density distribution from the HapMap project. Sites with a high probability of chromatid breakage after exposure to low and high ionization density radiations are often located inside common and rare Fragile Sites (FSs), indicating that the common Radiation-Induced Breakpoint sites (RIBs) may be a new kind of more local fragility. Furthermore, oncogenes and other cancer-related genes are commonly located in regions with an increased probability of rearrangements during genomic recombination, or in regions with a high probability of copy number changes, possibly because these processes may be involved in oncogene activation and cancer induction. An increased CpG density is linked to regions of high gene density to secure high-fidelity reproduction and survival. To minimize cancer induction, these genes are often located in regions of decreased recombination density and/or higher than average CpG density. Interestingly, copy number changes occur predominantly at common RIBs and/or FSs, at least for breast cancers with poor prognosis, and they decrease weakly but significantly in regions with increasing recombination density and CpG density. It is compelling that all these datasets are influenced by the cell's handling of double strand breaks and, more generally, DNA damage to its genome. In fact, the DNA repair genes systematically avoid regions with a high recombination density. This may be a consequence of natural selection, as they need to be intact to accurately handle repairable DNA lesions.


2017 ◽  
Author(s):  
Eugene Urrutia ◽  
Hao Chen ◽  
Zilu Zhou ◽  
Nancy R Zhang ◽  
Yuchao Jiang

Abstract Summary: Copy number variation is an important and abundant source of variation in the human genome, which has been associated with a number of diseases, especially cancer. Massively parallel next-generation sequencing allows copy number profiling with fine resolution. Such efforts, however, have met with mixed success, with setbacks arising partly from the lack of reliable analytical methods to meet the diverse and unique challenges arising from the myriad experimental designs and study goals in genetic studies. In cancer genomics, detection of somatic copy number changes and profiling of allele-specific copy number (ASCN) are complicated by experimental biases and artifacts as well as normal cell contamination and cancer subclone admixture. Furthermore, careful statistical modeling is warranted to reconstruct tumor phylogeny by both somatic ASCN changes and single nucleotide variants. Here we describe a flexible computational pipeline, MARATHON, which integrates multiple related statistical software tools for copy number profiling and downstream analyses in disease genetic studies. Availability and implementation: MARATHON is publicly available at https://github.com/yuchaojiang/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

