Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data

Mapping Intimacies ◽

10.1101/482554 ◽

2018 ◽

Author(s):

Whitney Whitford ◽

Klaus Lehnert ◽

Russell G. Snell ◽

Jessie C. Jacobsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Read Pair ◽

False Discovery ◽

Software Packages ◽

Variant Detection ◽

Detection Software ◽

Cnv Detection

Abstract. Background: Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, making this approach a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data has the potential to detect Copy Number Variants (CNV) to fine resolution. Many CNV detection software packages have been developed exploiting four main types of data: read pair, split read, read depth, and assembly based methods. The aim of this study was to evaluate the efficiency of each of these main approaches in detecting deletions. Methods: WGS data and high confidence deletion calls for the individual NA12878 from the Genome in a Bottle consortium were the benchmark dataset. The performance of Breakdancer, CNVnator, Delly, FermiKit, and Pindel was assessed by comparing the accuracy and sensitivity of each software package in detecting deletions exceeding 1 kb. Results: There was considerable variability in the outputs of the different WGS CNV detection programs. The best performance was seen from Breakdancer and Delly, with 92.6% and 96.7% sensitivity, respectively, and 34.5% and 68.5% false discovery rate (FDR), respectively. In comparison, Pindel, CNVnator, and FermiKit were less effective, with sensitivities of 69.1%, 66.0%, and 15.8%, respectively, and FDRs of 91.3%, 69.0%, and 31.7%, respectively. Concordance across software packages was poor, with only 27 of the total 612 benchmark deletions identified by all five methodologies. Conclusions: The WGS based CNV detection tools evaluated show disparate performance in identifying deletions ≥1 kb, particularly those utilising different input data characteristics. Software that exploits read pair based data had the highest sensitivity, namely Breakdancer and Delly. Breakdancer also had the second lowest false discovery rate. Therefore, in this analysis read pair methods (Breakdancer in particular) were the best performing approaches for the identification of deletions ≥1 kb, balancing accuracy and sensitivity. There is potential for improvement in the detection algorithms, particularly for reducing FDR. This analysis has validated the utility of WGS based CNV detection software to reliably identify deletions, and these findings will be of use when choosing appropriate software for deletion detection, in both research and diagnostic medicine.
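The two metrics reported in the abstract above, sensitivity and false discovery rate (FDR), can be illustrated with a minimal Python sketch. The abstract does not say how a call was matched to a benchmark deletion, so the 50% reciprocal-overlap rule used here is an assumption, and the function names and toy intervals are hypothetical rather than taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end); coordinates are treated as half-open intervals
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def evaluate(calls: List[Interval], truth: List[Interval],
             min_ro: float = 0.5) -> Tuple[float, float]:
    """Compute (sensitivity, FDR) under an ASSUMED reciprocal-overlap match."""
    # Benchmark deletions recovered by at least one call
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_ro for c in calls)
    )
    # Calls supported by at least one benchmark deletion
    true_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_ro for t in truth)
    )
    sensitivity = matched_truth / len(truth) if truth else 0.0
    fdr = 1 - true_calls / len(calls) if calls else 0.0
    return sensitivity, fdr


# Toy data, invented purely for illustration
truth = [("chr1", 1000, 3000), ("chr1", 10000, 12000), ("chr2", 5000, 8000)]
calls = [("chr1", 1100, 3100), ("chr2", 100, 2000),
         ("chr2", 5000, 7900), ("chr3", 1, 5000)]

sensitivity, fdr = evaluate(calls, truth)
print(f"sensitivity={sensitivity:.3f} FDR={fdr:.3f}")
```

With these toy intervals, two of three benchmark deletions are recovered and two of four calls are unsupported. A stricter or looser matching threshold would change both numbers, which is one reason reported figures can differ between benchmarking studies.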

Detection and characterization of copy number variants based on whole-genome sequencing by DNBSEQ platforms

10.1101/786962 ◽

2019 ◽

Author(s):

Junhua Rao ◽

Lihua Peng ◽

Fang Chen ◽

Hui Jiang ◽

Chunyu Geng ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Copy Number Variant ◽

Whole Genome ◽

Genome Wide ◽

Wide Range ◽

Distribution Sensitivity ◽

Cnv Detection

Abstract. Background: Next-generation sequencing (NGS) has developed rapidly in recent years, making whole-genome sequencing (WGS) a more cost- and time-efficient choice in a wide range of biological research. WGS data are typically used to detect variants such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels) and copy number variants (CNVs), which play an important role in many human diseases. However, the feasibility of CNV detection based on WGS by DNBSEQ™ platforms was unclear. We systematically analysed the genome-wide CNV detection power of DNBSEQ™ platforms and Illumina platforms on NA12878 with five commonly used tools. Results: DNBSEQ™ platforms stably detected slightly more CNVs genome-wide (on average 1.24-fold the number detected on Illumina platforms). CNVs based on DNBSEQ™ platforms and Illumina platforms were then evaluated against two public benchmarks of NA12878. DNBSEQ™ and Illumina platforms showed similar sensitivity and precision on both benchmarks. Further, differences between CNV detection tools were analysed, indicating that the choice of tool could affect CNV detection results such as count, distribution, sensitivity and precision. Conclusion: The major contribution of this paper is providing, for the first time, a comprehensive guide for CNV detection based on WGS by DNBSEQ™ platforms.

JAX-CNV: A whole genome sequencing-based algorithm for copy number detection at clinical grade level

10.1101/2021.03.16.21252173 ◽

2021 ◽

Author(s):

Wan-Ping Lee ◽

Qihui Zhu ◽

Xiaofei Yang ◽

Silvia Liu ◽

Eliza Cerveira ◽

...

Keyword(s):

False Discovery Rate ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variant ◽

Fold Increase ◽

Chromosomal Microarray ◽

Whole Genome ◽

False Discovery ◽

Calling Algorithm

We aimed to develop a whole genome sequencing (WGS)-based copy number variant (CNV) calling algorithm with the potential to replace the chromosomal microarray assay (CMA) for clinical diagnosis. JAX-CNV was thus developed for CNV detection from WGS. The performance of this CNV calling algorithm was evaluated in a blinded manner on 31 samples and compared to the results of clinically-validated CMAs. Of the 112 CNVs reported by clinically-validated CMAs of the 31 samples, JAX-CNV recalled 100%. In addition, JAX-CNV identified an average of 30 CNVs per individual, an approximately seven-fold increase compared to the calls of clinically-validated CMAs. Experimental validation of 24 randomly selected CNVs showed one false positive (i.e., a false discovery rate of 4.17%). A robustness test on lower-coverage data revealed 100% sensitivity for CNVs greater than 300 kb (the current threshold of the College of American Pathologists) down to 10x coverage. For CNVs greater than 50 kb, sensitivities were 100% for coverages deeper than 20x, 97% for 15x, and 95% for 10x. We developed a WGS-based CNV pipeline, including this newly developed CNV caller JAX-CNV, and found it capable of detecting CMA-reported CNVs at 100% sensitivity with about 4% false discovery rate. We propose that JAX-CNV could be further examined in a multi-institutional study to justify the transition of first-tier genetic testing from CMAs to WGS. JAX-CNV is available on https://github.com/TheJacksonLaboratory/JAX-CNV.
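As a quick check of the arithmetic behind the validation figure quoted above (one false positive among 24 experimentally validated CNVs), the following Python sketch recomputes the false discovery rate. The helper name is hypothetical and the code is not part of JAX-CNV.

```python
def false_discovery_rate(false_positives: int, validated_calls: int) -> float:
    """Fraction of validated calls that turned out to be false positives."""
    if validated_calls <= 0:
        raise ValueError("validated_calls must be positive")
    return false_positives / validated_calls


# Figures from the abstract: 1 false positive among 24 validated CNVs
fdr_percent = round(100 * false_discovery_rate(1, 24), 2)
print(fdr_percent)  # 4.17
```

With only 24 validated calls, this estimate carries wide uncertainty, which is consistent with the abstract rounding it to "about 4%".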

Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms

BMC Bioinformatics ◽

10.1186/s12859-020-03859-x ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Junhua Rao ◽

Lihua Peng ◽

Xinming Liang ◽

Hui Jiang ◽

Chunyu Geng ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Technology Use ◽

Copy Number ◽

Massively Parallel Sequencing ◽

Copy Number Variants ◽

Whole Genome ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Cnv Detection

Abstract. Background: DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Use of data generated from DNBSEQ™ platforms to detect single nucleotide variants (SNVs) and small insertions and deletions (indels) has proven to be quite effective, while the feasibility of copy number variant (CNV) detection is unclear. Results: Here, we first benchmarked different CNV detection tools based on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools in CNV detection based on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected based on DNBSEQ™ and Illumina data were similar in quantity, length and distribution, while great differences existed between results from different tools, even when based on data from a single platform. We further estimated the CNV detection power based on available CNV benchmarks of NA12878 and found similar precision and sensitivity between the DNBSEQ™ and Illumina platforms. We also found higher precision for CNVs shorter than 1 kbp based on DNBSEQ™ platforms than for those based on Illumina platforms when using Pindel, DELLY and LUMPY. We carefully compared these two available benchmarks and found a large proportion of CNVs specific to each. Thus, we constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. Conclusions: We assessed and benchmarked CNV detection based on WGS with DNBSEQ™ platforms and provide guidelines for future studies.

Impact of DNA source on genetic variant detection from human whole-genome sequencing data

Journal of Medical Genetics ◽

10.1136/jmedgenet-2019-106281 ◽

2019 ◽

Vol 56 (12) ◽

pp. 809-817 ◽

Cited By ~ 7

Author(s):

Brett Trost ◽

Susan Walker ◽

Syed A Haider ◽

Wilson W L Sung ◽

Sergio Pereira ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Variant ◽

Read Depth ◽

Detection Accuracy ◽

Whole Genome ◽

Sequencing Data ◽

Eukaryotic Dna ◽

Variant Detection ◽

Cnv Detection

Background: Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated. Methods: Matched blood, saliva and buccal samples from four unrelated individuals were used to compare sequencing metrics and variant-detection accuracy among these DNA sources. Results: We observed significant differences among DNA sources for sequencing quality metrics such as percentage of reads aligned and mean read depth (p<0.05). Differences were negligible in the accuracy of detecting short insertions and deletions; however, the false positive rate for single nucleotide variation detection was slightly higher in some saliva and buccal samples. The sensitivity of copy number variant (CNV) detection was up to 25% higher in blood samples, depending on CNV size and type, and appeared to be worse in saliva and buccal samples with high bacterial concentration. We also show that methylation-based enrichment for eukaryotic DNA in saliva and buccal samples increased alignment rates but also reduced read-depth uniformity, hampering CNV detection. Conclusion: For WGS, we recommend using DNA extracted from blood rather than saliva or buccal swabs; if saliva or buccal samples are used, we recommend against using methylation-based eukaryotic DNA enrichment. All data used in this study are available for further open-science investigation.

Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

BMC Genomics ◽

10.1186/s12864-021-07686-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Johannes Smolander ◽

Sofia Khan ◽

Kalaimathy Singaravelu ◽

Leni Kauko ◽

Riikka J. Lund ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Sex Chromosomes ◽

Copy Number ◽

Copy Number Variations ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Low Coverage ◽

Cnv Detection

Abstract. Background: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data that is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection. Results: Here, the performance of six popular read-depth based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validation were used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can accurately function. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with FREEC (~ 3 min), which we considered the second-best method. Conclusions: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for the detection of large copy number variations when their length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.

Copy Number Variant Detection with Low-Coverage Whole-Genome Sequencing Represents a Viable Alternative to the Conventional Array-CGH

Diagnostics ◽

10.3390/diagnostics11040708 ◽

2021 ◽

Vol 11 (4) ◽

pp. 708

Author(s):

Marcel Kucharík ◽

Jaroslav Budiš ◽

Michaela Hýblová ◽

Gabriel Minárik ◽

Tomáš Szemes

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

In Silico ◽

Copy Number ◽

Normal Population ◽

Copy Number Variations ◽

Whole Genome ◽

Real Patient ◽

Low Coverage ◽

Cnv Detection

Copy number variations (CNVs) represent a type of structural variant involving alterations in the number of copies of specific regions of DNA that can either be deleted or duplicated. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. At present, several methods for CNV detection are applied, ranging from conventional cytogenetic analysis, through microarray-based methods (aCGH), to next-generation sequencing (NGS). In this paper, we present GenomeScreen, an NGS-based CNV detection method for low-coverage, whole-genome sequencing. We determined the theoretical limits of its accuracy and obtained confirmation in an extensive in silico study and in real patient samples with known genotypes. In theory, at least 6 M uniquely mapped reads are required to detect a CNV with a length of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis required at least 8 M to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a mean resolution of 200 kb. GenomeScreen and aCGH both detected 59 deviations, while GenomeScreen additionally detected 134 other (usually smaller) variations. When compared to aCGH, the overall performance of the proposed GenomeScreen tool is comparable or superior in terms of accuracy, turn-around time, and cost-effectiveness, thus providing reasonable benefits, particularly in a prenatal diagnosis setting.

0306 Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

Journal of Animal Science ◽

10.2527/jam2016-0306 ◽

2016 ◽

Vol 94 (suppl_5) ◽

pp. 146-146

Author(s):

D. M. Bickhart ◽

L. Xu ◽

J. L. Hutchison ◽

J. B. Cole ◽

D. J. Null ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genetic Markers ◽

Genome Sequencing ◽

Copy Number ◽

Large Scale ◽

Copy Number Variants ◽

Whole Genome

Whole-genome sequencing from the New Zealand Saccharomyces cerevisiae population reveals the genomic impacts of novel microbial range expansion

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa027 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Peter Higgins ◽

Cooper A Grace ◽

Soon A Lee ◽

Matthew R Goddard

Keyword(s):

Saccharomyces Cerevisiae ◽

New Zealand ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Range Expansion ◽

Copy Number ◽

Small Scale ◽

Whole Genome ◽

Copy Number Changes ◽

The Impact

Abstract. Saccharomyces cerevisiae is extensively utilized for commercial fermentation, and is also an important biological model; however, its ecology has only recently begun to be understood. Through the use of whole-genome sequencing, the species has been characterized into a number of distinct subpopulations, defined by geographical ranges and industrial uses. Here, the whole-genome sequences of 104 New Zealand (NZ) S. cerevisiae strains, including 52 novel genomes, are analyzed alongside 450 published sequences derived from various global locations. The impact of S. cerevisiae novel range expansion into NZ was investigated, and these analyses reveal the positioning of NZ strains as a subgroup to the predominantly European/wine clade. A number of genomic differences with the European group correlate with range expansion into NZ, including 18 highly enriched single-nucleotide polymorphisms (SNPs) and novel Ty1/2 insertions. While it is not possible to categorically determine if any genetic differences are due to stochastic processes or the operation of natural selection, we suggest that the observation of NZ-specific copy number increases of four sugar transporter genes in the HXT family may reasonably represent an adaptation in the NZ S. cerevisiae subpopulation, and this correlates with the observations of copy number changes during adaptation in small-scale experimental evolution studies.

Rare instances of non-random dropout with the monochrome multiplex qPCR assay for mitochondrial DNA copy number

10.1101/2021.10.11.463983 ◽

2021 ◽

Author(s):

Stephanie Y Yang ◽

Charles E Newcomb ◽

Stephanie L Battle ◽

Anthony YY Hsieh ◽

Hailey L Chapman ◽

...

Keyword(s):

Mitochondrial Dna ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Dna Copy Number ◽

Loop Primer ◽

Mitochondrial Dna Copy Number ◽

D Loop

Mitochondrial DNA copy number (mtDNA-CN) is a proxy for mitochondrial function and has been of increasing interest to the mitochondrial research community. There are several ways to measure mtDNA-CN, ranging from whole genome sequencing to qPCR. A recent article from the Journal of Molecular Diagnostics described a novel method for measuring mtDNA-CN that is both inexpensive and reproducible. However, we show that certain individuals, particularly those with very low qPCR mtDNA measurements, show poor concordance between qPCR and whole genome sequencing measurements. After examining whole genome sequencing data, this seems to be due to polymorphisms within the D-loop primer region. Non-concordant mtDNA-CN was observed in all instances of polymorphisms at certain positions in the D-loop primer regions; however, not all positions are susceptible to this effect. In particular, these polymorphisms appear disproportionately in individuals with the L, T, and U mitochondrial haplogroups, indicating non-random dropout.

MBRS-59. SINGLE-CELL WHOLE-GENOME SEQUENCING DISSECTS INTRA-TUMOURAL GENOMIC HETEROGENEITY AND CLONAL EVOLUTION IN CHILDHOOD MEDULLOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa222.563 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii408-iii408

Author(s):

Marina Danilenko ◽

Masood Zaka ◽

Claire Keeling ◽

Stephen Crosier ◽

Rafiqul Hussain ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Single Cell ◽

Genome Sequencing ◽

Copy Number ◽

Single Cell Analysis ◽

Mutational Analysis ◽

Single Cells ◽

Clonal Evolution ◽

Whole Genome ◽

Clinically Significant

Abstract. Medulloblastomas harbor clinically-significant intra-tumoural heterogeneity for key biomarkers (e.g. MYC/MYCN, β-catenin). Recent studies have characterized transcriptional heterogeneity at the single-cell level; however, the underlying genomic copy number and mutational architecture remain to be resolved. We therefore sought to establish the intra-tumoural genomic heterogeneity of medulloblastoma at single-cell resolution. Copy number patterns were dissected by whole-genome sequencing in 1024 single cells isolated from multiple distinct tumour regions within 16 snap-frozen medulloblastomas, representing the major molecular subgroups (WNT, SHH, Group3, Group4) and genotypes (i.e. MYC amplification, TP53 mutation). Common copy number driver and subclonal events were identified, providing clear evidence of copy number evolution in medulloblastoma development. Moreover, subclonal whole-arm and focal copy number alterations covering important genomic loci (e.g. on chr10 of SHH patients) were detected in single tumour cells, yet were undetectable at the bulk-tumour level. Spatial copy number heterogeneity was also common, with differences between clonal and subclonal events detected in distinct regions of individual tumours. Mutational analysis of the cells allowed dissection of spatial and clonal heterogeneity patterns for key medulloblastoma mutations (e.g. CTNNB1, TP53, SMARCA4, PTCH1) within our cohort. Integrated copy number and mutational analysis is underway to establish their inter-relationships and relative contributions to clonal evolution during tumourigenesis. In summary, single-cell analysis has enabled the resolution of common mutational and copy number drivers, alongside subclonal events and distinct patterns of clonal and spatial evolution, in medulloblastoma development. We anticipate these findings will provide a critical foundation for future improved biomarker selection, and the development of targeted therapies.
