Copy Number Variant Detection with Low-Coverage Whole-Genome Sequencing Represents a Viable Alternative to the Conventional Array-CGH

Marcel Kucharík; Jaroslav Budiš; Michaela Hýblová; Gabriel Minárik; Tomáš Szemes

doi:10.3390/diagnostics11040708

Diagnostics ◽

10.3390/diagnostics11040708 ◽

2021 ◽

Vol 11 (4) ◽

pp. 708

Author(s):

Marcel Kucharík ◽

Jaroslav Budiš ◽

Michaela Hýblová ◽

Gabriel Minárik ◽

Tomáš Szemes

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

In Silico ◽

Copy Number ◽

Normal Population ◽

Copy Number Variations ◽

Whole Genome ◽

Real Patient ◽

Low Coverage ◽

CNV Detection

Copy number variations (CNVs) are a type of structural variant in which specific regions of DNA are deleted or duplicated, altering their number of copies. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. At present, several methods for CNV detection are in use, ranging from conventional cytogenetic analysis, through microarray-based methods (aCGH), to next-generation sequencing (NGS). In this paper, we present GenomeScreen, an NGS-based CNV detection method for low-coverage whole-genome sequencing. We determined the theoretical limits of its accuracy and confirmed them in an extensive in silico study and in real patient samples with known genotypes. In theory, at least 6 M uniquely mapped reads are required to detect a CNV of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis required at least 8 M reads to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a mean resolution of 200 kb. GenomeScreen and aCGH both detected 59 deviations, while GenomeScreen additionally detected 134 other, usually smaller, variations. Compared with aCGH, the overall performance of the proposed GenomeScreen tool is comparable or superior in terms of accuracy, turn-around time, and cost-effectiveness, thus providing reasonable benefits, particularly in a prenatal diagnosis setting.
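The read-count threshold quoted in the abstract can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not the authors' derivation: it assumes that reads fall uniformly on a genome of about 3.0 Gb (an assumed figure), that read counts in a region follow a Poisson distribution, and that a heterozygous deletion or duplication shifts the expected count by 50%. The function names are hypothetical. Under these assumptions the Z-score of a region is 0.5 × sqrt(N × L / G), where N is the number of uniquely mapped reads, L the CNV length and G the genome size.

```python
import math

# Assumed mappable genome size in base pairs (illustrative, not from the source).
GENOME_SIZE_BP = 3.0e9


def expected_z_score(n_reads, cnv_length_bp, copy_change=0.5,
                     genome_size_bp=GENOME_SIZE_BP):
    """Expected Z-score of a CNV under a simple Poisson read-count model.

    copy_change is the relative shift in read depth: 0.5 corresponds to a
    heterozygous deletion or duplication in a diploid genome (assumption).
    """
    expected_reads = n_reads * cnv_length_bp / genome_size_bp
    shift = copy_change * expected_reads
    # Poisson standard deviation is the square root of the mean.
    return shift / math.sqrt(expected_reads)


def reads_needed(z_target, cnv_length_bp, copy_change=0.5,
                 genome_size_bp=GENOME_SIZE_BP):
    """Invert Z = c * sqrt(N * L / G) to get the number of reads N."""
    return (z_target / copy_change) ** 2 * genome_size_bp / cnv_length_bp


if __name__ == "__main__":
    print(f"Z at 6 M reads, 100 kb CNV: {expected_z_score(6e6, 100_000):.2f}")
    print(f"Reads for Z = 7, 100 kb CNV: {reads_needed(7.0, 100_000):,.0f}")
```

With these assumed inputs the model gives a Z-score just above 7 at 6 M reads and roughly 5.9 M reads for Z = 7, which is in line with the theoretical figure stated in the abstract. The higher practical requirement of 8 M reads reported from the in silico study is not captured by this idealised model.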

Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

BMC Genomics ◽

10.1186/s12864-021-07686-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Johannes Smolander ◽

Sofia Khan ◽

Kalaimathy Singaravelu ◽

Leni Kauko ◽

Riikka J. Lund ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Sex Chromosomes ◽

Copy Number ◽

Copy Number Variations ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Low Coverage ◽

CNV Detection

Background: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data, which is used in various research and clinical applications such as digital karyotyping and single-cell CNV detection. Results: Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validation was used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with FREEC (~ 3 min), which we considered the second-best method. Conclusions: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be highly accurate for large copy number variations whose length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.

Detection and characterization of copy number variants based on whole-genome sequencing by DNBSEQ platforms

10.1101/786962 ◽

2019 ◽

Author(s):

Junhua Rao ◽

Lihua Peng ◽

Fang Chen ◽

Hui Jiang ◽

Chunyu Geng ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Copy Number Variant ◽

Whole Genome ◽

Genome Wide ◽

Wide Range ◽

Distribution Sensitivity ◽

CNV Detection

Background: Next-generation sequencing (NGS) has developed rapidly in recent years, making whole-genome sequencing (WGS) a more cost- and time-efficient choice in a wide range of biological research. WGS data are commonly used to detect variants such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and copy number variants (CNVs), which play an important role in many human diseases. However, the feasibility of CNV detection based on WGS by DNBSEQ™ platforms was unclear. We systematically analysed the genome-wide CNV detection power of DNBSEQ™ platforms and Illumina platforms on NA12878 with five commonly used tools. Results: DNBSEQ™ platforms showed a stable ability to detect slightly more CNVs genome-wide (on average 1.24-fold that of Illumina platforms). CNVs from DNBSEQ™ platforms and Illumina platforms were then evaluated against two public benchmarks of NA12878. DNBSEQ™ and Illumina platforms showed similar sensitivities and precisions on both benchmarks. Further, the differences between CNV detection tools were analysed, indicating that the choice of tool could affect CNV detection performance, such as count, distribution, sensitivity and precision. Conclusion: The major contribution of this paper is to provide, for the first time, a comprehensive guide for CNV detection based on WGS by DNBSEQ™ platforms.

Copy number variant detection with low-coverage whole-genome sequencing is a viable replacement for the traditional array-CGH

10.1101/2020.09.07.20183665 ◽

2020 ◽

Author(s):

Marcel Kucharik ◽

Jaroslav Budis ◽

Michaela Hyblova ◽

Gabriel Minarik ◽

Tomas Szemes

Keyword(s):

In Silico ◽

Copy Number ◽

Normal Population ◽

Genetic Disorders ◽

Prenatal Testing ◽

In Silico Analysis ◽

Copy Number Variant ◽

Detection Algorithm ◽

Copy Number Variations ◽

CNV Detection

Copy number variations (CNVs) are a type of structural variant involving alterations in the number of copies of specific regions of DNA, which can either be deleted or duplicated. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. Several methods for CNV detection are currently in use, from conventional cytogenetic analysis through microarray-based methods (aCGH) to next-generation sequencing (NGS). We present GenomeScreen, an NGS-based CNV detection method built on a previously described CNV detection algorithm used for non-invasive prenatal testing (NIPT). We determined the theoretical limits of its accuracy and confirmed them with an extensive in silico study and previously genotyped samples. Theoretically, at least 6 M uniquely mapped reads are required to detect a CNV with a length of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis showed that at least 8 M reads are required to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a mean resolution of 200 kb. GenomeScreen and aCGH both detected 59 deviations; GenomeScreen additionally detected 134 other, usually smaller, variations. Furthermore, the overall cost per sample is about 2-3x lower in the case of GenomeScreen.

Copy number analysis by low coverage whole genome sequencing using ultra low-input DNA from formalin-fixed paraffin embedded tumor tissue

Genome Medicine ◽

10.1186/s13073-016-0375-z ◽

2016 ◽

Vol 8 (1) ◽

Cited By ~ 18

Author(s):

Tanjina Kader ◽

David L. Goode ◽

Stephen Q. Wong ◽

Jacquie Connaughton ◽

Simone M. Rowley ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Tumor Tissue ◽

Whole Genome ◽

Copy Number Analysis ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

Low Coverage ◽

Formalin Fixed

Using low-coverage whole genome sequencing technique to analyze the chromosomal copy number alterations in the exfoliative cells of cervical cancer

Journal of Gynecologic Oncology ◽

10.3802/jgo.2018.29.e78 ◽

2018 ◽

Vol 29 (5) ◽

Author(s):

Tong Ren ◽

Jing Suo ◽

Shikai Liu ◽

Shu Wang ◽

Shan Shu ◽

...

Keyword(s):

Cervical Cancer ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Copy Number Alterations ◽

Sequencing Technique ◽

Chromosomal Copy Number ◽

Chromosomal Copy ◽

Low Coverage

Cell-free tumour DNA analysis detects copy number alterations in gastro-oesophageal cancer patients

PLoS ONE ◽

10.1371/journal.pone.0245488 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0245488

Author(s):

Karin Wallander ◽

Jesper Eisfeldt ◽

Mats Lindblad ◽

Daniel Nilsson ◽

Kenny Billiau ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Oesophageal Cancer ◽

Copy Number ◽

DNA Analysis ◽

Tissue Sample ◽

Whole Genome ◽

Copy Number Alterations ◽

Plasma DNA ◽

Low Coverage

Background: Analysis of cell-free tumour DNA, a liquid biopsy, is a promising biomarker for cancer. We have performed a proof-of-principle study to test its applicability in the clinical setting, analysing copy number alterations (CNAs) in plasma and tumour tissue from 44 patients with gastro-oesophageal cancer. Methods: DNA was isolated from blood plasma and a tissue sample from each patient. Array-CGH was applied to the tissue DNA. The cell-free plasma DNA was sequenced by low-coverage whole-genome sequencing using a clinical pipeline for non-invasive prenatal testing. Two bioinformatic tools, WISECONDOR and ichorCNA, were used to process the output data and were compared with each other. Results: Cancer-associated CNAs could be seen in 59% (26/44) of the tissue biopsies. In the plasma samples, a targeted approach analysing 61 regions of special interest in gastro-oesophageal cancer detected cancer-associated CNAs with a z-score >5 in 11 patients. Broadening the analysis to a whole-genome view, 17/44 patients (39%) had cancer-associated CNAs using WISECONDOR and 13 (30%) using ichorCNA. Of the 26 patients with tissue-verified cancer-associated CNAs, 14 (54%) had corresponding CNAs in plasma. Potentially clinically actionable amplifications overlapping the genes VEGFA, EGFR and FGFR2 were detected in the plasma from three patients. Conclusions: We conclude that low-coverage whole-genome sequencing without prior knowledge of the tumour alterations could become a useful tool for cell-free tumour DNA analysis of total CNAs in plasma from patients with gastro-oesophageal cancer.

Noninvasive prenatal testing for fetal subchromosomal copy number variations and chromosomal aneuploidy by low‐pass whole‐genome sequencing

Molecular Genetics & Genomic Medicine ◽

10.1002/mgg3.674 ◽

2019 ◽

Vol 7 (6) ◽

Cited By ~ 6

Author(s):

Dongyi Yu ◽

Kai Zhang ◽

Meiyan Han ◽

Wei Pan ◽

Ying Chen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Prenatal Testing ◽

Copy Number Variations ◽

Whole Genome ◽

Noninvasive Prenatal Testing ◽

Chromosomal Aneuploidy ◽

Low Pass

Copy Number Variation in MUC5AC and Susceptibility to Allergic Rhinitis: A Low-Coverage Whole-Genome Sequencing and Validation Cohort Study

Genetic Testing and Molecular Biomarkers ◽

10.1089/gtmb.2019.0166 ◽

2020 ◽

Vol 24 (4) ◽

pp. 173-180

Author(s):

Yan Wang ◽

Linge Li ◽

Yuping Yang ◽

Juan Feng ◽

Lingling Wang ◽

...

Keyword(s):

Allergic Rhinitis ◽

Cohort Study ◽

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Validation Cohort ◽

Whole Genome ◽

Number Variation ◽

Low Coverage

A Comparison of Tools for Copy-Number Variation Detection in Germline Whole Exome and Whole Genome Sequencing Data

Cancers ◽

10.3390/cancers13246283 ◽

2021 ◽

Vol 13 (24) ◽

pp. 6283

Author(s):

Migle Gabrielaite ◽

Mathias Husted Torp ◽

Malthe Sebro Rasmussen ◽

Sergio Andreu-Sánchez ◽

Filipe Garrett Vieira ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Reference Sample ◽

SNP Array ◽

Copy Number Variations ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Standard Reference Sample ◽

Whole Exome

Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow full genomic profiling with a single protocol. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. Additionally, for nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools had better performance for NA12878, which could be a result of overfitting. We suggest combining the best tools that are based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. Reducing the total number of called variants could potentially be assisted by the use of background panels for filtering of frequently called variants.

Submegabase copy number variations arise during cerebral cortical neurogenesis as revealed by single-cell whole-genome sequencing

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1812702115 ◽

2018 ◽

Vol 115 (42) ◽

pp. 10804-10809 ◽

Cited By ~ 18

Author(s):

Suzanne Rohrback ◽

Craig April ◽

Fiona Kaper ◽

Richard R. Rivera ◽

Christine S. Liu ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Single Cell ◽

Genome Sequencing ◽

Copy Number ◽

Cortical Development ◽

Building Blocks ◽

Copy Number Variations ◽

Whole Genome ◽

Mature Brain ◽

The Brain

Somatic copy number variations (CNVs) exist in the brain, but their genesis, prevalence, forms, and biological impact remain unclear, even within experimentally tractable animal models. We combined a transposase-based amplification (TbA) methodology for single-cell whole-genome sequencing with a bioinformatic approach for filtering unreliable CNVs (FUnC), developed from machine learning trained on lymphocyte V(D)J recombination. TbA–FUnC offered superior genomic coverage and removed >90% of false-positive CNV calls, allowing extensive examination of submegabase CNVs from over 500 cells throughout the neurogenic period of cerebral cortical development in Mus musculus. Thousands of previously undocumented CNVs were identified. Half were less than 1 Mb in size, with deletions 4× more common than amplification events, and were randomly distributed throughout the genome. However, CNV prevalence during embryonic cortical development was nonrandom, peaking at midneurogenesis with levels triple those found at younger ages before falling to intermediate quantities. These data identify pervasive small and large CNVs as early contributors to neural genomic mosaicism, producing genomically diverse cellular building blocks that form the highly organized, mature brain.
