Detection of Copy Number Variation Regions Using the DNA-Sequencing Data from Multiple Profiles with Correlated Structure

Jie Chen; Shirong Deng

doi:10.1089/cmb.2018.0053

A distance-type measure approach to the analysis of copy number variation in DNA sequencing data

BMC Genomics, 2019, Vol 20 (S2). doi:10.1186/s12864-019-5491-x

Author(s): Bipasa Biswas; Yinglei Lai

Keyword(s): Copy Number Variation; DNA Sequencing; Copy Number; Sequencing Data; Number Variation


Insights into dispersed duplications and complex structural mutations from whole genome sequencing 706 families

2020. doi:10.1101/2020.08.03.235358

Author(s): Christopher W. Whelan; Robert E. Handsaker; Giulio Genovese; Seva Kashin; Monkol Lek; ...

Keyword(s): Gene Expression; Copy Number Variation; Copy Number; De Novo; Whole Genome; Sequencing Data; Number Variation; Structural Mutations; Or Gene; Genomic Locations

Abstract: Two intriguing forms of genome structural variation (SV), namely dispersed duplications and de novo rearrangements of complex, multi-allelic loci, have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of "IBD-discordant" (IBDD) CNVs: loci at which siblings' CNV measurements and IBD states are mathematically inconsistent. We found that commonly IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association study (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more conventional CNVs, we estimate that segmental mutations larger than 1 kb arise in about one in 22 human meioses. These methods complement previous techniques in that they interrogate genomic regions that are home to segmental duplications, high CNV allele frequencies, and multi-allelic CNVs.

Author Summary: Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect that has been difficult to study is the detection of mutations in the copy number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments that already display common copy number variation in the population. We develop an analytical approach to these problems when sequencing data are available for all members of families with at least two children. The method determines the number of parental haplotypes the two siblings share at each location in their genome, and uses that information to determine the possible inheritance patterns that might explain the copy numbers observed in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications that are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
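The abstract defines an IBD-discordant locus as one where siblings' copy numbers and IBD state are "mathematically inconsistent." The consistency check can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation.

- Copy numbers are exact integers, with no measurement noise.
- Each parent carries two haplotypes whose copy numbers sum to that parent's observed total.
- The IBD state is the number of parental haplotypes (0, 1 or 2) the siblings share.

The function names `is_ibd_consistent` and `is_ibd_discordant` are hypothetical.

```python
from itertools import product


def haplotype_splits(total):
    """All ordered (h1, h2) pairs of non-negative haplotype copy numbers summing to total."""
    return [(h, total - h) for h in range(total + 1)]


def is_ibd_consistent(cn_father, cn_mother, cn_sib1, cn_sib2, ibd):
    """Return True if some inheritance pattern explains both siblings' copy numbers.

    ibd is the number of parental haplotypes (0, 1 or 2) the siblings share at
    this locus. Simplifying assumption: copy numbers are exact integers.
    """
    for fa, mo in product(haplotype_splits(cn_father), haplotype_splits(cn_mother)):
        # f1/m1: index of the paternal/maternal haplotype sibling 1 inherited;
        # f2/m2: the same for sibling 2.
        for f1, m1, f2, m2 in product((0, 1), repeat=4):
            shared = (f1 == f2) + (m1 == m2)
            if shared != ibd:
                continue
            if fa[f1] + mo[m1] == cn_sib1 and fa[f2] + mo[m2] == cn_sib2:
                return True
    return False


def is_ibd_discordant(cn_father, cn_mother, cn_sib1, cn_sib2, ibd):
    """A locus is IBD-discordant when no consistent explanation exists."""
    return not is_ibd_consistent(cn_father, cn_mother, cn_sib1, cn_sib2, ibd)


if __name__ == "__main__":
    # Siblings sharing both parental haplotypes (IBD2) must have equal copy number.
    print(is_ibd_discordant(2, 2, 2, 2, ibd=2))  # False
    print(is_ibd_discordant(2, 2, 2, 3, ibd=2))  # True
```

The two printed cases show the simplest discordance. Siblings who are IBD2 at a locus inherited identical haplotypes, so a difference in their copy number cannot come from ordinary segregation at that locus. A duplicate copy elsewhere in the genome (a dispersed duplication) or a new mutation is one possible explanation.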


A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data

G3 Genes|Genome|Genetics, 2019, Vol 9 (11), pp. 3575-3582. doi:10.1534/g3.119.400596. Cited by ~5

Author(s): Tom Hill; Robert L. Unckless

Keyword(s): Deep Learning; Next Generation Sequencing; Copy Number Variation; Copy Number; Next Generation Sequencing Data; Learning Approach; Next Generation; Sequencing Data; Number Variation; Generation Sequencing


Structural genome analysis in cultivated potato taxa

Theoretical and Applied Genetics, 2019, Vol 133 (3), pp. 951-966. doi:10.1007/s00122-019-03519-6. Cited by ~3

Author(s): Maria Kyriakidou; Sai Reddy Achakkagari; José Héctor Gálvez López; Xinyi Zhu; Chen Yu Tang; ...

Keyword(s): Copy Number Variation; Copy Number; Agronomic Traits; Genomic Variation; Sequencing Data; Structural Variations; Structural Genomic; Ploidy Levels; Cultivated Potato; Number Variation

Key message: Twelve potato accessions were selected to represent two principal views on potato taxonomy. The genomes were sequenced and analyzed for structural variation (copy number variation) against three published potato genomes.

Abstract: The common potato (Solanum tuberosum L.) is an important staple crop with a highly heterozygous and complex tetraploid genome. The other taxa of cultivated potato contain varying ploidy levels (2X–5X), and structural variations are common in the genomes of these species, likely contributing to the diversification of agronomic traits during domestication. A better understanding of these genomes and their variation will aid the exploration of novel agronomic traits. Thus, sequencing data from twelve potato landraces, representing the four ploidy levels, were used to identify structural genomic variation relative to the two currently available reference genomes: a doubled monoploid potato genome and a diploid inbred clone of S. chacoense. A copy number variation analysis showed that in the majority of the genomes the number of deletions is greater than the number of duplications, whereas the number of duplicated genes is greater than the number of deleted genes. Specific regions in the twelve potato genomes have a high density of CNV events. Further, the auxin-induced SAUR genes (involved in abiotic stress), disease resistance genes and the 2-oxoglutarate/Fe(II)-dependent oxygenase superfamily proteins, among others, had increased copy numbers in these sequenced genomes relative to the references.


Erratum to: CoNVEX: copy number variation estimation in exome sequencing data using HMM

BMC Bioinformatics, 2013, Vol 14 (S2). doi:10.1186/1471-2105-14-s2-s26. Cited by ~2

Author(s): Kaushalya C Amarasinghe; Jason Li; Saman K Halgamuge

Keyword(s): Copy Number Variation; Exome Sequencing; Copy Number; Sequencing Data; Exome Sequencing Data; Number Variation


CoDEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing

2017. doi:10.1101/211698. Cited by ~1

Author(s): Yuchao Jiang; Rujin Wang; Eugene Urrutia; Ioannis N. Anastopoulos; Katherine L. Nathanson; ...

Keyword(s): DNA Sequencing; High Throughput; Copy Number; Copy Number Variations; Negative Control; Sequencing Data; Full Spectrum; Number Variation; High Throughput DNA Sequencing; Low Sensitivity

Abstract: High-throughput DNA sequencing enables detection of copy number variations (CNVs) on a genome-wide scale with finer resolution than array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive to variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.
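The abstract attributes false discoveries and low sensitivity to sequencing biases, so read-depth CNV callers normalize coverage before calling. The sketch below shows only the basic idea: correct sample-level and target-level effects, then express each target as a log2 copy ratio. It is not CODEX2's statistical model. The median-based normalization, the 0.5 pseudocount and the function name `log2_depth_ratios` are illustrative assumptions.

```python
import math
from statistics import median


def log2_depth_ratios(depth):
    """Turn a targets x samples read-count matrix into log2 copy ratios.

    depth[i][j] is the read count for target i in sample j (every sample is
    assumed to have a non-zero total). Steps, as a simple stand-in for
    model-based normalization:
      1. scale each sample to the mean library size (sample-level effect);
      2. divide each target by its median across samples (target-level
         effects such as capture efficiency or GC content);
      3. take log2, so ~0 suggests two copies in a diploid sample, ~-1 a
         heterozygous deletion and ~0.58 (log2 of 3/2) a single-copy gain.
    """
    n_targets, n_samples = len(depth), len(depth[0])
    totals = [sum(depth[i][j] for i in range(n_targets)) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    scaled = [[depth[i][j] * mean_total / totals[j] for j in range(n_samples)]
              for i in range(n_targets)]
    ratios = []
    for row in scaled:
        ref = median(row)
        # A small pseudocount avoids log(0) for zero-depth targets.
        ratios.append([math.log2((x + 0.5) / (ref + 0.5)) for x in row])
    return ratios


if __name__ == "__main__":
    counts = [[100, 100, 100], [100, 100, 50], [100, 100, 100], [100, 100, 100]]
    for row in log2_depth_ratios(counts):
        print([round(v, 2) for v in row])
```

The median across samples is used as the per-target reference so that a CNV carried by a single sample does not pull the reference toward itself. A CNV common enough to shift that median would be partly normalized away. The abstract's point about improved sensitivity for common CNVs addresses exactly this case.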


CovCopCan: An efficient tool to detect Copy Number Variation from amplicon sequencing data in inherited diseases and cancer

PLoS Computational Biology, 2020, Vol 16 (2), pp. e1007503. doi:10.1371/journal.pcbi.1007503. Cited by ~1

Author(s): Paco Derouault; Jasmine Chauzeix; David Rizzo; Federica Miressi; Corinne Magdelaine; ...

Keyword(s): Copy Number Variation; Copy Number; Amplicon Sequencing; Sequencing Data; Efficient Tool; Inherited Diseases; Number Variation


ExomeHMM: A Hidden Markov Model for Detecting Copy Number Variation Using Whole-Exome Sequencing Data

Current Bioinformatics, 2017, Vol 12 (2), pp. 147-155. doi:10.2174/1574893611666160727160757. Cited by ~2

Author(s): Ao Li; Minghui Wang; Zhenhua Yu; Cheng Guo

Keyword(s): Copy Number Variation; Markov Model; Exome Sequencing; Copy Number; Hidden Markov; Sequencing Data; Exome Sequencing Data; Whole Exome; Whole Exome Sequencing Data; Number Variation


Anaconda: AN automated pipeline for somatic COpy Number variation Detection and Annotation from tumor exome sequencing data

BMC Bioinformatics, 2017, Vol 18 (1). doi:10.1186/s12859-017-1833-3. Cited by ~5

Author(s): Jianing Gao; Changlin Wan; Huan Zhang; Ao Li; Qiguang Zang; ...

Keyword(s): Copy Number Variation; Exome Sequencing; Copy Number; Sequencing Data; Exome Sequencing Data; Number Variation; Automated Pipeline; Copy Number Variation Detection


Comparative Study of Exome Copy Number Variation Estimation Tools Using Array Comparative Genomic Hybridization as Control

BioMed Research International, 2013, Vol 2013, pp. 1-7. doi:10.1155/2013/915636. Cited by ~23

Author(s): Yan Guo; Quanghu Sheng; David C. Samuels; Brian Lehmann; Joshua A. Bauer; ...

Keyword(s): Copy Number Variation; Exome Sequencing; Copy Number; Array CGH; False Positive Rate; Comparative Genomic; Comparative Genome Hybridization; Sequencing Data; Number Variation; Low Sensitivity

Exome sequencing using next-generation sequencing technologies is a cost-efficient approach to selectively sequencing the coding regions of the human genome for detection of disease variants. One of the lesser-known yet important applications of exome sequencing data is the identification of copy number variation (CNV). Many exome CNV tools have been developed over the last few years, but the performance and accuracy of these programs have not been thoroughly evaluated. In this study, we systematically compared four popular exome CNV tools (CoNIFER, cn.MOPS, exomeCopy, and ExomeDepth) and evaluated their effectiveness against array comparative genomic hybridization (array CGH) platforms. We found that exome CNV tools are capable of identifying CNVs, but they can suffer from problems such as high false-positive rates, low sensitivity, and duplication bias when compared to array CGH platforms. While exome CNV tools do serve their purpose for data mining, careful evaluation and additional validation are highly recommended. Based on these results, we recommend CoNIFER and cn.MOPS over the other two tools for nonpaired exome CNV detection because of their low false-positive rates, although none of the four exome CNV tools performed at an outstanding level when compared to array CGH.
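The comparison above scores exome CNV calls against array CGH in terms of false positives and sensitivity. The abstract does not say how a call is matched to an array segment. The sketch below therefore assumes a common convention: two intervals match when each overlaps the other by at least 50% (reciprocal overlap). It reports sensitivity (truth segments recovered) and the fraction of calls with no array-CGH match, which is a false discovery rate. The interval layout and the helper names `reciprocal_overlap` and `evaluate_calls` are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two fractions of a and b covered by their intersection."""
    if a[0] != b[0]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))


def evaluate_calls(calls: List[Interval], truth: List[Interval],
                   min_ro: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) of calls against a truth set."""
    def matches(x: Interval, pool: List[Interval]) -> bool:
        return any(reciprocal_overlap(x, y) >= min_ro for y in pool)

    true_hits = sum(matches(t, calls) for t in truth)
    false_calls = sum(not matches(c, truth) for c in calls)
    sensitivity = true_hits / len(truth) if truth else float("nan")
    fdr = false_calls / len(calls) if calls else float("nan")
    return sensitivity, fdr


if __name__ == "__main__":
    array_cgh = [("chr1", 1000, 2000), ("chr2", 5000, 9000)]
    exome = [("chr1", 1100, 2100), ("chr3", 0, 500)]
    print(evaluate_calls(exome, array_cgh))  # (0.5, 0.5)
```

Reciprocal overlap is used instead of one-sided overlap so that a single large array-CGH segment cannot "validate" many small exome calls lying inside it. A real comparison would also need to restrict array segments to regions covered by exome targets, because exome data cannot detect CNVs that fall entirely between them.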
