Population-wide copy number variation calling using variant call format files from 6,898 individuals

Mapping Intimacies ◽

10.1101/504209 ◽

2018 ◽

Author(s):

Grace Png ◽

Daniel Suveges ◽

Young-Chan Park ◽

Klaudia Walter ◽

Kousik Kundu ◽

...

Keyword(s):

Copy Number Variation ◽

Copy Number ◽

Copy Number Variants ◽

Regression Tree ◽

Low Frequency ◽

Protein Product ◽

Supplementary Information ◽

Variant Call ◽

Large Deletions ◽

Number Variation

Motivation Copy number variants (CNVs) are large deletions or duplications at least 50 to 200 base pairs long. They play an important role in multiple disorders, but accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. Results We use a regression tree-based approach to call CNVs from whole-genome sequencing (WGS, > 18x) variant call-sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. 61.8% of detected events have been previously reported in the Database of Genomic Variants. 23% of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe the LD structure and copy number variation underlying the association between levels of the CCL3 protein and a complex structural variant (MAF = 0.15, p = 3.6×10⁻¹²) affecting CCL3L3, a paralog of the CCL3 gene. We also identify a cis-association between a low-frequency NOMO1 deletion and the protein product of this gene (MAF = 0.02, p = 2.2×10⁻⁷), for which no cis- or trans- single nucleotide variant-driven protein quantitative trait locus (pQTL) has been documented to date. This work demonstrates that existing population-wide WGS call-sets can be mined for CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful class of genetic variant. Availability The regression tree based approach, UN-CNVc, is available as an R and bash executable on GitHub at https://github.com/agilly/[email protected];[email protected] Supplementary Information Supplementary information is appended.

An accurate and powerful method for copy number variation detection

Bioinformatics ◽

10.1093/bioinformatics/bty1041 ◽

2019 ◽

Vol 35 (17) ◽

pp. 2891-2898

Author(s):

Feifei Xiao ◽

Xizhi Luo ◽

Ning Hao ◽

Yue S Niu ◽

Xiangjun Xiao ◽

...

Keyword(s):

Copy Number Variation ◽

Complex Traits ◽

Copy Number ◽

Statistical Power ◽

Copy Number Variants ◽

Supplementary Information ◽

External Information ◽

Number Variation ◽

Normal Mean ◽

Copy Number Variation Detection

Abstract Motivation Integration of multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improve the identification of variants associated with complex traits. Although it has been shown that the widely used change point based methods can increase statistical power to identify variants, it remains challenging to effectively detect CNVs with weak signals due to the noisy nature of genotyping intensity data. We previously developed modSaRa, a normal mean-based model built on a screening and ranking algorithm for copy number variation identification, which showed desirable sensitivity with high computational efficiency. To boost statistical power for the identification of variants, here we present a novel improvement, called modSaRa2, that integrates the relative allelic intensity with external information from empirical statistics into the model. Results Simulation studies illustrated that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement in weak CNV signal detection is the most substantial, and the method also improves stability when CNV size varies. The application of the new method to a whole genome melanoma dataset identified novel candidate melanoma risk associated deletions on chromosome band 1p22.2 and duplications on the 6p22, 6q25 and 19p13 regions, which may facilitate the understanding of the possible roles of germline copy number variants in the etiology of melanoma. Availability and implementation http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2. Supplementary information Supplementary data are available at Bioinformatics online.
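The general screening-and-ranking idea for change-point detection mentioned in this abstract can be illustrated with a toy sketch. This is not the modSaRa2 implementation: the function names, the window size `h`, and the threshold are assumptions made for illustration, and the allelic-intensity and external-information components described in the abstract are omitted. The sketch computes, at each position, the difference between the mean of the `h` points to the right and the `h` points to the left, then keeps positions where the absolute difference is a local maximum above a threshold.

```python
from statistics import mean


def local_diagnostic(y, h):
    """Difference of right-window and left-window means at each position.

    Positions without a full window of h points on both sides are left at 0.
    """
    n = len(y)
    d = [0.0] * n
    for i in range(h, n - h + 1):
        d[i] = mean(y[i:i + h]) - mean(y[i - h:i])
    return d


def screen_change_points(y, h, threshold):
    """Return positions where |D| is a local maximum within +/- h and >= threshold."""
    n = len(y)
    d = [abs(v) for v in local_diagnostic(y, h)]
    picks = []
    for i in range(h, n - h + 1):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        if d[i] >= threshold and d[i] == max(d[lo:hi]):
            picks.append(i)
    return picks


if __name__ == "__main__":
    # Hypothetical noise-free intensity signal with one deletion-like segment.
    signal = [0.0] * 20 + [-1.0] * 10 + [0.0] * 20
    print(screen_change_points(signal, h=5, threshold=0.5))
```

On real array intensities the signal is noisy, so the choice of window size and threshold would matter a great deal; the values here were picked only to keep the example small.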

CNV-BAC: Copy number Variation Detection in Bacterial Circular Genome

Bioinformatics ◽

10.1093/bioinformatics/btaa208 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3890-3891

Author(s):

Linjie Wu ◽

Han Wang ◽

Yuchao Xia ◽

Ruibin Xi

Keyword(s):

Copy Number Variation ◽

Copy Number ◽

Genome Structure ◽

Real Data ◽

Read Depth ◽

Supplementary Information ◽

Circular Genome ◽

Number Variation ◽

Copy Number Variation Detection ◽

CNV Detection

Abstract Motivation Whole-genome sequencing (WGS) is widely used for copy number variation (CNV) detection. However, for most bacteria, the circular genome structure and high replication rate make reads more enriched near the replication origin. CNV detection based on read depth could be seriously influenced by such replication bias. Results We show that the replication bias is widespread using ∼200 bacterial WGS datasets. We develop CNV-BAC (CNV-Bacteria), which can properly normalize the replication bias and other known biases in bacterial WGS data and can accurately detect CNVs. Simulation and real data analysis show that CNV-BAC achieves the best performance in CNV detection compared with available algorithms. Availability and implementation CNV-BAC is available at https://github.com/XiDsLab/CNV-BAC. Supplementary information Supplementary data are available at Bioinformatics online.
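To make the replication bias described in this abstract concrete, the following is a minimal sketch of one simple way such a trend could be removed from binned read depth. It is not the CNV-BAC algorithm: it assumes a known origin position, assumes that log2 depth falls off linearly with circular distance from the origin, and uses an ordinary least-squares fit. All function names and parameters are hypothetical.

```python
import math


def circular_distance(pos, origin, genome_len):
    """Shortest distance between pos and origin on a circular genome."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize_replication_bias(positions, depths, origin, genome_len):
    """Return log2 depth residuals after removing the fitted origin-distance trend."""
    xs = [circular_distance(p, origin, genome_len) for p in positions]
    ys = [math.log2(d) for d in depths]
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Simulated data (assumption): depth halves between origin and terminus,
    # and bins 300-340 carry a duplication that doubles their depth.
    genome_len, origin = 1000, 0
    positions = list(range(0, genome_len, 10))
    depths = []
    for p in positions:
        depth = 100.0 * 2 ** (-circular_distance(p, origin, genome_len) / 500)
        depths.append(depth * 2 if 300 <= p <= 340 else depth)
    residuals = normalize_replication_bias(positions, depths, origin, genome_len)
    print([p for p, r in zip(positions, residuals) if r > 0.5])
```

A least-squares fit is pulled slightly toward large CNVs, so a robust fit would likely be preferable on real data; the plain version is used here only to keep the sketch short.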

Genome-wide Association of Copy-Number Variation Reveals an Association between Short Stature and the Presence of Low-Frequency Genomic Deletions

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2011.10.014 ◽

2011 ◽

Vol 89 (6) ◽

pp. 751-759 ◽

Cited By ~ 45

Author(s):

Andrew Dauber ◽

Yongguo Yu ◽

Michael C. Turchin ◽

Charleston W. Chiang ◽

Yan A. Meng ◽

...

Keyword(s):

Copy Number Variation ◽

Short Stature ◽

Copy Number ◽

Low Frequency ◽

Genome Wide Association ◽

Genomic Deletions ◽

Genome Wide ◽

Number Variation

MONTAGE: a new tool for high-throughput detection of mosaic copy number variation

BMC Genomics ◽

10.1186/s12864-021-07395-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Joseph T. Glessner ◽

Xiao Chang ◽

Yichuan Liu ◽

Jin Li ◽

Munir Khan ◽

...

Keyword(s):

Copy Number Variation ◽

High Throughput ◽

Copy Number ◽

Copy Number Variants ◽

Cell Types ◽

Disease Phenotype ◽

Multiple Phenotypes ◽

Genome Wide ◽

Number Variation ◽

The Impact

Abstract Background Not all cells in a given individual are identical in their genomic makeup. Mosaicism describes the phenomenon in which a mixture of genotypic states in certain genomic segments exists within the same individual. Mosaicism is a prevalent and impactful class of non-integer state copy number variation (CNV). Mosaicism implies that certain cell types or subsets of cells contain a CNV in a segment of the genome while other cells in the same individual do not. Several studies have investigated the impact of mosaicism in single patients or small cohorts, but no comprehensive scan of mosaic CNVs has been undertaken to accurately detect such variants and interpret their impact on human health and disease. Results We developed a tool called Montage to improve the accuracy of detection of mosaic copy number variants in a high throughput fashion. Montage directly interfaces with the ParseCNV2 algorithm to establish disease phenotype genome-wide association and determine which genomic ranges had more or less than the expected frequency of mosaic events. We screened for mosaic events in over 350,000 samples using 1% allele frequency as the detection limit. Additionally, we uncovered disease associations of multiple phenotypes with mosaic CNVs at several genomic loci. We also investigated the allele imbalance observations genome-wide to define non-diploid and non-integer copy number states. Conclusions Our novel algorithm presents an efficient tool with fast computational runtime and high levels of accuracy of mosaic CNV detection. A curated mosaic CNV callset of 3716 events in 2269 samples is presented with comparability to previous reports and disease phenotype associations. The new algorithm can be freely accessed via: https://github.com/CAG-CNV/MONTAGE.

Copy number variation and neurodevelopmental problems in females and males in the general population

10.1101/236042 ◽

2017 ◽

Cited By ~ 1

Author(s):

Joanna Martin ◽

Kristiina Tammimies ◽

Robert Karlsson ◽

Yi Lu ◽

Henrik Larsson ◽

...

Keyword(s):

General Population ◽

Copy Number ◽

Population Sample ◽

Copy Number Variants ◽

Rare CNVs ◽

Large Deletions ◽

Number Variation ◽

Twin Children ◽

Anxiety Depression ◽

Neurodevelopmental Problems

Abstract Objective Neurodevelopmental problems (NPs) are childhood phenotypes that are more common in males. Conversely, anxiety and depression (which are frequently comorbid with NPs) are more common in females. Rare copy number variants (CNVs) have been implicated in clinically-defined NPs. Here, we aimed to characterise the relationship between rare CNVs with NPs and anxiety/depression in a population sample of twin children. Additionally, we examined whether sex-specific CNV effects underlie the sex bias of these disorders. Method We analysed a sample of N=12,982 children, of whom 5.3% had narrowly-defined NPs (clinically-diagnosed), 20.9% had broadly-defined NPs (based on validated screening measures, but no diagnosis) and 3.0% had clinically-diagnosed anxiety or depression. Rare (<1% frequency) CNVs were categorised by size (medium: 100-500kb or large: >500kb), type (duplication or deletion) and putative relevance to NPs (affecting previously implicated loci or evolutionarily-constrained genes). We tested for associations between the different CNV categories with NPs and anxiety/depression, followed by examination of sex-specific effects. Results Medium deletions (OR(CI)=1.18(1.05-1.33), p=0.0053) and large duplications (OR(CI)=1.45(1.19-1.75), p=0.00017) were associated with broadly-defined NPs. Large deletions (OR(CI)=1.85(1.14-3.01), p=0.013) were associated with narrowly-defined NPs. The effect sizes increased for large NP-relevant CNVs (broadly-defined: OR(CI)=1.60(1.06-2.42), p=0.025; narrowly-defined: OR(CI)=3.64(2.16-6.13), p=1.2E-6). No sex differences in CNV burden were found in individuals with NPs (p>0.05). In individuals diagnosed with anxiety or depression, females were more likely to have large CNVs (OR(CI)=3.75(1.45-9.68), p=0.0064). Conclusion Rare CNVs are significantly associated with both narrowly- and broadly-defined NPs in a general population sample of children. Our results also suggest that large, rare CNVs may show sex-specific phenotypic effects.

Gene copy-number polymorphism in nature

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2010.1180 ◽

2010 ◽

Vol 277 (1698) ◽

pp. 3213-3221 ◽

Cited By ~ 97

Author(s):

Daniel R. Schrider ◽

Matthew W. Hahn

Keyword(s):

Natural Selection ◽

Copy Number Variation ◽

Copy Number ◽

Molecular Mechanisms ◽

Copy Number Variants ◽

Gene Copy Number ◽

Gene Copy ◽

Phenotypic Differences ◽

Sequencing Technologies ◽

Number Variation

Differences between individuals in the copy-number of whole genes have been found in every multicellular species examined thus far. Such differences result in unique complements of protein-coding genes in all individuals, and have been shown to underlie adaptive phenotypic differences. Here, we review the evidence for copy-number variants (CNVs), focusing on the methods used to detect them and the molecular mechanisms responsible for generating this type of variation. Although there are multiple technical and computational challenges inherent to these experimental methods, next-generation sequencing technologies are making such experiments accessible in any system with a sequenced genome. We further discuss the connection between copy-number variation within species and copy-number divergence between species, showing that these values are exactly what one would expect from similar comparisons of nucleotide polymorphism and divergence. We conclude by reviewing the growing body of evidence for natural selection on copy-number variants. While it appears that most genic CNVs—especially deletions—are quickly eliminated by selection, there are now multiple studies demonstrating a strong link between copy-number differences at specific genes and phenotypic differences in adaptive traits. We argue that a complete understanding of the molecular basis for adaptive natural selection necessarily includes the study of copy-number variation.

A comprehensive analysis of copy number variation in a Turkish dementia cohort

Human Genomics ◽

10.1186/s40246-021-00346-z ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Nadia Dehghani ◽

Gamze Guven ◽

Celia Kun-Rodrigues ◽

Catarina Gouveia ◽

Kalina Foster ◽

...

Keyword(s):

Copy Number Variation ◽

Copy Number ◽

Copy Number Variants ◽

Olfactory Receptors ◽

Chromosome 9 ◽

Systematic Analysis ◽

Family Analysis ◽

Number Variation ◽

The UK ◽

Genomic Regions

Abstract Background Copy number variants (CNVs) include deletions or multiplications spanning genomic regions. These regions vary in size and may span genes known to play a role in human diseases. As examples, duplications and triplications of SNCA have been shown to cause forms of Parkinson’s disease, while duplications of APP cause early onset Alzheimer’s disease (AD). Results Here, we performed a systematic analysis of CNVs in a Turkish dementia cohort in order to further characterize the genetic causes of dementia in this population. One hundred twenty-four Turkish individuals, who were either at risk of dementia due to family history or diagnosed with mild cognitive impairment, AD, or frontotemporal dementia, were whole-genome genotyped and CNVs were detected. We integrated family analysis with a comprehensive assessment of potentially disease-associated CNVs in this Turkish dementia cohort. We also utilized both dementia and non-dementia individuals from the UK Biobank in order to further elucidate the potential role of the identified CNVs in neurodegenerative diseases. We report CNVs overlapping the previously implicated genes ZNF804A, SNORA70B, USP34, XPO1, and a locus on chromosome 9 which includes a cluster of olfactory receptors and ABCA1. Additionally, we describe novel CNVs potentially associated with dementia, overlapping the genes AFG1L, SNX3, VWDE, and BC039545. Conclusions Genotyping data from understudied populations can be utilized to identify copy number variation which may contribute to dementia.

Population‐wide copy number variation calling using variant call format files from 6,898 individuals

Genetic Epidemiology ◽

10.1002/gepi.22260 ◽

2019 ◽

Vol 44 (1) ◽

pp. 79-89

Author(s):

Grace Png ◽

Daniel Suveges ◽

Young‐Chan Park ◽

Klaudia Walter ◽

Kousik Kundu ◽

...

Keyword(s):

Copy Number Variation ◽

Copy Number ◽

Variant Call Format ◽

Variant Call ◽

Number Variation ◽

Copy Number Variation Calling

Chemical Exposure Generates DNA Copy Number Variants and Impacts Gene Expression

Advances in Toxicology ◽

10.1155/2014/984319 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 6

Author(s):

Samuel M. Peterson ◽

Jennifer L. Freeman

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

Copy Number Variants ◽

Complex Diseases ◽

Chemical Exposure ◽

Environmental Chemicals ◽

Comparative Genomic ◽

DNA Copy Number ◽

Number Variation

DNA copy number variation has long been associated with highly penetrant genomic disorders, but it was not until recently that the widespread occurrence of copy number variation among phenotypically normal individuals was recognized as a considerable source of genetic variation. It is also now appreciated that copy number variants (CNVs) play a role in the onset of complex diseases. Many of the complex diseases with which CNVs are associated are reported to be influenced by yet to be identified environmental factors. It is hypothesized that exposure to environmental chemicals generates CNVs and influences disease onset and pathogenesis. In this study, a proof of principle experiment was completed with ethyl methanesulfonate (EMS) and cytosine arabinoside (Ara-C) to investigate the generation of CNVs using array comparative genomic hybridization (CGH) and the zebrafish vertebrate model system. Exposure to both chemicals resulted in CNVs. CNVs were detected in similar genomic regions among multiple exposure concentrations with EMS, and five CNVs were common to both chemicals. Furthermore, CNVs were correlated with altered gene expression. This study suggests that chemical exposure generates CNVs with impacts on gene expression, warranting further investigation of this phenomenon with environmental chemicals.

Identification of Novel Germline and Tumor-Specific Nucleotide Variants and Copy Number Variation in Clival Chordomas by Exome Sequencing

Journal of Neurological Surgery Part B Skull Base ◽

10.1055/s-0035-1546555 ◽

2015 ◽

Vol 76 (S 01) ◽

Author(s):

Georgios Zenonos ◽

Peter Howard ◽

Maureen Lyons-Weiler ◽

Wang Eric ◽

William LaFambroise ◽

...

Keyword(s):

Copy Number Variation ◽

Exome Sequencing ◽

Copy Number ◽

Number Variation ◽

Specific Nucleotide
