Population-wide copy number variation calling using variant call format files from 6,898 individuals

2018 ◽  
Author(s):  
Grace Png ◽  
Daniel Suveges ◽  
Young-Chan Park ◽  
Klaudia Walter ◽  
Kousik Kundu ◽  
...  

Motivation Copy number variants (CNVs) are large deletions or duplications at least 50 to 200 base pairs long. They play an important role in multiple disorders, but accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. Results We use a regression tree-based approach to call CNVs from whole-genome sequencing (WGS, > 18x) variant call-sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. 61.8% of detected events have been previously reported in the Database of Genomic Variants. 23% of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe the LD structure and copy number variation underlying the association between levels of the CCL3 protein and a complex structural variant (MAF = 0.15, p = 3.6×10^-12) affecting CCL3L3, a paralog of the CCL3 gene. We also identify a cis-association between a low-frequency NOMO1 deletion and the protein product of this gene (MAF = 0.02, p = 2.2×10^-7), for which no cis- or trans-single nucleotide variant-driven protein quantitative trait locus (pQTL) has been documented to date. This work demonstrates that existing population-wide WGS call-sets can be mined for CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful class of genetic variant. Availability The regression tree based approach, UN-CNVc, is available as an R and bash executable on GitHub at https://github.com/agilly/[email protected];[email protected] Supplementary Information Supplementary information is appended.

2019 ◽  
Vol 35 (17) ◽  
pp. 2891-2898
Author(s):  
Feifei Xiao ◽  
Xizhi Luo ◽  
Ning Hao ◽  
Yue S Niu ◽  
Xiangjun Xiao ◽  
...  

Abstract Motivation Integration of multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improve the identification of variants associated with complex traits. Although it has been shown that the widely used change-point-based methods can increase statistical power to identify variants, it remains challenging to effectively detect CNVs with weak signals due to the noisy nature of genotyping intensity data. We previously developed modSaRa, a normal mean-based model on a screening and ranking algorithm for copy number variation identification, which presented desirable sensitivity with high computational efficiency. To boost statistical power for the identification of variants, here we present a novel improvement that integrates the relative allelic intensity with external information from empirical statistics with modeling, which we called modSaRa2. Results Simulation studies illustrated that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement in weak CNV signal detection is the most substantial, and the method also improves stability when CNV size varies. The application of the new method to a whole genome melanoma dataset identified novel candidate melanoma risk associated deletions on chromosome bands 1p22.2 and duplications on 6p22, 6q25 and 19p13 regions, which may facilitate the understanding of the possible roles of germline copy number variants in the etiology of melanoma. Availability and implementation http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3890-3891
Author(s):  
Linjie Wu ◽  
Han Wang ◽  
Yuchao Xia ◽  
Ruibin Xi

Abstract Motivation Whole-genome sequencing (WGS) is widely used for copy number variation (CNV) detection. However, for most bacteria, their circular genome structure and high replication rate make reads more enriched near the replication origin. CNV detection based on read depth could be seriously influenced by such replication bias. Results We show that the replication bias is widespread using ∼200 bacterial WGS data. We develop CNV-BAC (CNV-Bacteria) that can properly normalize the replication bias and other known biases in bacterial WGS data and can accurately detect CNVs. Simulation and real data analysis show that CNV-BAC achieves the best performance in CNV detection compared with available algorithms. Availability and implementation CNV-BAC is available at https://github.com/XiDsLab/CNV-BAC. Supplementary information Supplementary data are available at Bioinformatics online.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joseph T. Glessner ◽  
Xiao Chang ◽  
Yichuan Liu ◽  
Jin Li ◽  
Munir Khan ◽  
...  

Abstract Background Not all cells in a given individual are identical in their genomic makeup. Mosaicism describes such a phenomenon where a mixture of genotypic states in certain genomic segments exists within the same individual. Mosaicism is a prevalent and impactful class of non-integer state copy number variation (CNV). Mosaicism implies that certain cell types or subsets of cells contain a CNV in a segment of the genome while other cells in the same individual do not. Several studies have investigated the impact of mosaicism in single patients or small cohorts but no comprehensive scan of mosaic CNVs has been undertaken to accurately detect such variants and interpret their impact on human health and disease. Results We developed a tool called Montage to improve the accuracy of detection of mosaic copy number variants in a high throughput fashion. Montage directly interfaces with the ParseCNV2 algorithm to establish disease phenotype genome-wide association and determine which genomic ranges had a higher or lower than expected frequency of mosaic events. We screened for mosaic events in over 350,000 samples using 1% allele frequency as the detection limit. Additionally, we uncovered disease associations of multiple phenotypes with mosaic CNVs at several genomic loci. We also investigated the allele imbalance observations genome-wide to define non-diploid and non-integer copy number states. Conclusions Our novel algorithm presents an efficient tool with fast computational runtime and high levels of accuracy of mosaic CNV detection. A curated mosaic CNV callset of 3716 events in 2269 samples is presented with comparability to previous reports and disease phenotype associations. The new algorithm can be freely accessed via: https://github.com/CAG-CNV/MONTAGE.


2017 ◽  
Author(s):  
Joanna Martin ◽  
Kristiina Tammimies ◽  
Robert Karlsson ◽  
Yi Lu ◽  
Henrik Larsson ◽  
...  

Abstract Objective Neurodevelopmental problems (NPs) are childhood phenotypes that are more common in males. Conversely, anxiety and depression (which are frequently comorbid with NPs) are more common in females. Rare copy number variants (CNVs) have been implicated in clinically-defined NPs. Here, we aimed to characterise the relationship between rare CNVs with NPs and anxiety/depression in a population sample of twin children. Additionally, we examined whether sex-specific CNV effects underlie the sex bias of these disorders. Method We analysed a sample of N=12,982 children, of whom 5.3% had narrowly-defined NPs (clinically-diagnosed), 20.9% had broadly-defined NPs (based on validated screening measures, but no diagnosis) and 3.0% had clinically-diagnosed anxiety or depression. Rare (<1% frequency) CNVs were categorised by size (medium: 100-500kb or large: >500kb), type (duplication or deletion) and putative relevance to NPs (affecting previously implicated loci or evolutionarily-constrained genes). We tested for associations between the different CNV categories with NPs and anxiety/depression, followed by examination of sex-specific effects. Results Medium deletions (OR(CI)=1.18(1.05-1.33),p=0.0053) and large duplications (OR(CI)=1.45(1.19-1.75),p=0.00017) were associated with broadly-defined NPs. Large deletions (OR(CI)=1.85(1.14-3.01),p=0.013) were associated with narrowly-defined NPs. The effect sizes increased for large NP-relevant CNVs (broadly-defined: OR(CI)=1.60(1.06-2.42),p=0.025; narrowly-defined: OR(CI)=3.64(2.16-6.13),p=1.2E-6). No sex differences in CNV burden were found in individuals with NPs (p>0.05). In individuals diagnosed with anxiety or depression, females were more likely to have large CNVs (OR(CI)=3.75(1.45-9.68),p=0.0064). Conclusion Rare CNVs are significantly associated with both narrowly- and broadly-defined NPs in a general population sample of children. Our results also suggest that large, rare CNVs may show sex-specific phenotypic effects.


2010 ◽  
Vol 277 (1698) ◽  
pp. 3213-3221 ◽  
Author(s):  
Daniel R. Schrider ◽  
Matthew W. Hahn

Differences between individuals in the copy-number of whole genes have been found in every multicellular species examined thus far. Such differences result in unique complements of protein-coding genes in all individuals, and have been shown to underlie adaptive phenotypic differences. Here, we review the evidence for copy-number variants (CNVs), focusing on the methods used to detect them and the molecular mechanisms responsible for generating this type of variation. Although there are multiple technical and computational challenges inherent to these experimental methods, next-generation sequencing technologies are making such experiments accessible in any system with a sequenced genome. We further discuss the connection between copy-number variation within species and copy-number divergence between species, showing that these values are exactly what one would expect from similar comparisons of nucleotide polymorphism and divergence. We conclude by reviewing the growing body of evidence for natural selection on copy-number variants. While it appears that most genic CNVs—especially deletions—are quickly eliminated by selection, there are now multiple studies demonstrating a strong link between copy-number differences at specific genes and phenotypic differences in adaptive traits. We argue that a complete understanding of the molecular basis for adaptive natural selection necessarily includes the study of copy-number variation.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Nadia Dehghani ◽  
Gamze Guven ◽  
Celia Kun-Rodrigues ◽  
Catarina Gouveia ◽  
Kalina Foster ◽  
...  

Abstract Background Copy number variants (CNVs) include deletions or multiplications spanning genomic regions. These regions vary in size and may span genes known to play a role in human diseases. As examples, duplications and triplications of SNCA have been shown to cause forms of Parkinson’s disease, while duplications of APP cause early onset Alzheimer’s disease (AD). Results Here, we performed a systematic analysis of CNVs in a Turkish dementia cohort in order to further characterize the genetic causes of dementia in this population. One hundred twenty-four Turkish individuals, either at risk of dementia due to family history, diagnosed with mild cognitive impairment, AD, or frontotemporal dementia, were whole-genome genotyped and CNVs were detected. We integrated family analysis with a comprehensive assessment of potentially disease-associated CNVs in this Turkish dementia cohort. We also utilized both dementia and non-dementia individuals from the UK Biobank in order to further elucidate the potential role of the identified CNVs in neurodegenerative diseases. We report CNVs overlapping the previously implicated genes ZNF804A, SNORA70B, USP34, XPO1, and a locus on chromosome 9 which includes a cluster of olfactory receptors and ABCA1. Additionally, we also describe novel CNVs potentially associated with dementia, overlapping the genes AFG1L, SNX3, VWDE, and BC039545. Conclusions Genotyping data from understudied populations can be utilized to identify copy number variation which may contribute to dementia.


2019 ◽  
Vol 44 (1) ◽  
pp. 79-89
Author(s):  
Grace Png ◽  
Daniel Suveges ◽  
Young‐Chan Park ◽  
Klaudia Walter ◽  
Kousik Kundu ◽  
...  

2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Samuel M. Peterson ◽  
Jennifer L. Freeman

DNA copy number variation has long been associated with highly penetrant genomic disorders, but it was not until recently that the widespread occurrence of copy number variation among phenotypically normal individuals was recognized as a considerable source of genetic variation. It is also now appreciated that copy number variants (CNVs) play a role in the onset of complex diseases. Many of the complex diseases with which CNVs are associated are reported to be influenced by yet to be identified environmental factors. It is hypothesized that exposure to environmental chemicals generates CNVs and influences disease onset and pathogenesis. In this study a proof of principle experiment was completed with ethyl methanesulfonate (EMS) and cytosine arabinoside (Ara-C) to investigate the generation of CNVs using array comparative genomic hybridization (CGH) and the zebrafish vertebrate model system. Exposure to both chemicals resulted in CNVs. CNVs were detected in similar genomic regions among multiple exposure concentrations with EMS, and five CNVs were common to both chemicals. Furthermore, CNVs were correlated with altered gene expression. This study suggests that chemical exposure generates CNVs with impacts on gene expression, warranting further investigation of this phenomenon with environmental chemicals.


2015 ◽  
Vol 76 (S 01) ◽  
Author(s):  
Georgios Zenonos ◽  
Peter Howard ◽  
Maureen Lyons-Weiler ◽  
Wang Eric ◽  
William LaFambroise ◽  
...  
