Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets

2018 ◽  
Vol 14 (3) ◽  
pp. e1006080 ◽  
Author(s):  
Soroush Samadian ◽  
Jeff P. Bruce ◽  
Trevor J. Pugh
2017 ◽  
Author(s):  
Soroush Samadian ◽  
Jeff P. Bruce ◽  
Trevor J. Pugh

Abstract. Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python to introduce user-defined haplotype-phased allele-specific copy number events into an existing Binary Alignment Mapping (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel, using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof-of-principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20-100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for development and systematic benchmarking of CNV calling algorithms by users working with locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer.

Author summary. We present Bamgineer, a software program to introduce user-defined, haplotype-specific copy number variants (CNVs) at any frequency into standard Binary Alignment Mapping (BAM) files. Copy number gains are simulated by introducing new DNA sequencing read pairs sampled from existing reads and modified to contain SNPs of the haplotype of interest. This approach retains biases of the original data such as local coverage, strand bias, and insert size. Deletions are simulated by removing reads corresponding to one or both haplotypes. In our proof-of-principle study, we simulated copy number profiles from 10 cancer types at varying cellularity levels typically encountered in clinical samples. We also demonstrated introduction of low-frequency CNVs into cell-free DNA sequencing data that retained the bimodal fragment size distribution characteristic of these data. Bamgineer is flexible and enables users to simulate CNVs that reflect characteristics of locally generated sequence files, and it can be used for many applications, including development and benchmarking of CNV inference tools for a variety of data types.
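To make the role of cellularity concrete, the following is a minimal sketch of standard purity arithmetic. It is not taken from the Bamgineer source: the function names are hypothetical, and it assumes the non-tumor fraction of the sample is diploid with one copy of each haplotype. It computes the expected depth relative to a diploid sample and the expected read fraction of the minor haplotype at a heterozygous SNP.

```python
import math


def expected_depth_ratio(major: int, minor: int, cellularity: float) -> float:
    """Expected read depth relative to a diploid sample.

    major/minor are the tumor copy numbers of the two haplotypes;
    cellularity is the tumor cell fraction (0..1). Normal cells are
    assumed diploid (an assumption of this sketch).
    """
    tumor_cn = major + minor
    return (cellularity * tumor_cn + (1.0 - cellularity) * 2.0) / 2.0


def expected_minor_fraction(major: int, minor: int, cellularity: float) -> float:
    """Expected fraction of reads from the minor haplotype at a het SNP."""
    minor_copies = cellularity * minor + (1.0 - cellularity) * 1.0
    total_copies = cellularity * (major + minor) + (1.0 - cellularity) * 2.0
    if total_copies == 0:
        # Homozygous deletion in a pure tumor sample: no reads expected.
        return math.nan
    return minor_copies / total_copies


if __name__ == "__main__":
    # Single-copy gain of one haplotype at the cellularity extremes
    # mentioned in the abstract (20% and 100%).
    for purity in (0.2, 1.0):
        ratio = expected_depth_ratio(2, 1, purity)
        frac = expected_minor_fraction(2, 1, purity)
        print(f"cellularity={purity:.1f} depth_ratio={ratio:.3f} minor_fraction={frac:.3f}")
```

Under these assumptions a single-copy gain shifts depth by only 10% at 20% cellularity, which illustrates why low-cellularity simulations are a demanding test for CNV callers.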


2022 ◽  
Author(s):  
Eduardo A Maury ◽  
Maxwell A Sherman ◽  
Giulio Genovese ◽  
Thomas G. Gilgenast ◽  
Prashanth Rajarajan ◽  
...  

While inherited and de novo copy number variants (CNVs) have been implicated in the genetic architecture of schizophrenia (SCZ), the contribution of somatic CNVs (sCNVs), present in some but not all cells of the body, remains unknown. Here we explore the role of sCNVs in SCZ by analyzing blood-derived genotype arrays from 12,834 SCZ cases and 11,648 controls. sCNVs were more common in cases (0.91%) than in controls (0.51%, p = 2.68e-4). We observed recurrent somatic deletions of exons 1-5 of the NRXN1 gene in 5 SCZ cases. Allele-specific Hi-C maps revealed ectopic, allele-specific loops forming between a potential novel cryptic promoter and non-coding cis regulatory elements upon deletions in the 5' region of NRXN1. We also observed recurrent intragenic deletions of ABCB11, a gene associated with anti-psychotic response, in 5 treatment-resistant SCZ cases. Taken together, our results indicate an important contribution of sCNVs to SCZ risk and treatment responsiveness.


2018 ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Background. The popularisation and decreased cost of genome resequencing has resulted in its increased use in molecular diagnostics. While there are a number of established and high-quality bioinformatic tools for identifying small genetic variants, including single nucleotide variants and indels, there is currently no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high-throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the following sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance, defined as the relative proportion of reads of each allele at each position, has been underutilised in the existing applications. Results. We present Read Balance Validator (RBV), a bioinformatic tool which uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. Conclusions. RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing, available as a Python package from https://github.com/whitneywhitford/RBV
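The read balance definition above can be sketched in a few lines. This is an illustration of the general idea only, not RBV's implementation: the function names and the toy read counts are hypothetical. In a diploid region the minor-allele read fraction at a heterozygous site is expected to be near 0.5, a single-copy duplication shifts it toward 1/3, and a heterozygous deletion leaves no heterozygous sites at all.

```python
from typing import Iterable, Tuple


def read_balance(ref_reads: int, alt_reads: int) -> float:
    """Proportion of reads at one position that carry the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def mean_minor_fraction(sites: Iterable[Tuple[int, int]]) -> float:
    """Average minor-allele read fraction over (ref_reads, alt_reads) sites.

    Folding each balance to min(b, 1 - b) makes the summary independent
    of which allele happens to be the duplicated one.
    """
    fractions = []
    for ref_reads, alt_reads in sites:
        balance = read_balance(ref_reads, alt_reads)
        fractions.append(min(balance, 1.0 - balance))
    if not fractions:
        raise ValueError("no sites supplied")
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Hypothetical read counts at three heterozygous sites in a candidate region.
    candidate_region = [(20, 10), (11, 19), (21, 9)]
    print(round(mean_minor_fraction(candidate_region), 3))
```

A real validator would also need to model sampling noise at each site, which this sketch leaves out.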


Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 141 ◽  
Author(s):  
Feichen Shen ◽  
Jeffrey M. Kidd

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus.
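The idea of tabulating k-mers and converting their depth into a copy-number estimate can be sketched as follows. This is a toy illustration, not QuicK-mer2's implementation: the function names are hypothetical, reverse complements are ignored, and no depth normalization is applied. It assumes a set of control k-mers whose true copy number is known (two by default).

```python
from collections import Counter
from statistics import mean
from typing import Iterable, Sequence


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Tabulate every k-mer occurrence across a set of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_copy_number(
    counts: Counter,
    region_kmers: Sequence[str],
    control_kmers: Sequence[str],
    control_copies: float = 2.0,
) -> float:
    """Scale the mean depth of a region's unique k-mers by a control depth.

    control_kmers are assumed to come from sequence with a known copy
    number (control_copies); k-mers never seen in the reads count as 0.
    """
    control_depth = mean(counts[kmer] for kmer in control_kmers)
    if control_depth == 0:
        raise ValueError("control k-mers were not observed in the reads")
    region_depth = mean(counts[kmer] for kmer in region_kmers)
    return control_copies * region_depth / control_depth


if __name__ == "__main__":
    # Hypothetical depths: the region's k-mers are seen 1.5x as often as
    # the diploid control k-mers, suggesting three copies.
    toy_counts = Counter({"AAA": 30, "CCC": 30, "GGG": 20})
    print(estimate_copy_number(toy_counts, ["AAA", "CCC"], ["GGG"]))
```

Restricting the region's k-mers to those unique to one paralog is what would make such an estimate paralog-specific.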


2012 ◽  
Vol 18 (10) ◽  
pp. 1090-1095 ◽  
Author(s):  
D Moreno-De-Luca ◽  
S J Sanders ◽  
A J Willsey ◽  
J G Mulle ◽  
J K Lowe ◽  
...  

2014 ◽  
Vol 204 (2) ◽  
pp. 108-114 ◽  
Author(s):  
Elliott Rees ◽  
James T. R. Walters ◽  
Lyudmila Georgieva ◽  
Anthony R. Isles ◽  
Kimberly D. Chambert ◽  
...  

Background. A number of copy number variants (CNVs) have been suggested as susceptibility factors for schizophrenia. For some of these the data remain equivocal, and the frequency in individuals with schizophrenia is uncertain. Aims. To determine the contribution of CNVs at 15 schizophrenia-associated loci (a) using a large new data-set of patients with schizophrenia (n = 6882) and controls (n = 6316), and (b) combining our results with those from previous studies. Method. We used Illumina microarrays to analyse our data. Analyses were restricted to 520,766 probes common to all arrays used in the different data-sets. Results. We found higher rates in participants with schizophrenia than in controls for 13 of the 15 previously implicated CNVs. Six were nominally significantly associated (P < 0.05) in this new data-set: deletions at 1q21.1, NRXN1, 15q11.2 and 22q11.2 and duplications at 16p11.2 and the Angelman/Prader–Willi Syndrome (AS/PWS) region. All eight AS/PWS duplications in patients were of maternal origin. When combined with published data, 11 of the 15 loci showed highly significant evidence for association with schizophrenia (P < 4.1 × 10⁻⁴). Conclusions. We strengthen the support for the majority of the previously implicated CNVs in schizophrenia. About 2.5% of patients with schizophrenia and 0.9% of controls carry a large, detectable CNV at one of these loci. Routine CNV screening may be clinically appropriate given the high rate of known deleterious mutations in the disorder and the comorbidity associated with these heritable mutations.


Science ◽  
2019 ◽  
Vol 366 (6463) ◽  
pp. eaax2083 ◽  
Author(s):  
PingHsun Hsieh ◽  
Mitchell R. Vollger ◽  
Vy Dang ◽  
David Porubsky ◽  
Carl Baker ◽  
...  

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


2020 ◽  
Author(s):  
Wilson Nandolo ◽  
Gábor Mészáros ◽  
Maria Wurzinger ◽  
Liveness J. Banda ◽  
Timothy N. Gondwe ◽  
...  

Background. Copy number variations (CNV) are a significant source of variation in the genome and are therefore essential to the understanding of genetic characterization. The aim of this study was to develop a fine-scaled copy number variation map for African goats. We used sequence data from multiple breeds and from multiple African countries. Results. A total of 253,553 CNV (244,876 deletions and 8,677 duplications) were identified, corresponding to an overall average of 1,393 CNV per animal. The mean CNV length was 3.3 kb, with a median of 1.3 kb. There was substantial differentiation between the populations for some CNV, suggestive of the effect of population-specific selective pressures. A total of 6,231 global CNV regions (CNVR) were found across all animals, representing 59.2 Mb (2.4%) of the goat genome. About 1.6% of the CNVR were present in all 34 breeds and 28.7% were present in all 5 geographical areas across Africa where animals had been sampled. The CNVR contained genes that were highly enriched in important biological functions, molecular functions, and cellular components, including retrograde endocannabinoid signaling, glutamatergic synapse and circadian entrainment. Conclusions. This study presents the first fine-scaled CNV map of African goats based on whole-genome sequence (WGS) data and adds to the growing body of knowledge on the genetic characterization of goats.
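The step from individual CNV calls to CNV regions, and from regions to a genome fraction, can be sketched as an interval merge. The abstract does not state how CNVR were defined in this study, so the rule used here (merge any calls on the same chromosome that overlap or touch) is an assumption, and the function names and coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def merge_into_cnvrs(cnvs: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent CNV calls into regions."""
    regions: List[Interval] = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Extend the previous region instead of opening a new one.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


def covered_fraction(regions: Iterable[Interval], genome_length: int) -> float:
    """Fraction of the genome covered by a set of non-overlapping regions."""
    return sum(end - start for _, start, end in regions) / genome_length


if __name__ == "__main__":
    # Hypothetical calls pooled across animals.
    calls = [
        ("chr1", 100, 200),
        ("chr1", 150, 300),
        ("chr1", 400, 500),
        ("chr2", 100, 200),
    ]
    cnvrs = merge_into_cnvrs(calls)
    print(cnvrs)
    print(covered_fraction(cnvrs, genome_length=10_000))
```

Applying the same arithmetic to the reported figures, 59.2 Mb at 2.4% of the genome implies a reference length of roughly 2.5 Gb.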


2016 ◽  
Author(s):  
Gemma M Jenkins ◽  
Michael E Goddard ◽  
Michael A Black ◽  
Rudiger Brauning ◽  
Benoit Auvray ◽  
...  

Background. Copy number variants (CNVs) are a type of polymorphism found to underlie phenotypic variation, both in humans and livestock. Most surveys of CNV in livestock have been conducted in the cattle genome, and often utilise only a single approach for the detection of copy number differences. Here we performed a study of CNV in sheep, using multiple methods to identify and characterise copy number changes. Comprehensive information from small pedigrees (trios) was collected using multiple platforms (array CGH, SNP chip and whole genome sequence data), with these data then analysed via multiple approaches to identify and verify CNVs. Results. In total, 3,488 autosomal CNV regions (CNVRs) were identified in this study, which substantially builds on an initial survey of the sheep genome that identified 135 CNVRs. The average length of the identified CNVRs was 19 kb (range of 1 kb to 3.6 Mb), with shorter CNVRs being more frequent than longer CNVRs. The total length of all CNVRs was 67.6 Mbp, which equates to 2.7% of the sheep autosomes. For individuals this value ranged from 0.24 to 0.55%, and the majority of CNVRs were identified in single animals. Rather than being uniformly distributed throughout the genome, CNVRs tended to be clustered. Application of three independent approaches for CNVR detection facilitated a comparison of validation rates. CNVs identified on the Roche-NimbleGen 2.1M CGH array generally had low validation rates with lower density arrays, while whole genome sequence data had the highest validation rate (>60%). Conclusions. This study represents the first comprehensive survey of the distribution, prevalence and characteristics of CNVR in sheep. Multiple approaches were used to detect CNV regions and it appears that the best method for verifying CNVR on a large scale involves using a combination of detection methodologies. The characteristics of the 3,488 autosomal CNV regions identified in this study are comparable to other CNV regions reported in the literature and provide a valuable and sizeable addition to the small subset of published sheep CNVs.

