Automated Structural Variant Verification in Human Genomes using Single-Molecule Electronic DNA Mapping

2017 ◽  
Author(s):  
Michael D. Kaiser ◽  
Jennifer R. Davis ◽  
Boris S. Grinberg ◽  
John S. Oliver ◽  
Jay M. Sage ◽  
...  

The importance of structural variation in human disease and the difficulty of detecting structural variants larger than 50 base pairs have led to the development of several long-read sequencing technologies and optical mapping platforms. Frequently, multiple technologies and ad hoc methods are required to obtain a consensus regarding the location, size and nature of a structural variant, with no approach able to reliably bridge the gap of variant sizes between the domain of short-read approaches and the largest rearrangements observed with optical mapping. To address this unmet need, we have developed a new software package, SV-Verify™, which uses data collected with the Nabsys High Definition Mapping (HD-Mapping™) system to perform hypothesis-based verification of putative deletions. We demonstrate that whole genome maps, constructed from electronic detection of tagged DNA, hundreds of kilobases in length, can be used effectively to facilitate calling of structural variants ranging in size from 300 base pairs to hundreds of kilobase pairs. SV-Verify implements hypothesis-based verification of putative structural variants using a set of support vector machines and is capable of concurrently testing several thousand independent hypotheses. We describe support vector machine training, utilizing a well-characterized human genome, and application of the resulting classifiers to another human genome, demonstrating high sensitivity and specificity for deletions ≥300 base pairs.
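
The idea of hypothesis-based verification can be illustrated with a deliberately simplified sketch. It does not reproduce SV-Verify: the abstract describes support vector machine classifiers, whereas this example substitutes a plain nearest-hypothesis rule on inter-tag distances. The function name, its parameters and the sample numbers are all hypothetical.

```python
from statistics import median


def supports_deletion(measured_intervals, ref_interval, deletion_size):
    """Decide which hypothesis the measured inter-tag distances fit better.

    measured_intervals: distances (bp) between the same two flanking tags,
        one value per single-molecule read (hypothetical input format).
    ref_interval: distance (bp) between those tags in the reference.
    deletion_size: size (bp) of the putative deletion between the tags.

    NOTE: a nearest-hypothesis rule stands in for the SVM classifiers
    mentioned in the abstract; this is an assumption, not the real method.
    """
    observed = median(measured_intervals)
    expected_with_deletion = ref_interval - deletion_size
    error_reference = abs(observed - ref_interval)
    error_deletion = abs(observed - expected_with_deletion)
    # True when the data sit closer to the deletion hypothesis.
    return error_deletion < error_reference


if __name__ == "__main__":
    # Reads spanning a 10 kb reference interval with a putative 300 bp deletion.
    print(supports_deletion([9650, 9720, 9700], 10_000, 300))
    print(supports_deletion([9980, 10_010, 10_020], 10_000, 300))
```

Using the median keeps a single outlying molecule from deciding the call; a real classifier would also weigh coverage and measurement noise.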

2017 ◽  
Author(s):  
John S. Oliver ◽  
Anthony Catalano ◽  
Jennifer R. Davis ◽  
Boris S. Grinberg ◽  
Timothy E. Hutchins ◽  
...  

With the advent of routine short-read genome sequencing has come a growing recognition of the importance of long-range, structural information in applications ranging from sequence assembly to the detection of structural variation. Here we describe the Nabsys solid-state detector capable of detecting tags on single molecules of DNA 100s of kilobases in length as they translocate through the detector at a velocity greater than 1 megabase pair per second. Sequence-specific tags are detected with a high signal-to-noise ratio. The physical distance between tags is determined after accounting for viscous drag-induced intramolecular velocity fluctuations. The accurate measurement of the physical distance between tags on each molecule and the ability of the detector to resolve distances between tags of less than 300 base-pairs enables the construction of high-density genome maps.


2019 ◽  
Vol 35 (17) ◽  
pp. 2907-2915 ◽  
Author(s):  
David Heller ◽  
Martin Vingron

Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
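
The clustering step mentioned in the abstract can be pictured with a minimal sketch. It is not SVIM's algorithm, whose details the abstract does not give: it assumes a greedy, position-based grouping, and the `Signature` record and `max_distance` threshold are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one SV signature extracted from a read alignment.
Signature = namedtuple("Signature", "chrom start end svtype")


def cluster_signatures(signatures, max_distance=100):
    """Greedily group signatures of the same type that start close together.

    Assumption: two signatures belong to one cluster when they share
    chromosome and type and their start positions differ by at most
    max_distance from the previous member of the cluster.
    """
    clusters = []
    ordered = sorted(signatures, key=lambda s: (s.svtype, s.chrom, s.start))
    for sig in ordered:
        last = clusters[-1][-1] if clusters else None
        if (
            last is not None
            and last.svtype == sig.svtype
            and last.chrom == sig.chrom
            and sig.start - last.start <= max_distance
        ):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


if __name__ == "__main__":
    sigs = [
        Signature("chr1", 1000, 1500, "DEL"),
        Signature("chr1", 1040, 1510, "DEL"),
        Signature("chr1", 5000, 5200, "DEL"),
        Signature("chr1", 1010, 1300, "INS"),
    ]
    for cluster in cluster_signatures(sigs):
        print(len(cluster), cluster[0].svtype, cluster[0].start)
```

The number of signatures in a cluster can then serve as a rough support count for the candidate variant.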


Author(s):  
Daniel L. Cameron ◽  
Jonathan Baber ◽  
Charles Shale ◽  
Jose Espejo Valle-Inclan ◽  
Nicolle Besselink ◽  
...  

Here we present GRIDSS2, a general purpose structural variant caller optimised for tumour/normal somatic calling. Using cell line, patient sample validation and cohort-level comparisons, we show GRIDSS2 outperforms recent state-of-the-art tools. We demonstrate GRIDSS2 retains high sensitivity and precision even for small events by identifying a small (32-100bp) duplication signature strongly associated with colorectal cancer using 3,782 metastatic cancers that have been deeply sequenced by the Hartwig Medical Foundation. Essential to the high precision achieved by GRIDSS2 is the novel reporting of single breakend variants: structural variants in which only one side can be unambiguously determined. We show that the inclusion of single breakends reduces the false negative rate from 10.4% to 3.4%. Demonstrating the power single breakend calling has in genomic regions traditionally considered inaccessible to short read callers, we find that 47% of somatic centromeric breaks are repaired to non-centromeric sequence, with chromosome 1 exhibiting a unique centromeric rearrangement signature. Finally, we show that somatic structural variants are highly clustered with GRIDSS2 able to phase 16% of somatic structural variants in the Hartwig cohort from short read sequencing alone.


2018 ◽  
Author(s):  
David Heller ◽  
Martin Vingron

Motivation: Structural variants are defined as genomic variants larger than 50bp. They have been shown to affect more bases in any given genome than SNPs or small indels. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from PacBio and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.


2016 ◽  
Author(s):  
Xian Fan ◽  
Mark Chaisson ◽  
Luay Nakhleh ◽  
Ken Chen

Achieving complete, accurate and cost-effective assembly of the human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in the human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that could leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in the human genome. In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that our approach substantially improved the detection of many types of SVs, particularly novel large insertions, small INDELs (10-50bp) and short tandem repeat expansions and contractions, over existing approaches with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
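
To show how a bipartite graph of reads can split one large problem into independent groups, here is a minimal sketch. It assumes the simplest possible clustering, connected components found with union-find. The abstract does not specify HySA's actual algorithm, and the read identifiers and edge list are hypothetical.

```python
def cluster_reads(edges):
    """Group reads into connected components of a bipartite graph.

    edges: iterable of (ngs_read_id, sms_read_id) pairs, one per pair of
        reads judged homologous (hypothetical input format).
    Returns a list of sets of ("ngs" | "sms", read_id) nodes.

    NOTE: connected components are an assumed stand-in for the
    bipartite-graph-based clustering named in the abstract.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for ngs_id, sms_id in edges:
        root_a = find(("ngs", ngs_id))
        root_b = find(("sms", sms_id))
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort for a deterministic order of clusters.
    return sorted(groups.values(), key=lambda group: sorted(group))


if __name__ == "__main__":
    pairs = [("n1", "s1"), ("n2", "s1"), ("n3", "s2")]
    for cluster in cluster_reads(pairs):
        print(sorted(cluster))
```

Each resulting cluster could then be assembled on its own, which is the sense in which the whole-genome problem becomes a set of independent local problems.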


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 2444-2444
Author(s):  
Aditya Gupta ◽  
Jaehyup Kim ◽  
Chelsea Hope ◽  
Jeff Jensen ◽  
Natalie Callander ◽  
...  

Abstract 2444 Multiple myeloma genomes are characterized by complex structural and numerical abnormalities. Proteasome inhibitors are routinely used to treat multiple myeloma. Despite significant clinical success with these agents, development of resistance often limits therapeutic benefit. However, many questions remain unanswered regarding the molecular mechanisms underlying acquired resistance to proteasome inhibitors. In order to understand the dynamics of structural evolution of the multiple myeloma genome under selective pressure afforded by proteasome inhibition and to identify targets to overcome acquired resistance, we derived global optical maps of two myeloma cancer genomes (DNA extracted from CD138+ tumor cells), obtained sequentially from the same patient before and after development of resistance to bortezomib, the prototypical therapeutic proteasome inhibitor. Optical Mapping offers a high throughput, single molecule, whole genome analysis that offers the highest rate of discovery of structural and numerical variants, free of the confounders associated with hybridization-based approaches. Briefly, the Optical Mapping System assembles entire genomes from large datasets of Rmaps (Rmap = a restriction-mapped individual genomic DNA molecule; see Figure 1) from which novel balanced and complex structural variants (2 kb – entire chromosomes) are discovered and tabulated by our pipeline (Figure 2). We identified multiple structural variants including single nucleotide variations (SNVs), deletions, insertions, inversions, and loss of heterozygosity regions across the entire genome. Some of these variants are common to both bortezomib-sensitive and bortezomib-resistant genomes. We also discovered variants that were unique to the bortezomib-resistant genome, implicating a role in acquisition of drug resistance.
Many of these structural variants encompass genes, some of which have not been previously associated with multiple myeloma and bortezomib resistance, thus providing a rationale for further interrogation of these novel targets. In addition to novel potential targets, known recurrent events including del(13q) and a deletion spanning the CDKN2C/FAF1 locus on chromosome 1 were detected. Future efforts are directed towards integration and correlation of optical maps with whole genome sequencing and transcriptional profiling as well as establishing the frequency of prioritized genomic perturbations in bortezomib-sensitive and -resistant patient populations. The integration of structural optical maps with base-pair sequence information and transcriptomic tracks will generate an entirely new view of the multiple myeloma cancer genome at a previously unseen resolution.
Fig. 1. Overview of the Optical Mapping platform. Bulk microscope cover glass is cleaned with a strong acid, then treated with a silane mixture to make positively charged Optical Mapping surfaces (i). A silicon wafer is patterned with standard photolithography techniques, and then replicated into a flexible PDMS microfluidic device (ii) using soft lithography. Finally, pure, high molecular-weight DNA (iii) is isolated from cultured eukaryotic cells using a gentle detergent-based lysis protocol. The microfluidic device is adhered to the Optical Mapping surface, and the DNA solution is pumped through the microchannels, wherein the DNA is elongated and attached to the Optical Mapping surface via electrostatic interaction (iv). The DNA is incubated with a restriction endonuclease (v), which cleaves the DNA at its cognate sites. The cleaved DNA is stained and imaged on an epifluorescence microscope (vi) illuminated by an argon-ion laser (vii) and controlled by a computer workstation (viii).
Fig. 2. Overview of the map assembly pipeline. Reference maps are generated in silico from the NCBI Build 35 human genome reference sequence, and used to seed an iterative process of pairwise alignment (which clusters together similar single-molecule maps) and local assembly (which generates a consensus optical map from a cluster of single-molecule maps). After several iterations of alignment and assembly, the consensus maps are aligned back to the reference map and analyzed for places where the consensus map differs significantly from the reference.
Disclosures: No relevant conflicts of interest to declare.
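
The in silico generation of a reference map described in the Fig. 2 caption can be sketched as a search for restriction sites in a sequence. This is a minimal illustration, not the authors' pipeline: the function name is hypothetical, only one strand is scanned, and the cut is assumed to fall at the first base of each recognition site.

```python
def in_silico_fragments(sequence, site):
    """Return ordered fragment lengths after cutting at every site occurrence.

    sequence: reference DNA sequence as a string.
    site: recognition sequence of the restriction enzyme.

    Assumption: the enzyme cuts at the first base of its site; real
    enzymes cut at an enzyme-specific offset within or near the site.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Skip zero-length fragments (for example, a site at position 0).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    # Toy sequence with two GAATTC sites (the EcoRI recognition sequence).
    print(in_silico_fragments("AAGAATTCGGGAATTCTT", "GAATTC"))
```

The ordered list of fragment lengths is the kind of pattern that single-molecule restriction maps can be aligned against.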


2018 ◽  
Author(s):  
De Coster Wouter ◽  
De Roeck Arne ◽  
De Pooter Tim ◽  
D’Hert Svenn ◽  
De Rijk Peter ◽  
...  

We sequenced the Yoruban NA19240 genome on the long read sequencing platform Oxford Nanopore PromethION for benchmarking and evaluation of recently published aligners and structural variant calling tools. In this work, we determine the precision and recall, present high-confidence and high-sensitivity call sets of variants, and discuss optimal parameters. The aligner Minimap2 and structural variant caller Sniffles are both the most accurate and the most computationally efficient tools in our study. We describe our scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long read genome sequencing of an individual or population. By discussing the results of this genome we provide an approximation of what can be expected in future long read sequencing studies aiming for structural variant identification.
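
Precision and recall for a structural variant call set can be computed as in the following minimal sketch. The matching rule is an assumption, not the criterion used in the study: a call matches a truth variant when chromosome and type agree and both breakpoints lie within a tolerance. The tuple layout and the 50 bp default are hypothetical.

```python
def precision_recall(calls, truth, tolerance=50):
    """Compute precision and recall of SV calls against a truth set.

    calls, truth: lists of (chrom, start, end, svtype) tuples
        (hypothetical format).
    tolerance: maximum breakpoint difference in bp for a match (assumed).
    """

    def matches(a, b):
        return (
            a[0] == b[0]
            and a[3] == b[3]
            and abs(a[1] - b[1]) <= tolerance
            and abs(a[2] - b[2]) <= tolerance
        )

    # Calls that hit at least one truth variant count towards precision.
    matched_calls = sum(any(matches(c, t) for t in truth) for c in calls)
    # Truth variants hit by at least one call count towards recall.
    found_truth = sum(any(matches(c, t) for c in calls) for t in truth)

    precision = matched_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth_set = [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "DEL")]
    call_set = [("chr1", 1010, 1990, "DEL"), ("chr3", 100, 200, "INS")]
    print(precision_recall(call_set, truth_set))
```

Counting matched calls and matched truth variants separately avoids inflating either metric when several calls hit the same true variant.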


2016 ◽  
Author(s):  
Noah Spies ◽  
Ziming Weng ◽  
Alex Bishara ◽  
Jennifer McDaniel ◽  
David Catoe ◽  
...  

Recently developed methods that utilize partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, have succeeded in retaining long-range information in short sequencing reads. These so-called read cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed software, GROC-SVs, that takes advantage of read clouds for structural variant detection and assembly. We apply the method to two 10x Genomics data sets, one chromothriptic sarcoma with several spatially separated samples, and one breast cancer cell line, all Illumina-sequenced to high coverage. Comparison to short-fragment data from the same samples, and validation by mate-pair data from a subset of the sarcoma samples, demonstrate substantial improvement in specificity of breakpoint detection compared to short-fragment sequencing, at comparable sensitivity, and vice versa. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints; importantly, consecutive breakpoints that are closer than the average length of the input DNA molecules can be assembled together and their order and arrangement reconstructed, with some events exhibiting remarkable complexity. These features facilitated an analysis of the structural evolution of the sarcoma. In the chromothripsis, rearrangements occurred before copy number amplifications, and using the phylogenetic tree built from point mutation data we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read cloud-specific methods.


2019 ◽  
Author(s):  
Jiajun Wang ◽  
Meng-Yin Li ◽  
Jie Yang ◽  
Ya-Qian Wang ◽  
Xue-Yuan Wu ◽  
...  

DNA lesions such as methylcytosine (mC), 8-oxo-guanine (OG) and inosine (I) can cause genetic diseases. Identifying the variety of lesion bases is usually beyond the capability of conventional DNA sequencing, which is mainly designed to discriminate only the four canonical bases. Lesion detection therefore remains a challenge, owing to the large variety of lesions and the poorly distinguishable readouts for minor structural variations. Moreover, standard amplification and labelling hardly work for DNA lesion detection. Herein, we designed a single-molecule interface from the mutant K238Q aerolysin, whose confined sensing region is highly suited to capturing each base lesion and directly converting it into a distinguishable current readout. Compared with previous single-molecule sensing interfaces, the resolution of the K238Q aerolysin nanopore is enhanced by two orders of magnitude. The novel K238Q pore can directly discriminate at least three types of lesion (mC, OG, I) without labelling and quantify modification sites in oligonucleotides of mixed hetero-composition. Such a nanopore could be further applied to diagnose genetic diseases with high sensitivity.

