Genome-wide reconstruction of complex structural variants using read clouds

2016 ◽  
Author(s):  
Noah Spies ◽  
Ziming Weng ◽  
Alex Bishara ◽  
Jennifer McDaniel ◽  
David Catoe ◽  
...  

Abstract Recently developed methods that utilize partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, have succeeded in retaining long-range information in short sequencing reads. These so-called read cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed software, GROC-SVs, that takes advantage of read clouds for structural variant detection and assembly. We apply the method to two 10x Genomics data sets, one chromothriptic sarcoma with several spatially separated samples, and one breast cancer cell line, all Illumina-sequenced to high coverage. Comparison to short-fragment data from the same samples, and validation by mate-pair data from a subset of the sarcoma samples, demonstrate substantial improvement in specificity of breakpoint detection compared to short-fragment sequencing, at comparable sensitivity, and vice versa. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints; importantly, consecutive breakpoints that are closer than the average length of the input DNA molecules can be assembled together and their order and arrangement reconstructed, with some events exhibiting remarkable complexity. These features facilitated an analysis of the structural evolution of the sarcoma. In the chromothripsis, rearrangements occurred before copy number amplifications, and using the phylogenetic tree built from point mutation data we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read cloud-specific methods.
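As a rough illustration of the read cloud idea in the abstract above: reads derived from the same long DNA fragment carry the same barcode, so two distant genomic windows that share many barcodes are candidates for being adjacent on the input molecules. The sketch below only computes that barcode overlap. It is not the GROC-SVs implementation; the function names and the toy barcodes are hypothetical.

```python
def shared_barcodes(window_a, window_b):
    """Return the set of barcodes observed in both genomic windows."""
    return set(window_a) & set(window_b)


def barcode_jaccard(window_a, window_b):
    """Jaccard similarity of the barcode sets of two windows (0.0 if both are empty)."""
    a, b = set(window_a), set(window_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy example: barcodes of reads mapped to two distant windows (hypothetical data).
left = ["BC01", "BC02", "BC03"]
right = ["BC02", "BC03", "BC04"]
overlap = shared_barcodes(left, right)
similarity = barcode_jaccard(left, right)
```

A real caller would compare this overlap against a background expectation for unlinked loci before reporting a candidate breakpoint.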

2019 ◽  
Vol 35 (17) ◽  
pp. 2907-2915 ◽  
Author(s):  
David Heller ◽  
Martin Vingron

Abstract Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
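The clustering step mentioned in the abstract above can be pictured with a minimal sketch: signatures collected from read alignments are filtered by the 50 bp size definition and then grouped when they share a chromosome and type and start close together. This is an assumption-laden toy, not SVIM's actual clustering method; the `Signature` class, `cluster_signatures`, and the `max_gap` threshold are all hypothetical.

```python
from dataclasses import dataclass

MIN_SV_SIZE = 50  # from the abstract: structural variants are larger than 50 bp


@dataclass(frozen=True)
class Signature:
    """One piece of SV evidence from a single read alignment (hypothetical type)."""
    chrom: str
    start: int
    end: int
    sv_type: str

    @property
    def size(self):
        return self.end - self.start


def cluster_signatures(signatures, max_gap=100):
    """Group signatures of the same type and chromosome whose starts lie within max_gap bp.

    max_gap is an illustrative threshold, not a value taken from SVIM.
    """
    kept = [s for s in signatures if s.size > MIN_SV_SIZE]
    kept.sort(key=lambda s: (s.chrom, s.sv_type, s.start))
    clusters = []
    for sig in kept:
        last = clusters[-1][-1] if clusters else None
        if (last is not None and last.chrom == sig.chrom
                and last.sv_type == sig.sv_type
                and sig.start - last.start <= max_gap):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


signatures = [
    Signature("chr1", 1000, 1200, "DEL"),
    Signature("chr1", 1010, 1205, "DEL"),
    Signature("chr1", 5000, 5030, "DEL"),  # 30 bp: below the SV size cutoff
    Signature("chr2", 1000, 1300, "INS"),
]
clusters = cluster_signatures(signatures)
```

Each resulting cluster would then be summarized into one candidate variant; how evidence is scored and combined is left out here.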


2018 ◽  
Author(s):  
David Heller ◽  
Martin Vingron

Abstract Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than SNPs or small indels. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from PacBio and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.


2017 ◽  
Author(s):  
Michael D. Kaiser ◽  
Jennifer R. Davis ◽  
Boris S. Grinberg ◽  
John S. Oliver ◽  
Jay M. Sage ◽  
...  

The importance of structural variation in human disease and the difficulty of detecting structural variants larger than 50 base pairs have led to the development of several long-read sequencing technologies and optical mapping platforms. Frequently, multiple technologies and ad hoc methods are required to obtain a consensus regarding the location, size and nature of a structural variant, with no approach able to reliably bridge the gap of variant sizes between the domain of short-read approaches and the largest rearrangements observed with optical mapping. To address this unmet need, we have developed a new software package, SV-Verify™, which utilizes data collected with the Nabsys High Definition Mapping (HD-Mapping™) system to perform hypothesis-based verification of putative deletions. We demonstrate that whole genome maps, constructed from electronic detection of tagged DNA, hundreds of kilobases in length, can be used effectively to facilitate calling of structural variants ranging in size from 300 base pairs to hundreds of kilobase pairs. SV-Verify implements hypothesis-based verification of putative structural variants using a set of support vector machines and is capable of concurrently testing several thousand independent hypotheses. We describe support vector machine training, utilizing a well-characterized human genome, and application of the resulting classifiers to another human genome, demonstrating high sensitivity and specificity for deletions ≥300 base pairs.


2019 ◽  
Author(s):  
Sergey Aganezov ◽  
Sara Goodwin ◽  
Rachel Sherman ◽  
Fritz J. Sedlazeck ◽  
Gayatri Arun ◽  
...  

Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of disease progression. We performed whole genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using 10X/Illumina, PacBio, and Oxford Nanopore sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings demonstrate that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25x-30x). Furthermore, we inferred karyotypes from these data using our enhanced RCK algorithm to present a more accurate representation of the mutated cancer genomes, and find hundreds of variants affecting known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.


2020 ◽  
Vol 500 (4) ◽  
pp. 4937-4957 ◽  
Author(s):  
G Martin ◽  
R A Jackson ◽  
S Kaviraj ◽  
H Choi ◽  
J E G Devriendt ◽  
...  

ABSTRACT Dwarf galaxies (M⋆ < 10⁹ M⊙) are key drivers of mass assembly in high-mass galaxies, but relatively little is understood about the assembly of dwarf galaxies themselves. Using the NewHorizon cosmological simulation (∼40 pc spatial resolution), we investigate how mergers and fly-bys drive the mass assembly and structural evolution of around 1000 field and group dwarfs up to z = 0.5. We find that, while dwarf galaxies often exhibit disturbed morphologies (5 and 20 per cent are disturbed at z = 1 and z = 3 respectively), only a small proportion of the morphological disturbances seen in dwarf galaxies are driven by mergers at any redshift (for 10⁹ M⊙, mergers drive under 20 per cent morphological disturbances). They are instead primarily the result of interactions that do not end in a merger (e.g. fly-bys). Given the large fraction of apparently morphologically disturbed dwarf galaxies which are not, in fact, merging, this finding is particularly important to future studies identifying dwarf mergers and post-mergers morphologically at intermediate and high redshifts. Dwarfs typically undergo one major and one minor merger between z = 5 and z = 0.5, accounting for 10 per cent of their total stellar mass. Mergers can also drive moderate star formation enhancements at lower redshifts (3 or 4 times at z = 1), but this accounts for only a few per cent of stellar mass in the dwarf regime given their infrequency. Non-merger interactions drive significantly smaller star formation enhancements (around two times), but their preponderance relative to mergers means they account for around 10 per cent of stellar mass formed in the dwarf regime.


Plants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 655
Author(s):  
Hongmei Du ◽  
Shah Zaman ◽  
Shuiqingqing Hu ◽  
Shengquan Che

This study aimed to obtain the full-length transcriptome of purslane (Portulaca oleracea); assorted plant samples were used for single-molecule real-time (SMRT) sequencing. Based on the SMRT data, functional annotation of transcripts, transcription factor (TF) analysis, simple sequence repeat analysis and long non-coding RNA (lncRNA) prediction were carried out. In total, 15.33 GB of reads were produced, comprising 9,350,222 subreads with an average length of 1640 bp. After clustering, 132,536 transcripts and 78,559 genes were detected with 99.99% accuracy. All unique SMRT transcripts were annotated in seven functional databases. In total, 4180 TFs (including transcription regulators) and 7289 lncRNAs were predicted. The results of RNA-seq were confirmed with qRT–PCR analysis. Illumina sequencing of leaves and roots of two purslane genotypes was carried out. A number of differentially expressed genes and related KEGG pathways were found. The expression profiles of genes in the unsaturated fatty acid biosynthesis pathway were analyzed in leaves and roots of the two purslane genotypes. Differential expression of genes in this pathway forms the basis of ω-3 fatty acid accumulation in different organs and genotypes of purslane. These results provide sequence information and may be a valuable resource for whole-genome sequencing of purslane in the future.


Author(s):  
Jacqueline Neubauer ◽  
Shouyu Wang ◽  
Giancarlo Russo ◽  
Cordula Haas

Abstract Sudden unexplained death (SUD) accounts for a considerable share of overall sudden death cases, especially in adolescents and young adults. During the past decade, many channelopathy- and cardiomyopathy-associated single nucleotide variants (SNVs) have been identified in SUD studies by means of postmortem molecular autopsy, yet the number of cases that remain inconclusive is still high. Recent studies have suggested that structural variants (SVs) might play an important role in SUD, but there is no consensus on the impact of SVs on inherited cardiac diseases. In this study, we searched for potentially pathogenic SVs in 244 genes associated with cardiac diseases. Whole-exome sequencing and appropriate data analysis were performed in 45 SUD cases. Re-analysis of the exome data according to the current ACMG guidelines identified 14 pathogenic or likely pathogenic variants in 10 (22.2%) out of the 45 SUD cases, of whom 2 (4.4%) individuals had variants with likely functional effects in the channelopathy-associated genes SCN5A and TRDN and 1 (2.2%) individual in the cardiomyopathy-associated gene DTNA. In addition, 18 SVs were identified in 15 out of the 45 individuals. Two SVs with likely functional impairment were found in the coding regions of PDSS2 and TRPM4 in 2 SUD cases (4.4%). Both were identified as heterozygous deletions, which were confirmed by multiplex ligation-dependent probe amplification. In conclusion, our findings support that SVs could contribute to the pathology of the sudden death event in some of the cases and therefore should be investigated on a routine basis in suspected SUD cases.


2019 ◽  
Author(s):  
Jie Xu ◽  
Fan Song ◽  
Emily Schleicher ◽  
Christopher Pool ◽  
Darrin Bann ◽  
...  

Abstract While genomic analysis of tumors has stimulated major advances in cancer diagnosis, prognosis and treatment, current methods fail to identify a large fraction of somatic structural variants in tumors. We have applied a combination of whole genome sequencing and optical genome mapping to a number of adult and pediatric leukemia samples, which revealed in each of these samples a large number of structural variants not recognizable by current tools of genomic analysis. We developed computational methods to determine which of those variants likely arose as somatic mutations. The method identified 97% of the structural variants previously reported by karyotype analysis of these samples and revealed an additional fivefold more such somatic rearrangements. The method identified on average tens of previously unrecognizable inversions and duplications and hundreds of previously unrecognizable insertions and deletions. These structural variants recurrently affected a number of leukemia-associated genes as well as cancer driver genes not previously associated with leukemia and genes not previously associated with cancer. A number of variants only affected intergenic regions but caused cis-acting alterations in expression of neighboring genes. Analysis of TCGA data indicates that the status of several of the recurrently mutated genes identified in this study significantly affects survival of AML patients. Our results suggest that current genomic analysis methods fail to identify a majority of structural variants in leukemia samples and that this lacuna may hamper diagnostic and prognostic efforts.


2022 ◽  
Author(s):  
Claire Mérot ◽  
Kristina S R Stenløkk ◽  
Clare Venney ◽  
Martin Laporte ◽  
Michel Moser ◽  
...  

The parallel evolution of nascent pairs of ecologically differentiated species offers an opportunity to get a better glimpse at the genetic architecture of speciation. Of particular interest is our recent ability to consider a wider range of genomic variants, not only single-nucleotide polymorphisms (SNPs), thanks to long-read sequencing technology. We can now identify structural variants (SVs) like insertions, deletions, and other structural rearrangements, allowing further insights into the genetic architecture of speciation and how different variants are involved in species differentiation. Here, we investigated genomic patterns of differentiation between sympatric species pairs (Dwarf and Normal) belonging to the Lake Whitefish (Coregonus clupeaformis) species complex. We assembled the first reference genomes for both Dwarf and Normal Lake Whitefish, annotated the transposable elements, and analysed the genome in the light of related coregonid species. Next, we used a combination of long-read and short-read sequencing to characterize SVs and genotype them at population-scale using genome-graph approaches, showing that SVs cover five times more of the genome than SNPs. We then integrated both SNPs and SVs to investigate the genetic architecture of species differentiation in two different lakes and highlighted an excess of shared outliers of differentiation. In particular, a large fraction of SVs differentiating the two species was driven by transposable elements (TEs), suggesting that TE accumulation during a period of allopatry predating secondary contact may have been a key process in the speciation of the Dwarf and Normal Whitefish. Altogether, our results suggest that SVs play an important role in speciation and that by combining second and third generation sequencing we now have the ability to integrate SVs into speciation genomics.


2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Abstract Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

