HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies

2017 · Vol 27 (5) · pp. 793-800
Author(s): Xian Fan, Mark Chaisson, Luay Nakhleh, Ken Chen

2016
Author(s): Xian Fan, Mark Chaisson, Luay Nakhleh, Ken Chen

Abstract
Achieving complete, accurate and cost-effective assembly of the human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in the human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole-genome assembly problem into a set of independent SV assembly problems, each of which can be solved effectively to enhance assembly of structurally altered regions in the human genome. In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that it substantially improved the detection of many types of SVs over existing approaches, particularly novel large insertions, small INDELs (10-50 bp) and short tandem repeat expansions and contractions, with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
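The clustering step described in this abstract can be illustrated with a small sketch. It is not HySA's implementation: the abstract does not say how reads are judged homologous, so this example assumes, hypothetically, that an NGS read and an SMS read are linked when they share at least `min_shared` exact k-mers. Edges run only between the two read sets, which makes the graph bipartite, and each connected component (found with union-find) is treated as one candidate SV cluster that could then be assembled on its own. The names `cluster_reads`, `k` and `min_shared` are illustrative assumptions.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(ngs_reads, sms_reads, k=5, min_shared=2):
    """Cluster NGS and SMS reads through a bipartite similarity graph.

    Nodes are ("ngs", i) and ("sms", j); an edge joins an NGS read and an
    SMS read sharing at least `min_shared` k-mers (hypothetical criterion).
    Returns the connected components as sorted lists of node ids.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Register every read so unmatched reads form singleton clusters
    for i in range(len(ngs_reads)):
        find(("ngs", i))
    for j in range(len(sms_reads)):
        find(("sms", j))

    ngs_kmers = [kmers(r, k) for r in ngs_reads]
    sms_kmers = [kmers(r, k) for r in sms_reads]
    # Edges only cross between the two technologies (bipartite graph)
    for i, a in enumerate(ngs_kmers):
        for j, b in enumerate(sms_kmers):
            if len(a & b) >= min_shared:
                union(("ngs", i), ("sms", j))

    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)
    return sorted(sorted(c) for c in components.values())
```

For example, `cluster_reads(["ACGTACGTTT", "GGGCCCAAAT"], ["TTACGTACGTTTGA", "CCCCCCCCCC"])` places the first NGS read and the first SMS read in one cluster and leaves the other two reads as singletons. Exact k-mer matching is a simplification; error-prone SMS reads would in practice call for a more tolerant similarity measure.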


2019 · Vol 21 (6) · pp. 1971-1986
Author(s): Matteo Chiara, Federico Zambelli, Ernesto Picardi, David S Horner, Graziano Pesole

Abstract
A number of studies over the last 5 years have reported the successful application of single-molecule sequencing technologies to determining the size and sequence of pathologically expanded microsatellite repeats. However, each study employed a different custom bioinformatics pipeline, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for determining repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used to analyze the data (based on either alignment or assembly of the reads), show high sensitivity both in detecting expanded tandem repeats and in estimating the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
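As a rough illustration of estimating repeat length directly from reads, the sketch below counts the longest run of back-to-back copies of a known motif in each read and takes the median across reads. It is a hypothetical simplification, not one of the tools compared in the review: it requires exact motif copies, so a single sequencing error inside the repeat splits the run, and the median over reads serves only as a crude way to dampen such per-read errors. The function names are illustrative assumptions.

```python
from statistics import median


def longest_repeat_run(read, motif):
    """Return (copies, start) for the longest run of adjacent exact motif copies."""
    m = len(motif)
    best_copies, best_start = 0, -1
    i = 0
    while i <= len(read) - m:
        if read[i:i + m] == motif:
            start, copies = i, 0
            # Extend the run while the next m bases are another exact copy
            while read[i:i + m] == motif:
                copies += 1
                i += m
            if copies > best_copies:
                best_copies, best_start = copies, start
        else:
            i += 1
    return best_copies, best_start


def estimate_copy_number(reads, motif):
    """Median longest-run copy number across reads (crude error damping)."""
    return median(longest_repeat_run(r, motif)[0] for r in reads)
```

The fragility of exact matching is easy to see: a read such as `CAGCAGCTGCAGCAG`, with one substitution in the middle of the repeat, yields a longest run of only 2 copies.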


2018 · Vol 1 (4) · pp. e00086
Author(s): S.P. Radko, L.K. Kurbatov, K.G. Ptitsyn, Y.Y. Kiseleva, E.A. Ponomarenko, ...

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike second-generation sequencers, which sequence relatively short fragments of nucleic acids, the third-generation DNA/RNA sequencers developed by the biotechnology companies PacBio and Oxford Nanopore Technologies sequence transcripts as single molecules and may be considered potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity and specificity. In the present review, the features of the single-molecule sequencing technologies offered by PacBio and Oxford Nanopore Technologies are considered alongside their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of single-molecule sequencing technology for quantitative transcriptome profiling are also discussed.
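The "molecular counter" idea can be sketched in a few lines. Assuming every read is a full-length copy of exactly one transcript molecule and has already been assigned to an isoform (both are assumptions, and the isoform IDs used here are hypothetical), abundance is simply the read count per isoform. This contrasts with short-read RNA-seq, where the number of fragments drawn from a transcript grows with its length, so counts must be length-normalized before isoforms can be compared.

```python
from collections import Counter


def isoform_abundance(assignments):
    """Count full-length reads per isoform and convert to relative abundance.

    `assignments` holds one isoform ID per read (None = read not assigned).
    Assumes one read = one transcript molecule, so no length normalization.
    """
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    fractions = {iso: n / total for iso, n in counts.items()} if total else {}
    return dict(counts), fractions
```

With assignments `["isoA", "isoA", "isoB", None, "isoA"]`, the counts are 3 and 1 and the relative abundances 0.75 and 0.25; the unassigned read is ignored. Real data violate the full-length assumption to varying degrees, which is one of the limitations such reviews weigh.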


2019
Author(s): Kuo-Ping Chiu, Alice L. Yu

Background. Whether and how microorganisms can live harmoniously with normal cells in the circulatory system is an important question, and the answers will have an enormous impact on medical microbiology. To address it, blood-borne microbes must be identified and characterized efficiently and comprehensively.
Methodology. Traditional approaches using PCR or microarrays are not suitable for this purpose because of the complexity and composition of the large number of unknown microbial species in the circulatory system. Recent reports indicate that cell-free DNA (cfDNA) sequencing using advanced technologies, including next-generation sequencing (NGS) and single-molecule sequencing (SMS), together with associated bioinformatics approaches, has strong potential to address these questions at the molecular level.
Results. Multiple studies using microbial cfDNA sequencing to identify microbes in septic patients have shown strong agreement with cell culture. Similar approaches have also been applied to reveal previously unidentified microorganisms, or to demonstrate the feasibility of comprehensively assessing blood-borne microorganisms in healthy and/or diseased individuals. SMS using either SMRT (single-molecule real-time) sequencing or nanopore sequencing is providing new momentum for this line of investigation.
Conclusions. Microbial cfDNA sequencing provides a novel opportunity to further understand the involvement of blood-borne microbes in the development of diseases. Similar approaches should also be applicable to metagenomic studies, enabling sufficient and comprehensive analysis of microbial species isolated from various environments.
This article reviews this line of research and discusses the methodological approaches that have been developed, or are likely to be developed in the future, that may have strong potential to facilitate cfDNA- and cfRNA-based studies of cancer and chronic diseases, in the hope that a better understanding of the hidden microbes in the circulatory system will improve the accuracy of diagnosis, prevention and treatment of problematic diseases.


2016
Author(s): Alexander Artyomenko, Nicholas C Wu, Serghei Mangul, Eleazar Eskin, Ren Sun, ...

Abstract
As a result of high rates of mutation and recombination, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single-nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing titrated levels of known viral mutant variants. Our method is able to accurately reconstruct clones with a frequency of 0.2% and to distinguish clones that differ in only two nucleotides located far apart on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open-source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv
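The idea of using linkage to separate true variants from read errors can be sketched as a simple statistical test. This is an illustrative stand-in, not 2SNV's published statistic: it assumes reads are already aligned strings in a shared coordinate system, and that independent errors would make the minor alleles at two positions co-occur in a read with probability of roughly `f1 * f2`. It then computes an exact binomial upper-tail probability for the observed number of reads carrying both minor alleles; a small p-value suggests the two SNVs sit on the same mutant variant. Function names and parameters are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def linkage_pvalue(reads, pos1, pos2, minor1, minor2):
    """Test whether minor alleles at two positions co-occur more than chance.

    Only reads covering both positions are used. Under the null hypothesis the
    minor alleles arise independently (e.g. as read errors), so a read carries
    both with probability f1 * f2. A small p-value suggests real linkage.
    """
    covering = [r for r in reads if len(r) > max(pos1, pos2)]
    n = len(covering)
    if n == 0:
        return 1.0
    f1 = sum(r[pos1] == minor1 for r in covering) / n
    f2 = sum(r[pos2] == minor2 for r in covering) / n
    both = sum(r[pos1] == minor1 and r[pos2] == minor2 for r in covering)
    return binom_sf(both, n, f1 * f2)
```

In a toy population of 100 reads where 10 carry both minor alleles and 5 carry each one alone, the expected joint count under independence is only about 2.25, so the test reports strong linkage; if the minor alleles never co-occur, the p-value is 1.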


2019 · Vol 35 (17) · pp. 2907-2915
Author(s): David Heller, Martin Vingron

Abstract
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have a great impact on human phenotype and diversity and have been linked to numerous diseases. Because of their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies such as those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads several thousand base pairs long. Despite its higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet available software tools still do not fully exploit these possibilities.
Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes, including similar types such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines.
Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.
Supplementary information: Supplementary data are available at Bioinformatics online.
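The collect, cluster and combine stages named in this abstract can be approximated, very loosely, by the sketch below. It is not SVIM's code: it only extracts insertion and deletion signatures from the CIGAR string of each alignment on a single contig, whereas SVIM discriminates five variant classes, including tandem and interspersed duplications, which this sketch does not attempt. Same-type signatures are clustered by reference position with a hypothetical 100 bp single-linkage threshold, and each cluster is merged into one call. The 50 bp cutoff follows the definition given in the abstract; all function names and parameters are assumptions.

```python
import re
from statistics import median

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_SV_LEN = 50  # SVs: variants larger than 50 bp


def collect_signatures(ref_start, cigar, min_len=MIN_SV_LEN):
    """Return (type, ref_pos, length) for large insertions/deletions in a CIGAR."""
    sigs, pos = [], ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in ("D", "I") and length > min_len:
            sigs.append(("DEL" if op == "D" else "INS", pos, length))
        if op in ("M", "D", "N", "=", "X"):  # operations that consume the reference
            pos += length
    return sigs


def _merge(sv_type, cluster):
    """Combine one cluster into a call: median position, median length, support."""
    return (sv_type, int(median(s[1] for s in cluster)),
            int(median(s[2] for s in cluster)), len(cluster))


def cluster_signatures(sigs, max_dist=100):
    """Single-linkage clustering of same-type signatures along one reference contig."""
    calls = []
    for sv_type in sorted({s[0] for s in sigs}):
        group = sorted((s for s in sigs if s[0] == sv_type), key=lambda s: s[1])
        cluster = [group[0]]
        for sig in group[1:]:
            if sig[1] - cluster[-1][1] <= max_dist:
                cluster.append(sig)
            else:
                calls.append(_merge(sv_type, cluster))
                cluster = [sig]
        calls.append(_merge(sv_type, cluster))
    return calls
```

Three reads supporting a roughly 120 bp deletion near position 1500, with slightly different breakpoints, collapse into a single call with support 3, while a 30 bp deletion is dropped as a small INDEL rather than an SV.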

