HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies

Xian Fan; Mark Chaisson; Luay Nakhleh; Ken Chen

doi:10.1101/gr.214767.116

10.1101/069815 ◽

2016 ◽

Cited By ~ 2

Keyword(s):

Human Genome ◽

Single Molecule ◽

Clustering Algorithm ◽

Hydatidiform Mole ◽

Cost Effective ◽

Next Generation ◽

Structural Variations ◽

Single Molecule Sequencing ◽

Structural Variant ◽

Sequencing Technologies

Abstract
Achieving complete, accurate and cost-effective assembly of the human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in the human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that could leverage the complementary strengths of multiple technologies.
We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole-genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in the human genome.
In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that our approach substantially improved the detection of many types of SVs, particularly novel large insertions, small INDELs (10-50 bp) and short tandem repeat expansions and contractions, over existing approaches, with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
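The abstract turns whole-genome assembly into independent SV assembly problems by clustering homologous SV-containing reads from the two technologies on a bipartite graph. The sketch below illustrates that idea under loudly stated assumptions: an edge joins an NGS read and an SMS read when they share at least `min_shared` distinct k-mers, and a cluster is simply a connected component. The names `cluster_reads` and `kmers`, the k-mer criterion and the default parameters are hypothetical illustrations, not HySA's actual clustering algorithm.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(ngs_reads, sms_reads, k=15, min_shared=3):
    """Cluster reads from two technologies via a bipartite graph (hypothetical).

    Left nodes are NGS reads, right nodes are SMS reads. An edge joins an NGS
    read and an SMS read that share at least min_shared distinct k-mers.
    Clusters are the connected components (an assumption, not necessarily
    the criterion HySA uses).
    """
    # Index SMS k-mers so each NGS read only visits SMS reads it overlaps.
    index = defaultdict(set)
    for j, seq in enumerate(sms_reads):
        for km in kmers(seq, k):
            index[km].add(j)

    parent = {}  # union-find over nodes ("ngs", i) and ("sms", j)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for j in range(len(sms_reads)):
        find(("sms", j))  # register every SMS read, even unconnected ones
    for i, seq in enumerate(ngs_reads):
        find(("ngs", i))
        shared = defaultdict(int)  # SMS read index -> number of shared k-mers
        for km in kmers(seq, k):
            for j in index.get(km, ()):
                shared[j] += 1
        for j, count in shared.items():
            if count >= min_shared:
                union(("ngs", i), ("sms", j))

    clusters = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)
    return [sorted(members) for members in clusters.values()]


sms = ["ACGTTGCATGCCTAGGATCC", "GGGGGGGGGGGGGGGGGGGG"]
ngs = ["TGCATGCCTA", "GGGGGGGG", "CCTAGGATCC"]
print(cluster_reads(ngs, sms, k=5, min_shared=3))
# [[('ngs', 0), ('ngs', 2), ('sms', 0)], [('sms', 1)], [('ngs', 1)]]
```

Each cluster could then be assembled on its own, which is what makes the decomposition attractive. A realistic pipeline would also need reverse-complement k-mers and a preceding step that selects only SV-containing reads; both are omitted here.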

The Evolution of High-Throughput Sequencing Technologies: From Sanger to Single-Molecule Sequencing

Next Generation Sequencing in Cancer Research ◽

10.1007/978-1-4614-7645-0_1 ◽

2013 ◽

pp. 1-30

Author(s):

Chee-Seng Ku ◽

Yudi Pawitan ◽

Mengchu Wu ◽

Dimitrios H. Roukos ◽

David N. Cooper

Keyword(s):

High Throughput ◽

Single Molecule ◽

High Throughput Sequencing ◽

Single Molecule Sequencing ◽

Sequencing Technologies

Critical assessment of bioinformatics methods for the characterization of pathological repeat expansions with single-molecule sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz099 ◽

2019 ◽

Vol 21 (6) ◽

pp. 1971-1986 ◽

Cited By ~ 1

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Ernesto Picardi ◽

David S Horner ◽

Graziano Pesole

Keyword(s):

Single Molecule ◽

Tandem Repeats ◽

Simulated Data ◽

Detailed Comparison ◽

Sequencing Data ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Repeat Expansions

Abstract
A number of studies over the last 5 years have reported the successful application of single-molecule sequencing technologies to determining the size and sequence of pathologically expanded microsatellite repeats. However, each study employed a different custom bioinformatics pipeline, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (based on either the alignment or the assembly of the reads), show high sensitivity both in the detection of expanded tandem repeats and in the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
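To make "determination of repeat length" concrete, the sketch below estimates a repeat's copy number directly from reads by finding the longest run of back-to-back exact copies of a motif, then takes the median over reads as a crude allele estimate. It is a deliberately naive, hypothetical illustration (the functions `longest_repeat_run` and `estimate_copy_number` are not from any tool in the review): it is neither alignment- nor assembly-based, and because it requires exact matches it is exactly the kind of method that the elevated error rate at short tandem repeats would hurt.

```python
import re
from statistics import median


def longest_repeat_run(read, motif):
    """Return (copies, start) of the longest exact tandem run of motif in read.

    Exact matching only: one sequencing error splits a run in two, which is
    why dedicated tools model errors explicitly. Assumes a primitive motif
    such as "CAG" (not itself a repeat of a shorter unit).
    """
    best_copies, best_start = 0, -1
    # "(?:CAG)+" greedily matches maximal runs of back-to-back copies.
    for m in re.finditer("(?:%s)+" % re.escape(motif), read):
        copies = (m.end() - m.start()) // len(motif)
        if copies > best_copies:
            best_copies, best_start = copies, m.start()
    return best_copies, best_start


def estimate_copy_number(reads, motif):
    """Median of per-read run lengths: a crude consensus for one allele."""
    return median(longest_repeat_run(r, motif)[0] for r in reads)


read = "TT" + "CAG" * 5 + "CTG" + "CAG" * 12 + "AA"
print(longest_repeat_run(read, "CAG"))  # (12, 20)
```

Taking the median across reads damps the effect of reads whose run was shortened by an error, but a diploid sample with two different allele lengths would need the per-read values to be split into two groups first.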

Next-generation polyploid phylogenetics: rapid resolution of hybrid polyploid complexes using PacBio single-molecule sequencing

New Phytologist ◽

10.1111/nph.14111 ◽

2016 ◽

Vol 213 (1) ◽

pp. 413-429 ◽

Cited By ~ 34

Author(s):

Carl J. Rothfels ◽

Kathleen M. Pryer ◽

Fay-Wei Li

Keyword(s):

Single Molecule ◽

Next Generation ◽

Single Molecule Sequencing ◽

Rapid Resolution

Prospects for the use of third generation sequencers for quantitative profiling of transcriptome

Biomedical Chemistry Research and Methods ◽

10.18097/bmcrm00086 ◽

2018 ◽

Vol 1 (4) ◽

pp. e00086

Author(s):

S.P. Radko ◽

L.K. Kurbatov ◽

K.G. Ptitsyn ◽

Y.Y. Kiseleva ◽

E.A. Ponomarenko ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Profiling ◽

Third Generation ◽

Sequencing Technology ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Biotechnology Companies ◽

Oxford Nanopore Technologies ◽

Quantitative Profiling

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike second-generation sequencers, which sequence relatively short fragments of nucleic acids, the third-generation DNA/RNA sequencers developed by the biotechnology companies “PacBio” and “Oxford Nanopore Technologies” can sequence transcripts as single molecules and may be considered potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity, and specificity. In the present review, the features of the single-molecule sequencing technologies offered by “PacBio” and “Oxford Nanopore Technologies” are considered alongside their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of single-molecule sequencing technology for quantitative transcriptome profiling are also discussed.
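The "molecular counter" idea can be illustrated with a few lines of code: if every full-length read comes from exactly one transcript molecule, abundance is just the number of reads assigned to each isoform. The sketch below assumes reads have already been assigned to isoforms; the function name `isoform_abundance`, the isoform IDs, and the use of `None` for unassignable reads are hypothetical conventions, not part of any PacBio or Oxford Nanopore tool.

```python
from collections import Counter


def isoform_abundance(assignments):
    """Count full-length reads per isoform and turn counts into fractions.

    assignments: one isoform ID per read; None marks a read that could not be
    assigned to a single isoform (a hypothetical convention).
    Assumes each full-length read stems from exactly one transcript molecule,
    so the count itself estimates the number of molecules sampled.
    """
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    fractions = {iso: n / total for iso, n in counts.items()} if total else {}
    return counts, fractions


counts, fractions = isoform_abundance(["tx1", "tx1", "tx2", None, "tx1", "tx3"])
print(counts["tx1"], fractions["tx1"])  # 3 0.6
```

Note that no division by transcript length is applied. With short-read fragments, longer transcripts yield more fragments and counts must be length-normalized; with full-length single-molecule reads one read corresponds, in principle, to one molecule, although length-dependent biases in library preparation and sequencing may still need correction in practice.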

Application of cell-free DNA sequencing in characterization of bloodborne microbes and the study of microbe-disease interactions

10.7287/peerj.preprints.27588v1 ◽

2019 ◽

Author(s):

Kuo-Ping Chiu ◽

Alice L. Yu

Keyword(s):

Single Molecule ◽

Circulatory System ◽

Cell-Free DNA ◽

Single Molecule Sequencing ◽

Microbial Species ◽

Strong Potential ◽

Sequencing Technologies ◽

Free DNA ◽

Next Generation Sequencing (NGS) ◽

Traditional Approaches

Background. An important question is whether and how microorganisms can live harmoniously with normal cells in the circulatory system. Answers to this question will have an enormous impact on medical microbiology. To address it, it is essential to identify and characterize blood-borne microbes efficiently and comprehensively. Methodology. Traditional approaches using PCR or microarrays are not suitable for this purpose because of the complexity and composition of the large number of unknown microbial species in the circulatory system. Recent reports indicate that cell-free DNA (cfDNA) sequencing using advanced sequencing technologies, including next-generation sequencing (NGS) and single-molecule sequencing (SMS), together with associated bioinformatics approaches, has strong potential to address this question at the molecular level. Results. Multiple studies using microbial cfDNA sequencing to identify microbes in septic patients have shown strong agreement with cell culture. Similar approaches have also been applied to reveal previously unidentified microorganisms or to demonstrate the feasibility of comprehensive assessment of bloodborne microorganisms in healthy and/or diseased individuals. Single-molecule sequencing (SMS), using either SMRT (single-molecule real-time) sequencing or Nanopore sequencing, is providing new momentum to reinforce this line of investigation. Conclusions. Microbial cfDNA sequencing provides a novel opportunity to further understand the involvement of blood-borne microbes in the development of diseases. Similar approaches should also be applicable to metagenomic studies for thorough and comprehensive analysis of microbial species isolated from various environments.
This article reviews this line of research and discusses the methodological approaches that have been developed, or are likely to be developed in the future, which may have strong potential to facilitate cfDNA- and cfRNA-based studies of cancer and chronic diseases, in the hope that a better understanding of the hidden microbes in the circulatory system will improve the accuracy of diagnosis, prevention, and treatment of problematic diseases.

Combining next-generation sequencing and single-molecule sequencing to explore brown plant hopper responses to contrasting genotypes of japonica rice

BMC Genomics ◽

10.1186/s12864-019-6049-7 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 4

Author(s):

Jing Zhang ◽

Wei Guan ◽

Chaomei Huang ◽

Yinxia Hu ◽

Yu Chen ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Molecule ◽

Japonica Rice ◽

Next Generation ◽

Single Molecule Sequencing ◽

Brown Plant Hopper ◽

Generation Sequencing

Long single-molecule reads can resolve the complexity of the Influenza virus composed of rare, closely related mutant variants

10.1101/036392 ◽

2016 ◽

Author(s):

Alexander Artyomenko ◽

Nicholas C Wu ◽

Serghei Mangul ◽

Eleazar Eskin ◽

Ren Sun ◽

...

Keyword(s):

Single Molecule ◽

Error Rate ◽

RNA Virus ◽

High Rate ◽

Read Length ◽

Viral Population ◽

High Error Rate ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Viral Mutant

Abstract
As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct a clone with a frequency of 0.2% and to distinguish clones that differ in only two nucleotides located far apart on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open-source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv
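The key intuition behind using linkage is that two real variants carried by the same mutant co-occur on reads far more often than two independent read errors would. The sketch below turns that intuition into a simple test. It is an assumption-laden illustration, not 2SNV's actual statistic: reads are taken to be equal-length strings already in reference coordinates (with '-' marking uncovered positions), and under the null hypothesis the alternative alleles at positions `i` and `j` appear independently, so the number of reads carrying both follows a binomial distribution with probability `p_i * p_j`. The function names are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def linkage_pvalue(reads, i, j, alt_i, alt_j):
    """P-value that alt alleles at positions i and j co-occur by chance.

    reads: equal-length strings in reference coordinates, '-' = not covered.
    Null model (an assumption, not 2SNV's exact statistic): alt calls at i
    and j are independent, so a read carries both with probability p_i * p_j.
    """
    covering = [r for r in reads if r[i] != "-" and r[j] != "-"]
    n = len(covering)
    if n == 0:
        return 1.0
    p_i = sum(r[i] == alt_i for r in covering) / n
    p_j = sum(r[j] == alt_j for r in covering) / n
    both = sum(r[i] == alt_i and r[j] == alt_j for r in covering)
    return binom_sf(both, n, p_i * p_j)


ref = "AAAAAAAAAA"
linked = "AAGAAAATAA"  # alt G at position 2 and alt T at position 7 together
only_i, only_j = "AAGAAAAAAA", "AAAAAAATAA"
reads = [ref] * 80 + [linked] * 10 + [only_i] * 5 + [only_j] * 5
print(linkage_pvalue(reads, 2, 7, "G", "T"))  # small: the two alleles are linked
```

A small p-value flags a pair of variants that travel together, which is evidence that both are real rather than independent errors; when the same alternative alleles only ever appear on different reads, the p-value stays near 1.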

Combining single-molecule sequencing and next-generation sequencing to provide insight into the complex response of Iris halophila Pall. to Pb exposure

Industrial Crops and Products ◽

10.1016/j.indcrop.2021.113623 ◽

2021 ◽

Vol 168 ◽

pp. 113623

Author(s):

Qingquan Liu ◽

Yongxia Zhang ◽

Yinjie Wang ◽

Chunsun Gu ◽

Suzhen Huang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Molecule ◽

Next Generation ◽

Single Molecule Sequencing ◽

Complex Response ◽

Pb Exposure ◽

Insight Into ◽

Generation Sequencing

SVIM: structural variant identification using mapped long reads

Bioinformatics ◽

10.1093/bioinformatics/btz041 ◽

2019 ◽

Vol 35 (17) ◽

pp. 2907-2915 ◽

Cited By ~ 32

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Single Molecule ◽

Simulated Data ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Structural Variants ◽

Human Phenotype ◽

Structural Variant ◽

Pacific Biosciences ◽

Sequencing Technologies ◽

Long Read

Abstract
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have a great impact on human phenotype and diversity and have been linked to numerous diseases. Because of their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads several thousand base pairs long. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet available software tools still do not fully exploit these possibilities.
Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes, including similar types such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines.
Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.
Supplementary information: Supplementary data are available at Bioinformatics online.
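The first two of the three components named above, collecting SV signatures from read alignments and clustering them, can be sketched on the simplest signature type: large deletions and insertions recorded in an alignment's CIGAR string, using the 50 bp size cutoff from the definition of a structural variant. This is a hypothetical illustration, not SVIM's code or API; in particular the greedy chaining rule and the `max_dist` and `max_size_diff` thresholds are assumptions, and SVIM's own clustering and its combination step (which handles duplications) are not reproduced.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def collect_signatures(ref_start, cigar, min_size=50):
    """Collect deletion/insertion signatures of at least min_size bp.

    ref_start: 0-based reference position of the alignment's first base.
    Returns (svtype, ref_pos, size) tuples.
    """
    sigs, pos = [], ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_size:
            sigs.append(("DEL", pos, length))
        elif op == "I" and length >= min_size:
            sigs.append(("INS", pos, length))
        if op in "MDN=X":  # CIGAR operations that consume reference bases
            pos += length
    return sigs


def cluster_signatures(sigs, max_dist=100, max_size_diff=0.3):
    """Chain same-type signatures with nearby positions and similar sizes.

    Thresholds are hypothetical; returns (svtype, mean_pos, mean_size, support).
    """
    clusters = []
    for sig in sorted(sigs):
        svtype, pos, size = sig
        prev = clusters[-1][-1] if clusters else None
        if (prev is not None and prev[0] == svtype and pos - prev[1] <= max_dist
                and abs(size - prev[2]) <= max_size_diff * max(size, prev[2])):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    calls = []
    for c in clusters:
        mean_pos = round(sum(s[1] for s in c) / len(c))
        mean_size = round(sum(s[2] for s in c) / len(c))
        calls.append((c[0][0], mean_pos, mean_size, len(c)))
    return calls


alignments = [(1000, "500M120D300M"), (1010, "495M118D310M"),
              (1005, "490M122D305M"), (2000, "300M20D300M"), (5000, "200M80I200M")]
sigs = [s for start, cig in alignments for s in collect_signatures(start, cig)]
print(cluster_signatures(sigs))  # [('DEL', 1500, 120, 3), ('INS', 5200, 80, 1)]
```

The 20 bp deletion in the fourth alignment falls below the 50 bp cutoff and is ignored, while three slightly discordant reads supporting the same ~120 bp deletion collapse into one call with support 3; tolerating that discordance is what a clustering step is for, since breakpoints in error-prone long reads rarely line up exactly.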
