DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Chengxi Ye; Christopher M. Hill; Shigang Wu; Jue Ruan; Zhanshan (Sam) Ma

doi:10.1038/srep31900

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa179 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3669-3679 ◽

Cited By ~ 3

Author(s):

Can Firtina ◽

Jeremie S Kim ◽

Mohammed Alser ◽

Damla Senol Cali ◽

A Ercument Cicek ◽

...

Keyword(s):

Genome Analysis ◽

Supplementary Information ◽

Third Generation ◽

Sequencing Technology ◽

Base Pairs ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing ◽

Large Genomes

Abstract Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
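The decoding step named in the abstract (Viterbi) can be illustrated on a much smaller model. The following is a minimal sketch of Viterbi decoding for a generic two-state discrete HMM, not Apollo's profile HMM or its implementation. The state names, probabilities, and the `viterbi` function are all hypothetical and chosen only for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path for `obs` under a discrete HMM."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model (assumed values): two hidden regions that emit bases with different biases.
STATES = ("GC_rich", "AT_rich")
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "GC_rich": {"GC_rich": math.log(0.9), "AT_rich": math.log(0.1)},
    "AT_rich": {"GC_rich": math.log(0.1), "AT_rich": math.log(0.9)},
}
LOG_EMIT = {
    "GC_rich": {b: math.log(p) for b, p in {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1}.items()},
    "AT_rich": {b: math.log(p) for b, p in {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4}.items()},
}

if __name__ == "__main__":
    print(viterbi("GGCCAATT", STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences, which matters far more for a genome-scale model than for this eight-base toy input.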

Detection of causative agent of infectious abortion in cattle by the Third Generation Sequencing Technique (Oxford Nanopore Technology)

Veterinary Medicine Journal ◽

10.30896/0042-4846.2021.24.2.20-26 ◽

2021 ◽

Vol 24 (02) ◽

pp. 20-26

Author(s):

S.S. Zaitsev ◽

M.A. Khizhnyakova ◽

V.A. Feodorova ◽

...

Keyword(s):

Causative Agent ◽

Third Generation ◽

The Third ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Sequencing Technique ◽

Generation Sequencing

de novo repeat detection based on the third generation sequencing reads

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8982959 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xingyu Liao ◽

Xiankai Zhang ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

De Novo ◽

Third Generation ◽

The Third ◽

Third Generation Sequencing ◽

Generation Sequencing ◽

Repeat Detection

Moving Towards Third-Generation Sequencing Technologies

Tag-Based Next Generation Sequencing ◽

10.1002/9783527644582.ch20 ◽

2012 ◽

pp. 323-336 ◽

Cited By ~ 1

Author(s):

Karolina Janitz ◽

Michal Janitz

Keyword(s):

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

The study of transcriptomes of symbiotic tissue of pea using the third-generation sequencing technology Oxford Nanopore

Abstract book of the 2nd International Scientific Conference "Plants and Microbes: the Future of Biotechnology" PLAMIC2020 ◽

10.28983/plamic2020.093 ◽

2020 ◽

Author(s):

E. S. Gribchenko

Keyword(s):

Nanopore Sequencing ◽

Third Generation ◽

Nitrogen Fixing ◽

Sequencing Technology ◽

The Third ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Mycorrhizal Roots ◽

Gene Isoforms ◽

Generation Sequencing

The transcriptome profiles of cv. Frisson mycorrhizal roots and inoculated nitrogen-fixing nodules were investigated using Oxford Nanopore sequencing technology. A database of gene isoforms and their expression has been created.

RNA Transcriptome Mapping with GraphMap

10.1101/160085 ◽

2017 ◽

Cited By ~ 1

Author(s):

Krešimir Križanović ◽

Ivan Sović ◽

Ivan Krpelnik ◽

Mile Šikić

Keyword(s):

Third Generation ◽

Sequencing Data ◽

Mapping Algorithm ◽

Gene Annotations ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Rna Mapping ◽

Synthetic Datasets ◽

Generation Sequencing

Abstract Next-generation sequencing (NGS) technologies have made RNA sequencing widely accessible and applicable in many areas of research. In recent years, third-generation sequencing technologies have matured and are slowly replacing NGS for DNA sequencing. This paper presents a novel tool for RNA mapping guided by gene annotations. The tool is an adapted version of a previously developed DNA mapper, GraphMap, tailored for third-generation sequencing data, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies devices. It uses gene annotations to generate a transcriptome, uses a DNA mapping algorithm to map reads to the transcriptome, and finally transforms the mappings back to genome coordinates. The modified version of GraphMap is compared on several synthetic datasets with state-of-the-art RNA-seq mappers that can work with third-generation sequencing data. The results show that our tool outperforms other tools in general mapping quality.
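The last step described above, transforming transcriptome mappings back to genome coordinates, can be sketched as follows. This is an illustrative example and not GraphMap's code. It assumes 0-based, half-open exon intervals sorted by genomic start, and `transcript_to_genome` is a hypothetical helper name.

```python
from bisect import bisect_right

def transcript_to_genome(exons, strand, t_pos):
    """Map a 0-based transcript offset to a 0-based genome coordinate.

    exons:  list of (start, end) half-open genome intervals, sorted by start.
    strand: "+" or "-".
    t_pos:  offset along the spliced transcript, counted in transcript direction.
    """
    lengths = [end - start for start, end in exons]
    total = sum(lengths)
    if not 0 <= t_pos < total:
        raise ValueError("transcript position out of range")
    # On the minus strand the transcript starts at the genomic end of the last
    # exon, so flip the offset to count from the genomic left instead.
    if strand == "-":
        t_pos = total - 1 - t_pos
    # cumulative[i] = number of transcript bases that precede exon i
    cumulative = [0]
    for length in lengths:
        cumulative.append(cumulative[-1] + length)
    # Find the exon whose cumulative range contains t_pos.
    i = bisect_right(cumulative, t_pos) - 1
    return exons[i][0] + (t_pos - cumulative[i])

if __name__ == "__main__":
    exons = [(100, 110), (200, 205)]  # two exons, 10 and 5 bases long
    print(transcript_to_genome(exons, "+", 10))  # first base of the second exon
```

A full read alignment would apply this mapping to both ends of every aligned block and split blocks that cross an exon boundary, which is where spliced alignments in genome coordinates come from.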

Oxford Nanopore sequencing: new opportunities for plant genomics?

Journal of Experimental Botany ◽

10.1093/jxb/eraa263 ◽

2020 ◽

Vol 71 (18) ◽

pp. 5313-5322 ◽

Cited By ~ 2

Author(s):

Kathryn Dumschott ◽

Maximilian H-W Schmidt ◽

Harmeet Singh Chawla ◽

Rod Snowdon ◽

Björn Usadel

Keyword(s):

Plant Genome ◽

Third Generation ◽

Plant Genomics ◽

High Coverage ◽

Plant Genomes ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Generation Sequencing

Abstract DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.

Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis

Expert Review of Molecular Diagnostics ◽

10.1080/14737159.2016.1217158 ◽

2016 ◽

Vol 16 (9) ◽

pp. 1011-1023 ◽

Cited By ~ 18

Author(s):

Enrico Lavezzo ◽

Luisa Barzon ◽

Stefano Toppo ◽

Giorgio Palù

Keyword(s):

Data Analysis ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Diagnostic Microbiology ◽

Generation Sequencing

A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model

BMC Genomics ◽

10.1186/s12864-020-07008-9 ◽

2020 ◽

Vol 21 (S10) ◽

Author(s):

Jiaqi Liu ◽

Jiayin Wang ◽

Xiao Xiao ◽

Xin Lai ◽

Daocheng Dai ◽

...

Keyword(s):

Error Correction ◽

Correction Method ◽

Reference Sequence ◽

Third Generation ◽

Next Generation ◽

Sequencing Data ◽

Sequencing Errors ◽

The Third ◽

Third Generation Sequencing ◽

Generation Sequencing

Abstract Background: The emergence of third-generation sequencing technology, featuring longer read lengths, is a major advance over next-generation sequencing technology and has greatly promoted biological research. However, third-generation sequencing data have high sequencing error rates, which inevitably affect downstream analysis. Although sequencing error rates have been improving in recent years, large amounts of data were produced at high error rates, and discarding them would be a huge waste. Error correction for third-generation sequencing data is therefore especially important. Existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. Error correction algorithms for heterozygous loci are therefore lacking, especially at low coverage. Results: In this article, we propose an error correction method named QIHC. QIHC is a hybrid correction method that needs both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high error correction accuracy. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core module of QIHC and is designed to handle the multiple cases that can arise at a heterozygous site. The other two modules enable reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by existing error correction methods. Conclusions: To verify the performance of our method, we compared QIHC with Canu and Jabba in several aspects. Because QIHC is a hybrid correction method, we first conducted a group of experiments under different coverages of next-generation sequencing data. QIHC is far ahead of Jabba in accuracy. We then varied the coverage of the third-generation sequencing data and again compared the performance of Canu, Jabba and QIHC. QIHC outperforms the other two methods in accuracy, both in correcting sequencing errors and in identifying heterozygous sites, especially at low coverage. We also compared Canu and QIHC at different error rates of the third-generation sequencing data; QIHC still performs better. Therefore, QIHC is superior to the existing error correction methods when heterozygous sites exist.
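The idea of judging a site by posterior probabilities can be illustrated with a deliberately simple model. The sketch below is not QIHC's probabilistic model: it assumes a biallelic site, independent reads, a binomial likelihood, and made-up priors and error rate. The function name `genotype_posteriors` and all default values are hypothetical.

```python
from math import comb

def genotype_posteriors(n_reads, n_alt, error_rate=0.1, priors=None):
    """Posterior over three genotypes given alt-allele read counts at one site.

    Assumed model: each read shows the alternate allele with probability
    error_rate (hom_ref), 0.5 (het), or 1 - error_rate (hom_alt).
    """
    if priors is None:
        # Assumed priors: most sites are homozygous for the reference allele.
        priors = {"hom_ref": 0.98, "het": 0.01, "hom_alt": 0.01}
    alt_prob = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}
    joint = {}
    for genotype, p in alt_prob.items():
        # Binomial likelihood of seeing n_alt alternate reads out of n_reads.
        likelihood = comb(n_reads, n_alt) * p**n_alt * (1 - p) ** (n_reads - n_alt)
        joint[genotype] = priors[genotype] * likelihood
    evidence = sum(joint.values())
    # Bayes' rule: normalize prior * likelihood by the total evidence.
    return {genotype: value / evidence for genotype, value in joint.items()}

if __name__ == "__main__":
    post = genotype_posteriors(n_reads=20, n_alt=10)
    print(max(post, key=post.get), post)
```

With balanced allele counts the heterozygous hypothesis dominates despite its small prior, while a single alternate read among twenty is better explained as a sequencing error at a homozygous site.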

Detecting complex indels with wide length-spectrum from the third generation sequencing data

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217965 ◽

2017 ◽

Author(s):

Xuanping Zhang ◽

Hengwei Chen ◽

Rong Zhang ◽

Jingwen Pei ◽

Yixuan Wang ◽

...

Keyword(s):

Length Spectrum ◽

Third Generation ◽

Sequencing Data ◽

The Third ◽

Third Generation Sequencing ◽

Generation Sequencing
