MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads

Chuan-Le Xiao; Ying Chen; Shang-Qian Xie; Kai-Ning Chen; Yan Wang; Yue Han; Feng Luo; Zhi Xie

doi:10.1038/nmeth.4432

Hybrid error correction and de novo assembly of single-molecule sequencing reads

Nature Biotechnology ◽

10.1038/nbt.2280 ◽

2012 ◽

Vol 30 (7) ◽

pp. 693-700 ◽

Cited By ~ 699

Author(s):

Sergey Koren ◽

Michael C Schatz ◽

Brian P Walenz ◽

Jeffrey Martin ◽

Jason T Howard ◽

...

Keyword(s):

Error Correction ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Single Molecule Sequencing

De novo assembly of the cattle reference genome with single-molecule sequencing

GigaScience ◽

10.1093/gigascience/giaa021 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 35

Author(s):

Benjamin D Rosen ◽

Derek M Bickhart ◽

Robert D Schnabel ◽

Sergey Koren ◽

Christine G Elsik ◽

...

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Future Research ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Assembly Accuracy ◽

Genomic Tools

Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
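The contig N50 and L50 figures quoted in this abstract are standard continuity statistics. As a minimal sketch (the function name `n50_l50` and the toy contig lengths are illustrative assumptions, not taken from the paper), they can be computed from a list of contig lengths like this:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the assembly size. L50 is how many contigs that takes.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy example (hypothetical lengths): total is 54, half is 27,
# and the three longest contigs (10 + 9 + 8) reach that threshold.
n50, l50 = n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A higher N50 together with a lower L50 means that fewer, longer contigs cover half of the assembly, which is the sense in which the abstract describes the assembly as more continuous.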

De Novo Assembly of the Streptomyces sp. Strain Mg1 Genome Using PacBio Single-Molecule Sequencing

Genome Announcements ◽

10.1128/genomea.00535-13 ◽

2013 ◽

Vol 1 (4) ◽

Cited By ~ 17

Author(s):

B. C. Hoefler ◽

K. Konganti ◽

P. D. Straight

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Single Molecule Sequencing ◽

Streptomyces Sp

lra: A long read aligner for sequences and contigs

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009078 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009078

Author(s):

Jingwen Ren ◽

Mark J. P. Chaisson

Keyword(s):

Dynamic Programming ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Concave Function ◽

Single Molecule Sequencing ◽

Link Type ◽

Oxford Nanopore ◽

Concave Cost ◽

Long Read

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).
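To illustrate why the shape of the gap penalty matters when chaining exact matches, here is a minimal sketch. It is not lra's implementation: it uses a simple quadratic-time dynamic program rather than sparse dynamic programming, and the anchor format `(query_pos, ref_pos, length)`, the function names, and both cost functions (including the logarithmic form and its `scale` parameter) are assumptions made for the example.

```python
import math


def linear_gap_cost(gap, scale=1.0):
    """Hypothetical linear penalty: cost grows in proportion to gap length."""
    return scale * gap


def concave_gap_cost(gap, scale=2.0):
    """Hypothetical concave penalty: each extra base costs less than the last."""
    return scale * math.log2(1 + gap)


def best_chain(anchors, gap_cost):
    """Return (score, chain) for the best-scoring chain of anchors.

    anchors is a list of (query_pos, ref_pos, length) exact matches.
    A chain's score is the sum of anchor lengths minus gap_cost applied
    to the diagonal shift between each pair of consecutive anchors.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [float(a[2]) for a in anchors]  # best chain ending at each anchor
    prev = [-1] * n                         # back-pointers for traceback
    for j in range(n):
        qj, rj, lj = anchors[j]
        for i in range(j):
            qi, ri, li = anchors[i]
            # The predecessor must end before anchor j starts in both sequences.
            if qi + li <= qj and ri + li <= rj:
                gap = abs((qj - qi) - (rj - ri))
                candidate = score[i] + lj - gap_cost(gap)
                if candidate > score[j]:
                    score[j] = candidate
                    prev[j] = i
    end = max(range(n), key=score.__getitem__)
    best = score[end]
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return best, chain[::-1]


# Two 50-base matches separated by a 1000-base shift in the reference,
# as a large deletion in the read would produce.
anchors = [(0, 0, 50), (80, 1080, 50)]
linear_score, linear_chain = best_chain(anchors, linear_gap_cost)
concave_score, concave_chain = best_chain(anchors, concave_gap_cost)
```

Under the linear penalty the 1000-base shift costs more than the second anchor is worth, so the chain breaks and keeps a single anchor. Under the concave penalty the same shift costs about 20, so both anchors stay in one chain that spans the event.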

SMARTdenovo: a de novo assembler using long noisy reads

Gigabyte ◽

10.46471/gigabyte.15 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Hailin Liu ◽

Shigang Wu ◽

Alun Li ◽

Jue Ruan

Keyword(s):

Error Correction ◽

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Structural Variants ◽

High Quality ◽

Single Molecule Sequencing ◽

Long Read ◽

Reference Quality

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce the assembler SMARTdenovo, a single-molecule sequencing (SMS) assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a rapid assembler, which, unlike contemporaneous SMS assemblers, does not require highly accurate raw reads for error correction. It has performed well in the evaluation of congeneric assemblers and has been successfully used for various assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015; here we provide information on the development of SMARTdenovo and how to implement its algorithms into current projects.

De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture

PLoS Biology ◽

10.1371/journal.pbio.2006348 ◽

2018 ◽

Vol 16 (7) ◽

pp. e2006348 ◽

Cited By ~ 42

Author(s):

Shivani Mahajan ◽

Kevin H.-C. Wei ◽

Matthew J. Nalley ◽

Lauren Gibilisco ◽

Doris Bachtrog

Keyword(s):

Y Chromosome ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Chromatin Conformation ◽

Single Molecule Sequencing ◽

Chromatin Conformation Capture

SMARTdenovo: A de novo Assembler Using Long Noisy Reads

10.20944/preprints202009.0207.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hailin Liu ◽

Shigang Wu ◽

Alun Li ◽

Jue Ruan

Keyword(s):

Error Correction ◽

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Structural Variants ◽

High Quality ◽

De Novo Genome Assembly ◽

Single Molecule Sequencing ◽

Long Read ◽

Reference Quality

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce the assembler SMARTdenovo, a single-molecule sequencing (SMS) assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a fast assembler that did not require highly accurate raw reads for error correction, unlike other, contemporaneous SMS assemblers. It has performed well for evaluating congeneric assemblers and has been successful for a variety of assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015, and here we provide information on the development of SMARTdenovo and how to implement its algorithms into current projects.

Oxford Nanopore Sequencing, Hybrid Error Correction, and de novo Assembly of a Eukaryotic Genome

10.1101/013490 ◽

2015 ◽

Cited By ~ 23

Author(s):

Sara Goodwin ◽

James Gurtowski ◽

Scott Ethe-Sayers ◽

Panchajanya Deshpande ◽

Michael Schatz ◽

...

Keyword(s):

Error Correction ◽

De Novo Assembly ◽

De Novo ◽

Correction Algorithm ◽

Membrane Pore ◽

Complete Representation ◽

Oxford Nanopore ◽

Long Read ◽

Error Correction Algorithm ◽

Sequencing Instrument

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available that we used for sequencing the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5-50 kbp) at such a high error rate (between ~5 and 40% error). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.

Hybrid error correction approach and de novo assembly for minion sequencing long reads

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2016.7822504 ◽

2016 ◽

Author(s):

Mehdi Kchouk ◽

Mourad Elloumi

Keyword(s):

Error Correction ◽

De Novo Assembly ◽

De Novo ◽

Long Reads

De Novo Assembly of the Quorum-Sensing Pandoraea sp. Strain RB-44 Complete Genome Sequence Using PacBio Single-Molecule Real-Time Sequencing Technology

Genome Announcements ◽

10.1128/genomea.00245-14 ◽

2014 ◽

Vol 2 (2) ◽

Cited By ~ 8

Author(s):

R. Ee ◽

Y.-L. Lim ◽

W.-F. Yin ◽

K.-G. Chan

Keyword(s):

Quorum Sensing ◽

Real Time ◽

Genome Sequence ◽

Single Molecule ◽

Complete Genome Sequence ◽

De Novo Assembly ◽

Complete Genome ◽

De Novo ◽

Sequencing Technology
