poreTally: run and publish de novo Nanopore assembler benchmarks

Mapping Intimacies ◽

10.1101/424184 ◽

2018 ◽

Author(s):

Carlos de Lannoy ◽

Judith Risse ◽

Dick de Ridder

Keyword(s):

Best Practices ◽

De Novo ◽

Nanopore Sequencing ◽

Base Calling ◽

Novel Approach ◽

Tool Performance ◽

Assembly Pipeline ◽

Nucleic Acid Analysis ◽

Sequencing Platforms ◽

Assembly Tool

Abstract Nanopore sequencing is a novel approach to nucleic acid analysis that generates long, error-prone reads. Since device components, base calling software and best practices for sample preparation are updated frequently and extensively, the nature of the produced data also changes frequently. As a result, peer-reviewed publications on de novo assembly pipeline benchmarking efforts are quickly rendered outdated by the next major improvement to the sequencing platforms. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report. Results can immediately be shared with peers in a GitHub/GitLab repository. Furthermore, we aim to give a more inclusive overview of assembly pipeline performance than any individual research group can, by offering users the possibility to submit their results to a collective benchmarking effort. poreTally is available on GitHub.
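The abstract mentions analysing assemblies for continuity. One widely used continuity summary is N50; the abstract does not say which metrics poreTally reports, so the following is only a minimal sketch of that general idea. The function name `n50` and the example contig lengths are hypothetical and are not taken from poreTally.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, when contigs are sorted
    from longest to shortest, the running total first reaches at least
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (in bp), for illustration only.
example_contigs = [100, 80, 50, 30, 20]
print(n50(example_contigs))
```

A higher N50 indicates that more of the assembly sits in long contigs, but it says nothing about correctness, which is why the two properties are assessed separately.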

poreTally: run and publish de novo nanopore assembler benchmarks

Bioinformatics ◽

10.1093/bioinformatics/bty1045 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2663-2664 ◽

Cited By ~ 2

Author(s):

Carlos de Lannoy ◽

Judith Risse ◽

Dick de Ridder

Keyword(s):

Nucleic Acid ◽

De Novo Assembly ◽

De Novo ◽

Supplementary Information ◽

Nanopore Sequencing ◽

Supplementary Data ◽

Analysis Pipeline ◽

Tool Performance ◽

Nucleic Acid Analysis ◽

Assembly Tool

Abstract Summary: Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on GitHub/GitLab. Availability and implementation: poreTally is available on GitHub at https://github.com/cvdelannoy/poreTally, under an MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.

DeepSimulator: a deep simulator for Nanopore sequencing

10.1101/238683 ◽

2017 ◽

Cited By ~ 1

Author(s):

Yu Li ◽

Renmin Han ◽

Chongwei Bi ◽

Mo Li ◽

Sheng Wang ◽

...

Keyword(s):

Deep Learning ◽

De Novo ◽

Real Data ◽

Electrical Current ◽

Complex Nature ◽

Main Task ◽

Nanopore Sequencing ◽

Base Calling ◽

Context Dependent ◽

Sequencing Procedure

Abstract Motivation: Oxford Nanopore sequencing is a technology that has developed rapidly in recent years. To keep pace with the explosion of downstream data analysis tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results: Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83% to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection. Availability: The software can be accessed freely at: https://github.com/lykaust15/deep_simulator.

Quantitative profiling of native RNA modifications and their dynamics using nanopore sequencing

10.1101/2020.07.06.189969 ◽

2020 ◽

Author(s):

Oguzhan Begik ◽

Morghan C Lucas ◽

Leszek P Pryszcz ◽

Jose Miguel Ramirez ◽

Rebeca Medina ◽

...

Keyword(s):

De Novo ◽

Rna Modification ◽

Nanopore Sequencing ◽

Rna Modifications ◽

General Belief ◽

Rna Molecules ◽

Base Calling ◽

Genetic Strains ◽

Quantitative Manner ◽

External Signals

Abstract A broad diversity of modifications decorate RNA molecules. Originally conceived as static components, evidence is accumulating that some RNA modifications may be dynamic, contributing to cellular responses to external signals and environmental circumstances. A major difficulty in studying these modifications, however, is the need for tailored protocols to map each modification type individually. Here, we present a new approach that uses direct RNA nanopore sequencing to identify and quantify RNA modifications present in native RNA molecules. First, we show that each RNA modification type results in a distinct and characteristic base-calling ‘error’ signature, which we validate using a battery of genetic strains lacking either pseudouridine (Y) or 2’-O-methylation (Nm) modifications. We then demonstrate the value of these signatures for de novo prediction of Y modifications transcriptome-wide, confirming known Y-modified sites as well as uncovering novel Y sites in mRNAs, ncRNAs and rRNAs, including a previously unreported Pus4-dependent Y modification in yeast mitochondrial rRNA, which we validate using orthogonal methods. To explore the dynamics of pseudouridylation across environmental stresses, we treat the cells with oxidative, cold and heat stresses, finding that yeast ribosomal rRNA modifications do not change upon environmental exposures, contrary to the general belief. By contrast, our method reveals many novel heat-sensitive Y-modified sites in snRNAs, snoRNAs and mRNAs, in addition to recovering previously reported sites. Finally, we develop a novel software, nanoRMS, which we show can estimate per-site modification stoichiometries from individual RNA molecules by identifying the reads with altered current intensity and trace profiles, and quantify the RNA modification stoichiometry changes between two conditions.
Our work demonstrates that Y RNA modifications can be predicted de novo and in a quantitative manner using native RNA nanopore sequencing.

Highly accurate barcode and UMI error correction using dual nucleotide dimer blocks allows direct single-cell nanopore transcriptome sequencing

10.1101/2021.01.18.427145 ◽

2021 ◽

Author(s):

Martin Philpott ◽

Jonathan Watson ◽

Anjan Thakurta ◽

Tom Brown ◽

...

Keyword(s):

Single Cell ◽

Nanopore Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Single Cell Sequencing ◽

Base Calling ◽

Novel Approach ◽

Long Read ◽

First Time ◽

Insight Into

Abstract Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized by using blocks of dimeric nucleotides. The method can be applied to correct either short-read or long-read sequencing, thereby allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and evaluate differential isoform usage and fusion transcripts using myeloma and sarcoma cell line models.
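The abstract does not spell out how dimer blocks make errors detectable, so the sketch below rests on a stated assumption rather than on the published method: each logical barcode or UMI base is assumed to be synthesized as a pair of identical nucleotides (AA, CC, GG or TT), so a pair whose two bases disagree reveals a sequencing error. The function name `collapse_dimer_blocks` and the use of `N` for unresolved positions are hypothetical choices made for illustration.

```python
def collapse_dimer_blocks(read):
    """Collapse a sequence made of two-base blocks into one base per block.

    Assumption (not from the source): every block was synthesized as two
    identical nucleotides. A block whose bases agree decodes to that base;
    a block whose bases disagree is reported as 'N' and its block index is
    recorded so that a downstream step could try to resolve it.
    """
    if len(read) % 2 != 0:
        raise ValueError("read length must be a multiple of the block size (2)")
    decoded = []
    error_blocks = []
    for start in range(0, len(read), 2):
        first, second = read[start], read[start + 1]
        if first == second:
            decoded.append(first)
        else:
            decoded.append("N")
            error_blocks.append(start // 2)
    return "".join(decoded), error_blocks


# Hypothetical reads: one clean, one with a substitution in the second block.
print(collapse_dimer_blocks("AACCGGTT"))
print(collapse_dimer_blocks("AACTGGTT"))
```

This toy version only detects substitutions; insertions and deletions, which are common in Nanopore reads, would shift the block boundaries and need a more careful alignment-based treatment.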

A de novo DNA Sequencing and Variant Calling Algorithm for Nanopores

10.1101/019448 ◽

2015 ◽

Author(s):

Tamas Szalay ◽

Jene A Golovchenko

Keyword(s):

Single Molecule ◽

Statistical Models ◽

De Novo ◽

Variant Calling ◽

High Accuracy ◽

Nanopore Sequencing ◽

M13 Bacteriophage ◽

Assembly Pipeline ◽

Calling Algorithm ◽

Novel Algorithm

The single-molecule accuracy of nanopore sequencing has been an area of rapid academic and commercial advancement, but remains insufficient for the de novo analysis of genomes. We introduce here a novel algorithm for the error correction of nanopore data, utilizing statistical models of the physical system in order to obtain high accuracy de novo sequences at a range of coverage depths. We demonstrate the technique by sequencing M13 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly pipeline by sequencing λ DNA at a range of coverages. We also show the algorithm’s ability to accurately classify sequence variants at far lower coverage than existing methods.

NERD-seq: A novel approach of Nanopore direct RNA sequencing that expands representation of non-coding RNAs

10.1101/2021.05.06.442990 ◽

2021 ◽

Author(s):

Luke Saville ◽

Yubo Cheng ◽

Babita Gollen ◽

Liam Mitchell ◽

Matthew Stuart-Edwards ◽

...

Keyword(s):

Rna Sequencing ◽

Standard Approach ◽

Rna Seq ◽

Rna Modifications ◽

Current Standard ◽

Base Calling ◽

Novel Approach ◽

Oxford Nanopore ◽

Non Coding Rnas ◽

Sequencing Platforms

The new next-generation sequencing platforms by Oxford Nanopore Technologies for direct RNA sequencing (direct RNA-seq) allow for an in-depth and comprehensive study of the epitranscriptome by enabling direct base calling of RNA modifications. Non-coding RNAs constitute the most frequently documented targets for RNA modifications. However, the current standard direct RNA-seq approach is unable to detect many of these RNAs. Here we present NERD-seq, a sequencing approach which enables the detection of multiple classes of non-coding RNAs excluded by the current standard approach. Using total RNA from a tissue with high known transcriptional and non-coding RNA activity in mouse, the brain hippocampus, we show that, in addition to detecting polyadenylated coding and non-coding transcripts as the standard approach does, NERD-seq is able to significantly expand the representation for other classes of RNAs such as snoRNAs, snRNAs, scRNAs, srpRNAs, tRNAs, rRFs and non-coding RNAs originating from LINE L1 elements. Thus, NERD-seq presents a new comprehensive direct RNA-seq approach for the study of epitranscriptomes in brain tissues and beyond.

A Plug-and-Play Approach for the De Novo Generation of Dually Functionalised Bispecifics

10.26434/chemrxiv.8068184.v1 ◽

2019 ◽

Author(s):

Antoine Maruani ◽

Peter A. Szijj ◽

Calise Bahou ◽

João C. F. Nogueira ◽

Stephen Caddick ◽

...

Keyword(s):

De Novo ◽

Therapeutic Index ◽

Antibody Fragments ◽

Bispecific Antibodies ◽

Full Potential ◽

Mechanisms Of Resistance ◽

Large Excess ◽

Chemical Methods ◽

New Class ◽

Novel Approach

Diseases are multifactorial, with redundancies and synergies between various pathways. However, most of the antibody-based therapeutics in clinical trials and on the market interact with only one target thus limiting their efficacy. The targeting of multiple epitopes could improve the therapeutic index of treatment and counteract mechanisms of resistance. To this effect, a new class of therapeutics emerged: bispecific antibodies. Bispecific formation using chemical methods is rare and low yielding and/or requires a large excess of one of the two proteins to avoid homodimerisation. In order for chemically prepared bispecifics to deliver their full potential, high-yielding, modular and reliable cross-linking technologies are required. Herein, we describe a novel approach not only for the rapid and high-yielding chemical generation of bispecific antibodies from native antibody fragments, but also for the site-specific dual functionalisation of the resulting bioconjugates. Based on orthogonal clickable functional groups, this strategy enables the assembly of functionalised bispecifics with controlled loading in a modular and convergent manner.

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions

Briefings in Bioinformatics ◽

10.1093/bib/bby017 ◽

2018 ◽

Vol 20 (4) ◽

pp. 1542-1559 ◽

Cited By ~ 44

Author(s):

Damla Senol Cali ◽

Jeremie S Kim ◽

Saugata Ghose ◽

Can Alkan ◽

Onur Mutlu

Keyword(s):

Sequence Analysis ◽

Genome Assembly ◽

Sequence Data ◽

Error Rates ◽

Nanopore Sequencing ◽

Memory Usage ◽

Sequencing Technology ◽

Assembly Pipeline ◽

And Performance ◽

Polishing Tool

Abstract Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. 
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

Optimizing de novo genome assembly from PCR-amplified metagenomes

PeerJ ◽

10.7717/peerj.6902 ◽

2019 ◽

Vol 7 ◽

pp. e6902 ◽

Cited By ~ 9

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Pcr Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Size Number ◽

Assembly Pipeline

Background: Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods: Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results: Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10 to 100-fold for low input metagenomes. Conclusions: PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
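The assembly-size metric used above (number of bp in contigs ≥ 10 kb) can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the function name `assembly_size`, the default threshold argument and the example contig lengths are hypothetical.

```python
def assembly_size(contig_lengths, min_length=10_000):
    """Total number of bp in contigs at least `min_length` bp long."""
    return sum(length for length in contig_lengths if length >= min_length)


# Hypothetical contig lengths (bp) from two assemblies of the same sample.
standard_pipeline = [4_000, 9_999, 12_000]
modified_pipeline = [11_000, 25_000, 84_000, 6_000]

size_standard = assembly_size(standard_pipeline)
size_modified = assembly_size(modified_pipeline)
print(size_standard, size_modified, size_modified / size_standard)
```

Dividing the two sizes gives the fold change between pipelines, which is the form in which the improvement is reported above; the numbers in this example are invented and carry no meaning beyond showing the calculation.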

A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

PeerJ ◽

10.7717/peerj.3702 ◽

2017 ◽

Vol 5 ◽

pp. e3702 ◽

Cited By ~ 5

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A. Leonard ◽

...

Keyword(s):

Defense Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

Rna Seq ◽

Assembly Pipeline ◽

Wide Variability ◽

History Of ◽

Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource to understand the evolutionary history of poorly known species. However, in organisms with large genomes, such as most amphibians, WGS is still excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome and the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
