Nanocall: an open source basecaller for Oxford Nanopore sequencing data

Matei David; L. J. Dursi; Delia Yao; Paul C. Boutros; Jared T. Simpson

doi:10.1093/bioinformatics/btw569

Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data

10.1101/046086 ◽

2016 ◽

Cited By ~ 10

Author(s):

Matei David ◽

L.J. Dursi ◽

Delia Yao ◽

Paul C. Boutros ◽

Jared T. Simpson

Keyword(s):

Open Source ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computing Platform ◽

Internet Connection ◽

Oxford Nanopore ◽

Efficient Processing ◽

Cloud Computing Platform ◽

New Applications ◽

Single Dna Molecule

Abstract. Motivation: The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls. Results: To allow offline and private analysis of MinION data, we created Nanocall. Nanocall is the first freely available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. On two E. coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ~68% identity, directly comparable to Metrichor "1D" data. Further, Nanocall is efficient, processing ~500 Kbp of sequence per core hour, and fully parallelized. Using 8 cores, Nanocall could basecall a MinION sequencing run in real time. Metrichor provides the ability to integrate the "1D" sequencing of template and complement strands of a single DNA molecule, and create a "2D" read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, offline basecaller for Oxford Nanopore sequencing data. Availability: Nanocall is available at github.com/mateidavid/nanocall, released under the MIT license. Contact: matei.david at oicr.on.ca
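The throughput claim in the abstract above can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: it takes the quoted figure of ~500 Kbp per core hour and assumes perfectly linear scaling across cores, which the abstract does not state. The function names are hypothetical and are not part of Nanocall.

```python
import math

# Throughput quoted in the abstract: ~500 Kbp of sequence per core hour.
KBP_PER_CORE_HOUR = 500


def basecalled_kbp_per_hour(cores, kbp_per_core_hour=KBP_PER_CORE_HOUR):
    """Aggregate throughput, ASSUMING perfect linear scaling across cores."""
    return cores * kbp_per_core_hour


def cores_needed(run_kbp_per_hour, kbp_per_core_hour=KBP_PER_CORE_HOUR):
    """Smallest whole number of cores that keeps up with a sequencing rate."""
    return math.ceil(run_kbp_per_hour / kbp_per_core_hour)


# With 8 cores the quoted rate gives 4000 Kbp (about 4 Mbp) per hour.
print(basecalled_kbp_per_hour(8))  # 4000
```

Under these assumptions, the "8 cores in real time" statement corresponds to a sequencing run that produces no more than about 4 Mbp per hour.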


SACall: a neural network basecaller for Oxford Nanopore sequencing data based on self-attention mechanism

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2020.3039244 ◽

2020 ◽

pp. 1-1

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Attention Mechanism ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore


Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
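Precision and recall of an SV callset, as mentioned in the abstract above, are computed from counts of true positives, false positives, and false negatives against a truth set. The following is a minimal sketch of that arithmetic. The class and function names are hypothetical, the counts are made up for illustration, and the code is not taken from the EViNCe pipeline.

```python
from dataclasses import dataclass


@dataclass
class CallsetCounts:
    """Counts from comparing a callset to a truth set (hypothetical container)."""
    tp: int  # calls that match a true SV
    fp: int  # calls with no matching true SV
    fn: int  # true SVs that were not called


def precision(c: CallsetCounts) -> float:
    """Fraction of reported calls that are correct."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: CallsetCounts) -> float:
    """Fraction of true SVs that were recovered."""
    truth = c.tp + c.fn
    return c.tp / truth if truth else 0.0


def f1(c: CallsetCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only; these are not results from the report.
example = CallsetCounts(tp=80, fp=20, fn=20)
print(precision(example), recall(example), f1(example))
```

How a call is judged to "match" a true SV (breakpoint tolerance, size similarity, genotype agreement) is a separate design decision that this sketch leaves out.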


Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing

Blood ◽

10.1182/blood-2018-99-110948 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1847-1847 ◽

Cited By ~ 1

Author(s):

Adam Burns ◽

David Robert Bruce ◽

Pauline Robbe ◽

Adele Timbs ◽

Basile Stamatopoulos ◽

...

Keyword(s):

Error Correction ◽

Low Cost ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Mutation Status ◽

Short Read ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Low Coverage ◽

Oxford Nanopore Technologies

Abstract. Introduction: Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and is characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and give an indication as to their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which is time-consuming and requires significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing promises low failure rates on degraded material such as FFPE and excellent detection of structural variants due to the long read length of sequencing. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and is ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods: Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell. 
Whole genome libraries were prepared from 400 ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample was sequenced on an individual flowcell on a MinION Mk1B instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego, CA, USA). Results: Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however, the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bioinformatic error correction and consensus building, the percentage germline homology was the same as that obtained from short-read sequencing, and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions, and three synonymous SNVs. The remaining 6 were predicted to be functional variants (eight missense and two stop-gains) and had all been identified in the earlier MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19 Mb and 20 Mb length, in both patients with high confidence. Conclusions: Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied. 
Furthermore, somatic SNVs in TP53 can be identified, although, as with second-generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrate that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential for diagnostic implementation in resource-poor regions of the world. Disclosures: Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.


An attention-based neural network basecaller for Oxford Nanopore sequencing data

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8983231 ◽

2019 ◽

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore


A comparative analysis of computational tools for the prediction of epigenetic DNA methylation from long-read sequencing data

10.1101/2021.04.24.441281 ◽

2021 ◽

Author(s):

Shruta Sandesh Pai ◽

Aimee Rachel Mathew ◽

Roy Anindya

Keyword(s):

Dna Methylation ◽

Dna Modification ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computational Tools ◽

Specific Location ◽

Oxford Nanopore ◽

Long Read ◽

Dna Methylations ◽

Higher Eukaryotes

Abstract. The recent development of Oxford Nanopore long-read sequencing has opened new avenues for identifying epigenetic DNA methylation. Among the different epigenetic DNA methylations, N6-methyladenosine is the most prevalent DNA modification in prokaryotes and 5-methylcytosine is common in higher eukaryotes. Here we investigated whether N6-methyladenosine and 5-methylcytosine modifications could be predicted from nanopore sequencing data. Using publicly available genome sequencing data of Saccharomyces cerevisiae, we compared open-access computational tools, including Tombo, mCaller, Nanopolish and DeepSignal, for predicting 6mA and 5mC. Our results suggest that Tombo and mCaller can predict DNA N6-methyladenosine modifications at a specific location, whereas Tombo dampened fraction, Nanopolish methylation likelihood and DeepSignal methylation probability have comparable efficiency for 5-methylcytosine prediction from Oxford Nanopore sequencing data.


DNAscent v2: Detecting Replication Forks in Nanopore Sequencing Data with Deep Learning

10.1101/2020.11.04.368225 ◽

2020 ◽

Author(s):

Michael A. Boemo

Keyword(s):

Single Molecule ◽

Cell Populations ◽

Nanopore Sequencing ◽

Replication Forks ◽

Sequencing Data ◽

Single Base ◽

Oxford Nanopore ◽

Experimental Protocols ◽

Single Base Resolution ◽

Oxford Nanopore Technologies

Abstract. The detection of base analogues in Oxford Nanopore Technologies (ONT) sequencing reads has become a promising new method for the high-throughput measurement of DNA replication dynamics with single-molecule resolution. This paper introduces DNAscent v2, software that uses a residual neural network to achieve fast, accurate detection of the thymidine analogue BrdU with single-base resolution. DNAscent v2 comes equipped with an autoencoder that detects replication forks, origins, and termination sites in ONT sequencing reads from both synchronous and asynchronous cell populations, outcompeting previous versions and other tools across different experimental protocols. DNAscent v2 is open-source and available at https://github.com/MBoemo/DNAscent.


CRISPR-Cas off-target detection using Oxford Nanopore sequencing - is the mitochondrial genome more vulnerable to off-targets?

10.1101/741322 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sandeep Chakraborty

Keyword(s):

Mitochondrial Genome ◽

Human Subjects ◽

Pcr Amplification ◽

Distal Region ◽

Error Rates ◽

Fast Method ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore ◽

Bona Fide

Abstract. Oxford Nanopore sequencing of DNA molecules is fast gaining popularity for generating longer reads, albeit with higher error rates, in much less time, and without the error introduced by PCR amplification. Recently, CRISPR-Cas9 has been used to enrich genomic regions (nCATS [1]). This was applied to 10 genomic loci (median length = 18 kb). Here, using the sequencing data (Accid:PRJNA531320), it is shown that the same flow can be used to identify CRISPR-Cas9 off-target edits (OTEs). OTEs are an important, but unfortunately underestimated, aspect of CRISPR-Cas gene editing. An OTE in the mitochondrial genome is shown to have 7 mismatches with one of the 10 gRNAs used (GPX1), with as much enrichment as the targeted genomic loci in some samples. A previous study has shown that Cas9 binds to off-targets having as many as 10 mismatches in the PAM-distal region. This OTE has not been reported in the original study (still a pre-print), which states that sequences from parts other than the target locations arise 'from ligation of nanopore adaptors to random breakage points, with no clear evidence of off-target cleavage by Cas9' [1]. Furthermore, many reads aligning to the mitochondrial genome (sometimes full length) are inverted after the edit. It remains to be seen whether these are bona fide translocations after the Cas9 edit, or ONP sequencing artifacts. This also raises the question of whether the mitochondrial genome is more prone to off-targets by virtue of being non-nuclear. Another locus in ChrX (13121412) has only 1 mismatch with the second BRAF gRNA (GACCAAGGATTTCGTGGTGA). Although the number of reads for this OTE is lower, it is very unlikely that this is random, since it occurs in 8 out of 11 samples. With the increasing use of TALEN, ZFN and CRISPR-Cas9 on human subjects, this provides a fast method to query gRNAs for off-targets in cells obtained from the patient, which will have their own unique off-targets due to single nucleotide polymorphisms or other variants.
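The mismatch counts quoted in the abstract above (for example, 1 mismatch against the second BRAF gRNA) amount to a position-by-position comparison between the gRNA and a candidate genomic site. The sketch below illustrates that comparison under simplifying assumptions: it ignores the PAM, the reverse strand, and gaps, and it is not the author's pipeline. The gRNA is the one given in the abstract. The candidate site and the function names are hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def find_sites(grna: str, sequence: str, max_mismatches: int):
    """Slide the gRNA along a sequence and report (start, mismatches)
    for every window with at most max_mismatches differences."""
    hits = []
    for start in range(len(sequence) - len(grna) + 1):
        window = sequence[start:start + len(grna)]
        n = count_mismatches(grna, window)
        if n <= max_mismatches:
            hits.append((start, n))
    return hits


# Second BRAF gRNA, as quoted in the abstract.
BRAF_GRNA = "GACCAAGGATTTCGTGGTGA"
# HYPOTHETICAL site differing at the last base; NOT the real ChrX sequence.
HYPOTHETICAL_SITE = "GACCAAGGATTTCGTGGTGG"

print(count_mismatches(BRAF_GRNA, HYPOTHETICAL_SITE))  # 1
```

A realistic off-target search would also check both strands and require a PAM next to the site; those steps are omitted here to keep the comparison itself visible.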


poRe: an R package for the visualization and analysis of nanopore sequencing data

10.1101/007567 ◽

2014 ◽

Author(s):

Mick Watson ◽

Marian Thomson ◽

Judith Risse ◽

Javier Santoyo-Lopez ◽

Richard Talbot ◽

...

Keyword(s):

R Package ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Sequencing Technology ◽

Statistical Software ◽

Oxford Nanopore ◽

Potential Applications ◽

R Packages ◽

Bioinformatics Community

Motivation: The Oxford Nanopore MinION device represents a unique sequencing technology. As a mobile sequencing device powered by the USB port of a laptop, the MinION has huge potential applications. To enable these applications, the bioinformatics community will need to design and build a suite of tools specifically for MinION data. Results: Here we present poRe, a package for the statistical software R that enables users to manipulate, organize, summarise and visualize MinION nanopore sequencing data. As a package for R, poRe has been tested on both Windows and Linux. Crucially, the Windows version allows users to analyse MinION data on the Windows laptop attached to the device. Availability: Pre-built R packages for Windows and Linux are available under a BSD license at http://sourceforge.net/projects/rpore/ Contact: [email protected]


Nm-Nano: Predicting 2′-O-methylation (Nm) Sites in Nanopore RNA Sequencing Data

10.1101/2022.01.03.473214 ◽

2022 ◽

Author(s):

Doaa Hassan Salem ◽

Aditya Ariyur ◽

Swapna Vidhur Daulatabad ◽

Quoseena Mir ◽

Sarath Chandra Janga

Keyword(s):

Hela Cell ◽

Cell Line ◽

Rna Sequencing ◽

Hek293 Cell ◽

Nanopore Sequencing ◽

Hela Cell Line ◽

Sequencing Data ◽

Rna Sequences ◽

Rna Sequence ◽

Oxford Nanopore

Nm (2′-O-methylation) is one of the most abundant modifications of mRNAs and non-coding RNAs, occurring when a methyl group (–CH3) is added to the 2′ hydroxyl (–OH) of the ribose moiety. This modification can appear on any nucleotide regardless of the type of nitrogenous base, because each ribose sugar has a hydroxyl group and so 2′-O-methyl ribose can occur on any base. Nm modification contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by DXO, and the biogenesis and specificity of rRNA. Recently, the single-molecule sequencing techniques for long reads of RNA sequence data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications on the molecule that is being sequenced, but to our knowledge there has been only one research attempt that applied this technology to predict the stoichiometry of Nm-modified sites in RNA sequences of yeast cells. In this paper, we extend this research direction by proposing a bio-computational framework, Nm-Nano, for predicting Nm sites in Nanopore direct RNA sequencing reads of human cell lines. The Nm-Nano framework integrates two supervised machine learning models for predicting Nm sites in Nanopore sequencing data, namely Xgboost and Random Forest (RF). Each model is trained with a set of features that are extracted from the raw signal generated by the Oxford Nanopore MinION device, as well as the corresponding basecalled k-mer resulting from inferring the RNA sequence reads from the generated Nanopore signals. The results on two benchmark data sets generated from RNA Nanopore sequencing data of Hela and Hek293 cell lines show strong performance of Nm-Nano. 
In independent validation testing, Nm-Nano was able to identify Nm sites with accuracies of 93% and 88% using the Xgboost and RF models respectively, by training each model on the Hela benchmark dataset and testing it on the Hek293 benchmark dataset. Thus, Nm-Nano outperforms the existing Nm site predictors in the literature (which do not rely on Nanopore technology), which were limited to predicting Nm sites on short reads of RNA sequences and were unable to predict Nm sites on long RNA sequence reads. Deploying Nm-Nano to predict Nm sites in the Hela cell line revealed a total of 196 genes with the greatest abundance of Nm modification among all genes modified by Nm in this cell line. Similarly, deploying Nm-Nano to predict Nm sites in the Hek293 cell line revealed a total of 196 genes with the greatest abundance of Nm modification among all genes modified by Nm in this cell line. Functional enrichment analysis of these genes revealed a wide range of high-confidence (adjusted p-value < 0.05) enriched ontologies: those in the Hela cell line were more representative of the role of Nm modification in immune response and cellular homeostasis, whereas "MHC class 1 protein complex", "mitotic spindle assembly", "response to glucocorticoid", and "nucleocytoplasmic transport" were revealed in the Hek293 cell line. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
