Rapid multiplex small DNA sequencing on the MinION nanopore sequencing platform

Genetic Biomonitoring and Biodiversity Assessment Using Portable Sequencing Technologies: Current Uses and Future Directions

Genes ◽

10.3390/genes10110858 ◽

2019 ◽

Vol 10 (11) ◽

pp. 858 ◽

Cited By ~ 18

Author(s):

Krehenwinkel ◽

Pomerantz ◽

Prost

Keyword(s):

Dna Sequencing ◽

Biodiversity Loss ◽

Taxonomic Composition ◽

Great Promise ◽

Sequencing Platform ◽

Biological Communities ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Sequencing Studies ◽

High Throughput Dna Sequencing

We live in an era of unprecedented biodiversity loss, affecting the taxonomic composition of ecosystems worldwide. The immense task of quantifying human imprints on global ecosystems has been greatly simplified by developments in high-throughput DNA sequencing (HTS) technology. Approaches like DNA metabarcoding enable the study of biological communities at unparalleled detail. However, current protocols for HTS-based biodiversity exploration have several drawbacks. They are usually based on short sequences, with limited taxonomic and phylogenetic information content. Access to expensive HTS technology is often restricted in developing countries. Ecosystems of particular conservation priority are often remote and hard to access, requiring extensive time from field collection to laboratory processing of specimens. The advent of inexpensive mobile laboratory and DNA sequencing technologies shows great promise to facilitate monitoring projects in biodiversity hot-spots around the world. Recent attention has been given to portable DNA sequencing studies related to infectious organisms, such as bacteria and viruses, yet relatively few studies have focused on applying these tools to Eukaryotes, such as plants and animals. Here, we outline the current state of genetic biodiversity monitoring of higher Eukaryotes using Oxford Nanopore Technologies' MinION portable sequencing platform, and summarize areas of recent development.


Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing

10.1101/381640 ◽

2018 ◽

Author(s):

Natalie Ring ◽

Jonathan Abrahams ◽

Miten Jain ◽

Hugh Olsen ◽

Andrew Preston ◽

...

Keyword(s):

Bordetella Pertussis ◽

Flow Cell ◽

Nanopore Sequencing ◽

Library Preparation ◽

Genome Sequences ◽

Short Read ◽

Link Type ◽

Analysis Tools ◽

Long Read ◽

Assembly Pipeline

ABSTRACT: The genome of Bordetella pertussis is complex, with high GC content and many repeats, each longer than 1,000 bp. Short-read DNA sequencing is unable to resolve the structure of the genome; however, long-read sequencing offers the opportunity to produce single-contig B. pertussis assemblies using sequencing reads which are longer than the repetitive sections. We used an R9.4 MinION flow cell and barcoding to sequence five B. pertussis strains in a single sequencing run. We then trialled combinations of the many nanopore-user-community-built long-read analysis tools to establish the current optimal assembly pipeline for B. pertussis genome sequences. Our best long-read-only assemblies were produced by Canu read correction followed by assembly with Flye and polishing with Nanopolish, whilst the best hybrids (using nanopore and Illumina reads together) were produced by Canu correction followed by Unicycler. This pipeline produced closed genome sequences for four strains, revealing inter-strain genomic rearrangement. However, read mapping to the Tohama I reference genome suggests that the remaining strain contains an ultra-long duplicated region (over 100 kbp), which was not resolved by our pipeline. We have therefore demonstrated the ability to resolve the structure of several B. pertussis strains per single barcoded nanopore flow cell, but the genomes with highest complexity (e.g. very large duplicated regions) remain only partially resolved using the standard library preparation and will require an alternative library preparation method. For full strain characterisation, we recommend hybrid assembly of long and short reads together; for comparison of genome arrangement, assembly using long reads alone is sufficient.

DATA SUMMARY: Final sequence read files (fastq) for all 5 strains have been deposited in the SRA, BioProject PRJNA478201, accession numbers SAMN09500966, SAMN09500967, SAMN09500968, SAMN09500969, SAMN09500970. A full list of accession numbers for Illumina sequence reads is available in Table S1. Assembly tests, basecalled read sets and reference materials are available from figshare (https://figshare.com/projects/Resolving_the_complex_Bordetella_pertussis_genome_using_barcoded_nanopore_sequencing/31313). Genome sequences for B. pertussis strains UK36, UK38, UK39, UK48 and UK76 have been deposited in GenBank; accession numbers: CP031289, CP031112, CP031113, QRAX00000000, CP031114. Source code and full commands used are available from GitHub (https://github.com/nataliering/Resolving-the-complex-Bordetella-pertussis-genome-using-barcoded-nanopore-sequencing).

IMPACT STATEMENT: Over the past two decades, whole genome sequencing has allowed us to understand microbial pathogenicity and evolution on an unprecedented level. However, repetitive regions, like those found throughout the B. pertussis genome, have confounded our ability to resolve complex genomes using short-read sequencing technologies alone. To produce closed B. pertussis genome sequences it is necessary to use a sequencing technology which can generate reads longer than these problematic genomic regions. Using barcoded nanopore sequencing, we show that multiple B. pertussis genomes can be resolved per flow cell. Use of our assembly pipeline to resolve further B. pertussis genomes will advance understanding of how genome-level differences affect the phenotypes of strains which appear monomorphic at nucleotide-level. This work expands the recently emergent theme that even the most complex genomes can be resolved with sufficiently long sequencing reads. Additionally, we utilise a more widely accessible alternative sequencing platform to the Pacific Biosciences platform already used by large research centres such as the CDC. Our optimisation process, moreover, shows that the analysis tools favoured by the sequencing community do not necessarily produce the most accurate assemblies for all organisms; pipeline optimisation may therefore be beneficial in studies of unusually complex genomes.


Nanopore sequencing as a scalable, cost-effective platform for analyzing polyclonal vector integration sites following clinical T cell therapy

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2019-000299 ◽

2020 ◽

Vol 8 (1) ◽

pp. e000299

Author(s):

Ping Zhang ◽

Devika Ganesamoorthy ◽

Son Hoang Nguyen ◽

Raymond Au ◽

Lachlan J Coin ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genomic Dna ◽

Cost Effective ◽

Inverse Pcr ◽

Nanopore Sequencing ◽

Next Generation ◽

Short Read ◽

Vector Integration ◽

Integration Sites ◽

Generation Sequencing

Background: Analysis of vector integration sites in gene-modified cells can provide critical information on clonality and potential biological impact on nearby genes. Current short-read next-generation sequencing methods require specialized instruments and large batch runs. Methods: We used nanopore sequencing to analyze the vector integration sites of T cells transduced by the gammaretroviral vector, SFG.iCasp9.2A.ΔCD19. DNA from oligoclonal cell lines and polyclonal clinical samples was digested with two 6-cutter restriction enzymes, NcoI and BspHI, and the flanking genomic DNA was amplified by inverse PCR or cassette ligation PCR. Following nested PCR and barcoding, the amplicons were sequenced on the Oxford Nanopore platform. Reads were filtered for quality, trimmed, and aligned. A custom tool was developed to cluster reads and merge overlapping clusters. Results: Both inverse PCR and cassette ligation PCR could successfully amplify flanking genomic DNA, with cassette ligation PCR showing less bias. The 4.8 million raw reads were grouped into 12,186 clusters and 6,410 clones. The 3′ long terminal repeat (LTR)-genome junction could be resolved within a 5-nucleotide span for a majority of clusters and within a one-nucleotide span for clusters with ≥5 reads. The chromosomal distributions of the insertional sites and their predilection for regions proximate to transcription start sites were consistent with previous reports for gammaretroviral vector integrants as analyzed by short-read next-generation sequencing. Conclusion: Our study shows that it is feasible to use nanopore sequencing to map polyclonal vector integration sites. The assay is scalable and requires minimum capital, which together enable cost-effective and timely analysis. Further refinement is required to reduce amplification bias and improve single nucleotide resolution.
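
The abstract above mentions clustering reads and merging overlapping clusters but does not describe the authors' tool. As a rough illustration only, the idea can be sketched as grouping aligned junction positions that lie close together on the same chromosome and strand. The function name `merge_clusters`, the input tuple layout, and the `max_gap` default (chosen to echo the 5-nucleotide span quoted in the results) are all assumptions, not the published method.

```python
from collections import defaultdict


def merge_clusters(sites, max_gap=5):
    """Group junction positions into clusters (hypothetical sketch).

    sites: iterable of (chrom, strand, position) tuples, one per read.
    max_gap: assumed maximum distance between neighbouring positions
             that still belong to the same cluster.
    Returns a list of (chrom, strand, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)

    clusters = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = end = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - end <= max_gap:
                # Close enough to the current cluster: extend it.
                end = pos
                count += 1
            else:
                # Gap too large: close the cluster and start a new one.
                clusters.append((chrom, strand, start, end, count))
                start = end = pos
                count = 1
        clusters.append((chrom, strand, start, end, count))
    return clusters
```

Sorting positions first keeps the merge to a single pass per chromosome and strand; a real tool would also have to handle alignment quality and reads from both junction ends, which this sketch ignores.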


Impact of DNA Sequencing and Analysis Methods on 16S rRNA Gene Bacterial Community Analysis of Dairy Products

mSphere ◽

10.1128/msphere.00410-18 ◽

2018 ◽

Vol 3 (5) ◽

Cited By ~ 9

Author(s):

Zhengyao Xue ◽

Mary E. Kable ◽

Maria L. Marco

Keyword(s):

16S Rrna ◽

Dna Sequencing ◽

Dairy Products ◽

Bacterial Species ◽

Ion Torrent ◽

Mock Community ◽

Sequencing Platform ◽

Ion Torrent Pgm ◽

Sequencing Method ◽

Mock Communities

ABSTRACT: DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and genomic DNA (gDNA) mock communities encompassing nine bacterial species commonly found in milk and dairy products. The two communities comprised strain-specific DNA that was pooled before (gDNA) or after (PCR amplicon) the PCR step. The communities were sequenced on the Illumina MiSeq and Ion Torrent PGM platforms and then analyzed using the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) analysis pipelines with taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods resulted in operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) that ranged from 13 to 118 and were dependent on the DNA sequencing method and read assembly steps. The additional 4 to 109 OTUs/ASVs (from 9 OTUs/ASVs) included assignments to spurious taxa and sequence variants of the 9 species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNAs from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs/ASVs. However, the DNA sequencing method and paired-end read assembly steps conferred the largest effects on predictions of bacterial diversity, with effect sizes of 0.88 (Bray-Curtis) and 0.32 (weighted UniFrac), independent of the mock community type. Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity.

IMPORTANCE: Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities. In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk- and dairy-associated environments.
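
For readers unfamiliar with the Bray-Curtis measure named in the abstract, the following minimal sketch computes the Bray-Curtis dissimilarity between two samples from their taxon counts, using the standard formula sum(|a_i - b_i|) / sum(a_i + b_i). Note that the 0.88 reported above is an effect size derived from such dissimilarities, not a dissimilarity value itself; the function name and the example counts are illustrative assumptions.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    a, b: sequences of non-negative counts for the same taxa, in the
          same order. Returns 0.0 for identical samples and 1.0 for
          samples that share no taxa.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts for three taxa in two samples.
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
dissimilarity = bray_curtis(sample_1, sample_2)
```

Unlike weighted UniFrac, this measure uses abundances only and ignores how closely related the taxa are, which is one reason the two metrics can yield different effect sizes on the same data.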


Robust long-read native DNA sequencing using the ONT CsgG Nanopore system

Wellcome Open Research ◽

10.12688/wellcomeopenres.11246.1 ◽

2017 ◽

Vol 2 ◽

pp. 23 ◽

Cited By ~ 12

Author(s):

Jean-Michel Carter ◽

Shobbir Hussain

Keyword(s):

Dna Sequencing ◽

Cancer Cell Line ◽

Read Length ◽

Nanopore Sequencing ◽

Computational Tools ◽

Practical Applications ◽

Oxford Nanopore ◽

Sequencing Method ◽

Long Read ◽

Oxford Nanopore Technologies

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications.
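
The results above quote both a mean read length and an N50, which summarise a read-length distribution differently. As a minimal sketch (the function name and toy lengths are assumptions for illustration), N50 is the length L such that reads of length at least L together hold at least half of all sequenced bases:

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the length
    of the read at which the running total first reaches half of the
    total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy read lengths (hypothetical, not data from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)

# Consistency check on the reported figures: reads x mean length.
approx_total_bases = 240_000 * 9_600
```

Because N50 weights each read by its length, it sits above the mean whenever the distribution has a long tail, as in the ~17 kb N50 against the 9.6 kb mean reported here. The last line reproduces the reported yield: 240,000 reads at 9.6 kb is about 2.3 billion bases.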


Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

G3 Genes|Genome|Genetics ◽

10.1534/g3.118.200087 ◽

2018 ◽

Vol 8 (5) ◽

pp. 1649-1657 ◽

Cited By ~ 14

Author(s):

Shan Wei ◽

Zachary R. Weiss ◽

Zev Williams

Keyword(s):

Dna Sequencing ◽

Nanopore Sequencing ◽

Sequencing Platform


NanoAmpli-Seq: A workflow for amplicon sequencing for mixed microbial communities on the nanopore sequencing platform

10.1101/244517 ◽

2018 ◽

Cited By ~ 2

Author(s):

Szymon T Calus ◽

Umer Z Ijaz ◽

Ameet J Pinto

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Amplicon Sequencing ◽

Error Rates ◽

Full Length ◽

Rrna Gene ◽

Nanopore Sequencing ◽

Short Read ◽

Sequencing Platform ◽

Sequencing Platforms

Background: Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity, but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences platforms overcome this limitation, their application has been limited due to higher error rates or smaller data output. Results: In this study, we introduce an amplicon sequencing workflow, NanoAmpli-Seq, that builds on the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) approach and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the aforementioned protocol that reduce sample-processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain full-length 16S rRNA gene sequences. Conclusions: NanoAmpli-Seq accurately estimates the diversity of tested mock communities with average sequence accuracy of 99.5% for 2D and 1D² sequencing on the nanopore sequencing platform. Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware basecalling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
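
Since the conclusions attribute most residual errors to homopolymer regions, it may help to see how such regions can be located in a sequence. The sketch below is not part of NanoAmpli-Seq; the function name and the `min_len` threshold are assumptions chosen for illustration. It lists runs of a repeated base using run-length grouping.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Find homopolymer runs in a nucleotide sequence.

    Returns a list of (start, base, length) tuples, with 0-based start
    positions, for every run of identical bases at least min_len long.
    min_len=3 is an arbitrary illustrative threshold.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        run_length = len(list(group))
        if run_length >= min_len:
            runs.append((pos, base, run_length))
        pos += run_length
    return runs


# Hypothetical fragment: a GGGG run at position 2, an AAA run at position 8.
example_runs = homopolymer_runs("ACGGGGTTAAAC")
```

A homopolymer-aware correction step could, for instance, compare run lengths at these positions between a consensus read and a reference; that comparison is outside the scope of this sketch.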


CUTseq is a versatile method for preparing multiplexed DNA sequencing libraries from low-input samples

10.21203/rs.2.1742/v2 ◽

2019 ◽

Author(s):

Xiaolu Zhang ◽

Silvano Garnerone ◽

Michele Simonetti ◽

Luuk Harbers ◽

Marcin Nicoś ◽

...

Keyword(s):

Dna Sequencing ◽

Genomic Dna ◽

Massively Parallel Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

In Vitro Transcription ◽

Intratumor Heterogeneity ◽

Library Preparation ◽

Low Input

Abstract: Current multiplexing strategies for massively parallel sequencing of genomic DNA mainly rely on library indexing in the final steps of library preparation. This procedure is costly and time-consuming because a single library must be produced separately for each sample. Furthermore, library preparation is challenging in the case of low-input fixed samples, such as DNA extracted from formalin-fixed paraffin-embedded (FFPE) tissues. Here, we describe CUTseq, a method that uses restriction enzymes and in vitro transcription to barcode and amplify genomic DNA prior to library construction. We thoroughly validate CUTseq and demonstrate its applicability to both genome and exome sequencing, enabling multi-region genome profiling within single stained FFPE tissue sections, to assess intratumor heterogeneity at high spatial resolution. In conclusion, CUTseq is a versatile and cost-effective method for multiplexed DNA sequencing library preparation that can find numerous applications in research and diagnostics.


Robust long-read native DNA sequencing using the ONT CsgG Nanopore system

Wellcome Open Research ◽

10.12688/wellcomeopenres.11246.3 ◽

2018 ◽

Vol 2 ◽

pp. 23 ◽

Cited By ~ 5

Author(s):

Jean-Michel Carter ◽

Shobbir Hussain

Keyword(s):

Dna Sequencing ◽

Cancer Cell Line ◽

Read Length ◽

Nanopore Sequencing ◽

Computational Tools ◽

Practical Applications ◽

Oxford Nanopore ◽

Sequencing Method ◽

Long Read ◽

Oxford Nanopore Technologies

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications with relevance to complex genomes.


Impact of DNA sequencing and analysis methods on 16S rRNA gene bacterial community analysis in dairy products

10.1101/305078 ◽

2018 ◽

Author(s):

Zhengyao Xue ◽

Mary E Kable ◽

Maria L Marco

Keyword(s):

16S Rrna ◽

Dna Sequencing ◽

Dairy Products ◽

Bacterial Species ◽

Ion Torrent ◽

Mock Community ◽

Sequencing Platform ◽

Ion Torrent Pgm ◽

Sequencing Method ◽

Mock Communities

Abstract: DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and gDNA mock communities encompassing nine bacterial species commonly found in milk and dairy products. The communities were examined using Illumina MiSeq and Ion Torrent PGM DNA sequencing methods followed by the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) data analysis pipelines including taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods resulted in Operational Taxonomic Units (OTUs) and Amplicon Sequence Variants (ASVs) that ranged from a low of 13 to a high of 118 and were dependent on the DNA sequencing method and read assembly step. The elevated numbers of OTUs and ASVs included assignments to spurious taxa as well as sequence variants of the nine species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNA from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs and ASVs. However, the DNA sequencing method and initial data assembly steps conferred the largest effects on predictions of bacterial diversity, independent of the mock community type (PCR amplicon or gDNA; Bray-Curtis R² = 0.88 and weighted UniFrac R² = 0.32). Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity.

Importance: Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities. In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk- and dairy-associated environments.
