Clustering of Circular Consensus Sequences: Accurate Error Correction and Assembly of Single Molecule Real-Time Reads from Multiplexed Amplicon Libraries

10.1101/236893 ◽

2017 ◽

Author(s):

Felix Francis ◽

Michael D. Dumas ◽

Scott B. Davis ◽

Randall J. Wisser

Keyword(s):

Error Correction ◽

Real Time ◽

Single Molecule ◽

High Throughput Sequencing ◽

Sequence Data ◽

Sequencing Error ◽

Bootstrap Analysis ◽

Consensus Sequences ◽

Pacific Biosciences ◽

Local Sequence

BACKGROUND: Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical challenge for research and applications using HTS data, especially from long-read platforms, is errors arising from technological limits and bioinformatic algorithms. RESULTS: A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. CONCLUSIONS: C3S-LAA uses a novel processing algorithm for SMRT amplicon-sequence data that produces accurate consensus sequences and local sequence assemblies. The community standard long amplicon analysis module from Pacific Biosciences is prone to substantial errors that raise concerns about findings based on this pipeline. The method developed here removed this confounding bioinformatics source of error, allowing for the identification of limited instances of errors due to DNA amplification or sequencing.
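
The idea of a bootstrap analysis of a consensus call can be illustrated with a small sketch. This is not the C3S-LAA algorithm: it assumes reads that are already aligned to equal length, uses a plain column-wise majority vote as the consensus, and the function names `majority_consensus` and `bootstrap_support` are hypothetical. It only shows how resampling reads with replacement can indicate how stable a consensus is at a given depth.

```python
import random
from collections import Counter


def majority_consensus(reads):
    """Column-wise majority vote over equal-length, pre-aligned reads (toy consensus)."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    # Counter.most_common(1) resolves ties by first occurrence in the column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def bootstrap_support(reads, replicates=200, seed=0):
    """Return the full-data consensus and the fraction of bootstrap
    replicates (reads resampled with replacement) that reproduce it."""
    rng = random.Random(seed)
    full = majority_consensus(reads)
    hits = 0
    for _ in range(replicates):
        sample = [rng.choice(reads) for _ in reads]
        if majority_consensus(sample) == full:
            hits += 1
    return full, hits / replicates


# Five toy reads; two of them carry a single-base error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC"]
consensus, support = bootstrap_support(reads)
print(consensus, support)
```

A support value well below 1 at high depth would suggest that the disagreement comes from the reads themselves rather than from sampling noise, which is the kind of systematic error the abstract attributes to amplification or sequencing.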

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries

BMC Bioinformatics ◽

10.1186/s12859-018-2293-0 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 5

Author(s):

Felix Francis ◽

Michael D. Dumas ◽

Scott B. Davis ◽

Randall J. Wisser

Keyword(s):

Error Correction ◽

Real Time ◽

Single Molecule ◽

Consensus Sequences

Single Molecule Real Time (SMRT™) Sequencing Sensitively Detects Polyclonal and Compound BCR-ABL in Patients Who Relapse on Kinase Inhibitor Therapy

Blood ◽

10.1182/blood.v118.21.3752.3752 ◽

2011 ◽

Vol 118 (21) ◽

pp. 3752-3752 ◽

Cited By ~ 1

Author(s):

Catherine C. Smith ◽

Michael Brown ◽

Jason Chin ◽

Corynn Kasap ◽

Sara Salerno ◽

...

Keyword(s):

Real Time ◽

Single Molecule ◽

Kinase Inhibitor ◽

Drug Resistant ◽

Smrt Sequencing ◽

Sequencing Technology ◽

Employment Equity ◽

Pacific Biosciences ◽

Equity Ownership ◽

Tki Treatment

Abstract 3752. Background: Secondary kinase domain (KD) mutations are the most well-recognized mechanism of resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia (CML) and other cancers. In some cases, multiple drug resistant KD mutations can coexist in an individual patient (“polyclonality”). Alternatively, more than one mutation can occur in tandem on a single allele (“compound mutations”) following response and relapse to sequentially administered TKI therapy. Distinguishing between these two scenarios can inform the clinical choice of subsequent TKI treatment. There is currently no clinically adaptable methodology that offers the ability to distinguish polyclonal from compound mutations. Due to the size of the BCR-ABL KD where TKI-resistant mutations are detected, next-generation platforms are unable to generate reads of sufficient length to determine if two mutations separated by 500 nt reside on the same allele. Pacific Biosciences RS Single Molecule Real Time (SMRT) circular consensus sequencing technology is a novel third generation deep sequencing technology capable of rapidly and reliably achieving average read lengths of ∼1000bp (Travers et al, 2010) and frequently beyond 3000bp, allowing sequencing of the entire ABL KD on a single strand of DNA. We sought to address the ability of SMRT sequencing technology to distinguish polyclonal from compound mutations using clinical samples obtained from patients who have relapsed on BCR-ABL TKI treatment. Results: We analyzed an 863bp area of the BCR-ABL KD in 6 patients who had clinically relapsed on ABL kinase inhibitor therapy. SMRT sequencing detected mutations at a sensitivity of ∼1–2% of the total sequenced population, and successfully distinguished polyclonal from compound BCR-ABL KD mutations in several patient samples.
Results were largely consistent with those obtained by PCR subcloning and sequencing, although SMRT sequencing detected additional mutations and/or mutation combinations. In the most complex case, 7 distinct mutation-bearing alleles were detected in an individual patient after sequential relapse on imatinib and dasatinib. Mutant clones contained single and compound mutations combining distinct mutations (Y253H, T315F, T315A, T315I, T319A, E355G). Three distinct substitutions at residue T315 were detected: T315A, T315I and T315F. Notably, these findings are clinically important as the T315A mutation confers resistance to dasatinib but not imatinib, while the T315F and T315I mutations are resistant to all three clinically approved BCR/ABL inhibitors (imatinib, dasatinib, and nilotinib). Phospho-flow analysis for p-Crkl, a direct substrate of BCR-ABL, was conducted following ex vivo exposure of patient cells from the same time point to all three BCR-ABL inhibitors, and demonstrated the existence of distinct populations of cells with varying sensitivity to each drug (i.e. polyclonal drug sensitivity), underscoring the potential clinical importance of distinguishing polyclonal from compound mutations. Additionally, SMRT sequencing routinely detected alleles harboring compound mutations not detectable by conventional direct sequencing. Data analysis of samples from additional patients is ongoing and will be presented. Conclusions: Pacific Biosciences RS SMRT sequencing sensitively detects KD mutations in patient samples and can distinguish TKI-resistant clones containing compound mutations to reveal a complex mutational landscape in an individual patient not detectable by conventional sequencing. SMRT sequencing of the BCR-ABL KD can feasibly be developed into a rapid and economical clinical test with the additional advantages of increased sensitivity and reliability over current methods. 
Given the growing numbers of patients exposed to multiple TKIs in a sequential manner, the ability to accurately and sensitively characterize drug-resistant alleles promises to further facilitate a personalized approach to patient management. Disclosures: Brown: Pacific Biosciences: Employment. Chin: Pacific Biosciences: Employment. Travers: Pacific Biosciences: Employment. Wang: Pacific Biosciences: Employment. Kasarskis: Pacific Biosciences: Employment, Equity Ownership. Schadt: Pacific Biosciences: Employment, Equity Ownership.

Microsatellite marker discovery using single molecule real-time circular consensus sequencing on the Pacific Biosciences RS

BioTechniques ◽

10.2144/000114104 ◽

2013 ◽

Vol 55 (5) ◽

Cited By ~ 15

Author(s):

Markus A. Grohme ◽

Roberto Frias Soler ◽

Michael Wink ◽

Marcus Frohme

Keyword(s):

Real Time ◽

Microsatellite Marker ◽

Single Molecule ◽

Pacific Biosciences ◽

The Pacific ◽

Marker Discovery ◽

Circular Consensus Sequencing

Enabling metagenomic surveillance for bacterial tick-borne pathogens using nanopore sequencing with adaptive sampling

10.1101/2021.08.17.456696 ◽

2021 ◽

Author(s):

Evan J. Kipp ◽

Laramie L. Lindsey ◽

Benedict S. Khoo ◽

Christopher Faulk ◽

Jonathan D. Oliver ◽

...

Keyword(s):

Real Time ◽

Single Molecule ◽

Adaptive Sampling ◽

Sequence Data ◽

Ixodes Scapularis ◽

Fold Increase ◽

Borrelia Miyamotoi ◽

Nucleotide Sequence Data ◽

Oxford Nanopore ◽

Sequencing Platforms

Technological and computational advancements in the fields of genomics and bioinformatics are providing exciting new opportunities for pathogen discovery and surveillance. In particular, single-molecule nucleotide sequence data originating from Oxford Nanopore Technologies (ONT) sequencing platforms can be bioinformatically leveraged, in real-time, for enhanced biosurveillance of a vast array of zoonoses. The recently released nanopore adaptive sampling (NAS) pipeline facilitates immediate mapping of individual nucleotide molecules (i.e., DNA, cDNA, and RNA) to a given reference as each molecule is sequenced. User-defined thresholds then allow for the retention or rejection of specific molecules, informed by the real-time reference mapping results, as they are physically passing through a given sequencing nanopore. Here, we show how NAS can be used to selectively sequence entire genomes of bacterial tick-borne pathogens circulating in wild populations of the blacklegged tick vector, Ixodes scapularis. The NAS method provided a two-fold increase in targeted pathogen sequences, successfully enriching for Borrelia (Borreliella) burgdorferi s.s.; Borrelia (Borrelia) miyamotoi; Anaplasma phagocytophilum; and Ehrlichia muris eauclairensis genomic DNA within our I. scapularis samples. Our results indicate that NAS has strong potential for real-time sequence-based pathogen surveillance.
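
The retain-or-reject step described above can be sketched as a small decision function. This is a toy model and not the Oxford Nanopore software: the `ChunkMapping` record, the `decide` function, the threshold names `min_identity` and `max_bases`, and their default values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkMapping:
    """Hypothetical summary of mapping the bases read so far from one molecule."""
    target: Optional[str]  # name of the best reference hit, or None if unmapped
    identity: float        # fraction of matching bases in the hit (0 to 1)
    bases_seen: int        # number of bases sequenced so far


def decide(mapping, targets, min_identity=0.8, max_bases=1000):
    """Return 'keep', 'eject', or 'wait' for a molecule still in the pore.

    Assumed rule: a confident hit to a target reference is kept, a confident
    hit to anything else is ejected, and an undecided molecule is given more
    time until max_bases have been read.
    """
    confident = mapping.target is not None and mapping.identity >= min_identity
    if confident:
        return "keep" if mapping.target in targets else "eject"
    return "wait" if mapping.bases_seen < max_bases else "eject"


targets = {"Borrelia_burgdorferi", "Anaplasma_phagocytophilum"}
print(decide(ChunkMapping("Borrelia_burgdorferi", 0.92, 400), targets))
print(decide(ChunkMapping("Ixodes_scapularis", 0.95, 400), targets))
print(decide(ChunkMapping(None, 0.0, 200), targets))
```

Under these assumptions, host-derived molecules are rejected early, which frees the pore for the next molecule and is what produces enrichment of the targeted pathogen sequences.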

Single molecule, near full-length genome sequencing of dengue virus

Scientific Reports ◽

10.1038/s41598-020-75374-1 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Thiruni N. Adikari ◽

Nasir Riaz ◽

Chathurani Sigera ◽

Preston Leung ◽

Braulio M. Valencia ◽

...

Keyword(s):

Dengue Virus ◽

Single Molecule ◽

Phylogenetic Trees ◽

Sequence Data ◽

Sequence Similarity ◽

Genetic Distances ◽

Full Length ◽

Denv Serotypes ◽

Consensus Sequences ◽

Pairwise Sequence Similarity

Current methods for dengue virus (DENV) genome amplification amplify the genome in at least 5 overlapping segments and then combine the output to characterize a full genome. This process is laborious and costly, and it requires at least 10 primers per serotype, thus increasing the likelihood of PCR bias. We introduce an assay to amplify near full-length dengue virus genomes as intact molecules, sequence these amplicons with third generation “nanopore” technology without fragmentation, and use the sequence data to differentiate within-host viral variants with a bioinformatics tool (Nano-Q). The new assay successfully generated near full-length amplicons from DENV serotype 1, 2 and 3 samples, which were sequenced with nanopore technology. Consensus DENV sequences generated by nanopore sequencing had over 99.5% pairwise sequence similarity to their Illumina-generated counterparts, provided the coverage was >100 with both platforms. Maximum likelihood phylogenetic trees generated from nanopore consensus sequences reproduced the exact trees made from Illumina sequencing with a conservative 99% bootstrapping threshold (after 1000 replicates and 10% burn-in). Pairwise genetic distances between within-host variants identified with the Nano-Q tool were smaller than those between variants from different hosts, thus enabling the phylogenetic segregation of variants from the same host.
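
As a rough illustration of a pairwise sequence similarity figure such as the 99.5% quoted above, the sketch below computes percent identity between two sequences that are already aligned. The abstract does not state how similarity was calculated, so the choice to skip gap columns, and the function name `percent_identity`, are assumptions.

```python
def percent_identity(a, b, gap="-"):
    """Percent of matching positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# One mismatch in ten compared positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # 90.0
```

On a DENV genome of roughly 10.7 kb, a similarity above 99.5% corresponds to fewer than about 54 differing positions between the nanopore and Illumina consensus sequences.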

Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1601782113 ◽

2016 ◽

Vol 113 (19) ◽

pp. 5233-5238 ◽

Cited By ~ 79

Author(s):

Carl W. Fuller ◽

Shiv Kumar ◽

Mintu Porel ◽

Minchen Chien ◽

Arek Bibillo ◽

...

Keyword(s):

Dna Sequencing ◽

Real Time ◽

Lipid Bilayers ◽

Dna Polymerase ◽

Single Molecule ◽

High Throughput Sequencing ◽

Electrical Current ◽

Sequencing Data ◽

Sequencing By Synthesis ◽

Dna Strand

DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods.

Real-time kinetics and high-resolution melt curves in single-molecule digital LAMP to differentiate and study specific and non-specific amplification

Nucleic Acids Research ◽

10.1093/nar/gkaa099 ◽

2020 ◽

Vol 48 (7) ◽

pp. e42-e42 ◽

Cited By ~ 9

Author(s):

Justin C Rolando ◽

Erik Jue ◽

Jacob T Barlow ◽

Rustem F Ismagilov

Keyword(s):

High Resolution ◽

Real Time ◽

Single Molecule ◽

High Throughput Sequencing ◽

Limit Of Detection ◽

Isothermal Amplification ◽

Rapid Diagnostics ◽

High Resolution Melt ◽

Specific Amplification ◽

The Impact

Isothermal amplification assays, such as loop-mediated isothermal amplification (LAMP), show great utility for the development of rapid diagnostics for infectious diseases because they have high sensitivity, pathogen-specificity and potential for implementation at the point of care. However, elimination of non-specific amplification remains a key challenge for the optimization of LAMP assays. Here, using chlamydia DNA as a clinically relevant target and high-throughput sequencing as an analytical tool, we investigate a potential mechanism of non-specific amplification. We then develop a real-time digital LAMP (dLAMP) with high-resolution melting temperature (HRM) analysis and use this single-molecule approach to analyze approximately 1.2 million amplification events. We show that single-molecule HRM provides insight into specific and non-specific amplification in LAMP that is difficult to deduce from bulk measurements. We use real-time dLAMP with HRM to evaluate differences between polymerase enzymes, the impact of assay parameters (e.g. time, rate or fluorescence intensity), and the effect of background human DNA. By differentiating true and false positives, HRM enables determination of the optimal assay and analysis parameters that lead to the lowest limit of detection (LOD) in a digital isothermal amplification assay.

Draft Genome Sequence of a Metabolically Diverse Antarctic Supraglacial Stream Organism, Polaromonas sp. Strain CG9_12, Determined Using Pacific Biosciences Single-Molecule Real-Time Sequencing Technology

Genome Announcements ◽

10.1128/genomea.01242-14 ◽

2014 ◽

Vol 2 (6) ◽

Author(s):

H. J. Smith ◽

C. M. Foreman ◽

T. Ramaraj

Keyword(s):

Real Time ◽

Genome Sequence ◽

Single Molecule ◽

Draft Genome ◽

Draft Genome Sequence ◽

Sequencing Technology ◽

Pacific Biosciences

Snake venom gland cDNA sequencing using the Oxford Nanopore MinION portable DNA sequencer

10.1101/025148 ◽

2015 ◽

Cited By ~ 2

Author(s):

John F Mulley ◽

Adam D Hargreaves

Keyword(s):

Error Correction ◽

Single Molecule ◽

De Novo ◽

Venom Gland ◽

Sequencing Error ◽

Cdna Sequencing ◽

Venom Toxin ◽

Oxford Nanopore ◽

Illumina Data ◽

Corrected Data

Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

Third Generation DNA Sequencing: Pacific Biosciences' Single Molecule Real Time Technology

Chemistry & Biology ◽

10.1016/j.chembiol.2010.07.004 ◽

2010 ◽

Vol 17 (7) ◽

pp. 675-676 ◽

Cited By ~ 98

Author(s):

Alice McCarthy

Keyword(s):

Dna Sequencing ◽

Real Time ◽

Single Molecule ◽

Third Generation ◽

Pacific Biosciences
