High accuracy DNA sequencing on a small, scalable platform via electrical detection of single base incorporations

10.1101/604553 ◽

2019 ◽

Author(s):

Hesaam Esfandyarpour ◽

Kosar B. Parizi ◽

Meysam R. Barmi ◽

Hamid Rategh ◽

Lisen Wang ◽

...

Keyword(s):

DNA Sequencing ◽

Capital Investment ◽

Sequence Data ◽

GC Content ◽

Low Frequency ◽

Single Base ◽

Sequencing Platform ◽

Sequencing Technologies ◽

Electronic Detection ◽

Data Output

High-throughput DNA sequencing technologies have undergone tremendous development over the past decade. Although optical detection-based sequencing has constituted the majority of data output, it requires a large capital investment and aggregation of samples to achieve optimal cost per sample. We have developed a novel electronic detection-based platform capable of accurately detecting single base incorporations. The GenapSys technology with its electronic detection modality allows the system to be compact, accessible, and affordable. We demonstrate the performance of the system by sequencing several different microbial genomes with varying GC content. The platform is capable of generating up to 2 Gb of high-quality nucleic acid sequence in a single run. We routinely generate sequence data that exceeds 99% raw accuracy with read lengths of up to 175 bp. Average quality scores remain above Q30 (99.9% raw sequencing accuracy) beyond 150 bp, with more than 85% of total bases at or above Q30. The utility of the platform is highlighted by targeted sequencing of the human genome. We show high concordance of SNP detection on the human NA12878 HapMap cell line with data generated on the Illumina sequencing platform. In addition, we sequenced a targeted panel of cancer-associated genes in a well-characterized reference standard. With multiple library preparation approaches on this sample, we were able to identify low-frequency mutations at expected allele frequencies.
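
The Q30 figure quoted in the abstract above follows the standard Phred convention, Q = -10 log10(p), where p is the probability that a base call is wrong. The short Python sketch below illustrates that conversion and the "fraction of bases at or above Q30" statistic. The function names are illustrative assumptions and are not taken from the GenapSys software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def fraction_at_or_above(quals, threshold: int = 30) -> float:
    """Fraction of base calls whose quality is at least `threshold`."""
    quals = list(quals)
    if not quals:
        raise ValueError("no quality values given")
    return sum(q >= threshold for q in quals) / len(quals)


if __name__ == "__main__":
    # Q30 corresponds to an error probability of about 1 in 1000.
    print(phred_to_error_prob(30))
    # Hypothetical per-base qualities for a short read.
    print(fraction_at_or_above([35, 30, 12, 40]))
```

Under this convention, Q20 corresponds to 99% per-base accuracy and Q30 to 99.9%, which matches the accuracy quoted in parentheses in the abstract.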

Single-molecule DNA sequencing of widely varying GC-content using nucleotide release, capture and detection in microdroplets

Nucleic Acids Research ◽

10.1093/nar/gkaa987 ◽

2020 ◽

Vol 48 (22) ◽

pp. e132-e132

Author(s):

Tim J Puchtler ◽

Kerr Johnson ◽

Rebecca N Palmer ◽

Emma L Talbot ◽

Lindsey A Ibbotson ◽

...

Keyword(s):

DNA Sequencing ◽

Single Molecule ◽

Direct Detection ◽

GC Content ◽

Cost Effective ◽

Epigenetic Modifications ◽

Fluorescence Signal ◽

Sequencing Platform ◽

Sequencing Technologies ◽

Lower Accuracy

Despite remarkable progress in DNA sequencing technologies, there remains a trade-off between short-read platforms, having limited ability to sequence homopolymers, repeated motifs or long-range structural variation, and long-read platforms, which tend to have lower accuracy and/or throughput. Moreover, current methods do not allow direct readout of epigenetic modifications from a single read. With the aim of addressing these limitations, we have developed an optical electrowetting sequencing platform that uses step-wise nucleotide triphosphate (dNTP) release, capture and detection in microdroplets from single DNA molecules. Each microdroplet serves as a reaction vessel that identifies an individual dNTP based on a robust fluorescence signal, with the detection chemistry extended to enable detection of 5-methylcytosine. Our platform uses small reagent volumes and inexpensive equipment, paving the way to cost-effective single-molecule DNA sequencing, capable of handling widely varying GC-bias, and demonstrating direct detection of epigenetic modifications.

Improved Compression of DNA Sequencing Data with Cascading Bloom Filters

International Journal of Foundations of Computer Science ◽

10.1142/s0129054118430013 ◽

2018 ◽

Vol 29 (08) ◽

pp. 1249-1255

Author(s):

Kamil Salikhov

Keyword(s):

DNA Sequencing ◽

Sequence Data ◽

Real Data ◽

Compression Algorithm ◽

Computational Experiments ◽

Bloom Filters ◽

DNA Fragments ◽

Sequencing Data ◽

Sequencing Technologies ◽

Memory Reduction

Modern DNA sequencing technologies generate prodigious volumes of sequence data consisting of short DNA fragments (reads). Storing and transferring this data is often challenging. With this motivation, several specialized compression methods have been developed. In this paper, we present an improvement of the lossless reference-free compression algorithm, suggested by Rozov et al., based on the technique of cascading Bloom filters. Through computational experiments on real data, we demonstrate that our method results in a significant associated memory reduction in practice.
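
The abstract above does not describe the algorithm itself, so the following is only a minimal Python sketch of the general cascading Bloom filter idea, not the method of Rozov et al. or the improvement presented in the paper. It assumes that every item that will ever be queried belongs to a known candidate set. Each level stores the false positives of the previous level, and a small exact set resolves whatever remains. All names, sizes, and parameters here are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # One salted BLAKE2b hash per hash function.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_cascade(members, candidates, n_levels=4, bits_per_item=8, n_hashes=3):
    """Build the filters plus a final exact set.

    `candidates` must contain every item that may be queried later.
    """
    filters = []
    to_store = set(members)               # set encoded at the current level
    others = set(candidates) - to_store   # items this level should reject
    for _ in range(n_levels):
        bf = BloomFilter(max(8, bits_per_item * len(to_store)), n_hashes)
        for item in to_store:
            bf.add(item)
        filters.append(bf)
        false_pos = {x for x in others if x in bf}
        # The next level stores this level's false positives; roles swap.
        to_store, others = false_pos, to_store
    return filters, to_store              # leftover set is stored exactly


def query(item, filters, exact) -> bool:
    """Exact membership test, valid only for items in the candidate set."""
    for level, bf in enumerate(filters):
        if item not in bf:
            # A miss at an odd level (second, fourth, ... filter) proves
            # membership; a miss at an even level proves non-membership.
            return level % 2 == 1
    # Item passed every filter: the exact set decides.
    return (item in exact) == (len(filters) % 2 == 0)
```

In this sketch the answer is exact for all candidate items even when the individual filters are small, because every false positive at one level is caught either by a later level or by the final exact set. Items outside the candidate set can still be misclassified.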

Genetic Biomonitoring and Biodiversity Assessment Using Portable Sequencing Technologies: Current Uses and Future Directions

Genes ◽

10.3390/genes10110858 ◽

2019 ◽

Vol 10 (11) ◽

pp. 858 ◽

Cited By ~ 18

Author(s):

Krehenwinkel ◽

Pomerantz ◽

Prost

Keyword(s):

DNA Sequencing ◽

Biodiversity Loss ◽

Taxonomic Composition ◽

Great Promise ◽

Sequencing Platform ◽

Biological Communities ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Sequencing Studies ◽

High Throughput DNA Sequencing

We live in an era of unprecedented biodiversity loss, affecting the taxonomic composition of ecosystems worldwide. The immense task of quantifying human imprints on global ecosystems has been greatly simplified by developments in high-throughput DNA sequencing technology (HTS). Approaches like DNA metabarcoding enable the study of biological communities in unparalleled detail. However, current protocols for HTS-based biodiversity exploration have several drawbacks. They are usually based on short sequences, with limited taxonomic and phylogenetic information content. Access to expensive HTS technology is often restricted in developing countries. Ecosystems of particular conservation priority are often remote and hard to access, requiring extensive time from field collection to laboratory processing of specimens. The advent of inexpensive mobile laboratory and DNA sequencing technologies shows great promise to facilitate monitoring projects in biodiversity hot-spots around the world. Recent attention has been given to portable DNA sequencing studies related to infectious organisms, such as bacteria and viruses, yet relatively few studies have focused on applying these tools to Eukaryotes, such as plants and animals. Here, we outline the current state of genetic biodiversity monitoring of higher Eukaryotes using Oxford Nanopore Technology’s MinION portable sequencing platform, and summarize areas of recent development.

A computational screen for alternative genetic codes in over 250,000 genomes

eLife ◽

10.7554/elife.71402 ◽

2021 ◽

Vol 10 ◽

Author(s):

Yekaterina Shulgina ◽

Sean R Eddy

Keyword(s):

Amino Acid ◽

Genetic Code ◽

Sequence Data ◽

GC Content ◽

Low Frequency ◽

Computational Method ◽

Nucleotide Sequence Data ◽

Genetic Codes ◽

Evolutionary Trajectories ◽

Sense Codon

The genetic code has been proposed to be a 'frozen accident', but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force which likely helped drive these codons to low frequency and enable their reassignment.

Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies

Scientific Data ◽

10.1038/s41597-019-0287-z ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Volkan Sevim ◽

Juna Lee ◽

Robert Egan ◽

Alicia Clum ◽

Hope Hundley ◽

...

Keyword(s):

Sequence Data ◽

Sequence Similarity ◽

GC Content ◽

Bacterial Strains ◽

Mock Community ◽

Taxonomic Assignment ◽

High Sequence Similarity ◽

Metagenomic Sequence ◽

Sequencing Platform ◽

Oxford Nanopore

Metagenomic sequence data from defined mock communities is crucial for the assessment of sequencing platform performance and downstream analyses, including assembly, binning and taxonomic assignment. We report a comparison of shotgun metagenome sequencing and assembly metrics of a defined microbial mock community using the Oxford Nanopore Technologies (ONT) MinION, PacBio and Illumina sequencing platforms. Our synthetic microbial community BMock12 consists of 12 bacterial strains with genome sizes spanning 3.2–7.2 Mbp, 40–73% GC content, and 1.5–7.3% repeats. Size selection of both PacBio and ONT sequencing libraries prior to sequencing was essential to yield comparable relative abundances of organisms among all sequencing technologies. While the Illumina-based metagenome assembly yielded good coverage with few misassemblies, contiguity was greatly improved by both the Illumina + ONT and Illumina + PacBio hybrid assemblies, although these also increased misassemblies, most notably in genomes with high sequence similarity to each other. Our resulting datasets allow evaluation and benchmarking of bioinformatics software on Illumina, PacBio and ONT platforms in parallel.
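
As a small illustration of the GC content statistic quoted for the BMock12 strains above, the following Python sketch computes the GC fraction of a sequence. It assumes that ambiguous symbols such as N should be excluded from the denominator. That is one common convention and is not stated in the abstract.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Hypothetical toy sequence; real genomes would be read from FASTA files.
    print(f"{gc_content('GGCCAATT'):.0%}")
```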

A computational screen for alternative genetic codes in over 250,000 genomes

10.1101/2021.06.18.448887 ◽

2021 ◽

Author(s):

Yekaterina Shulgina ◽

Sean R. Eddy

Keyword(s):

Amino Acid ◽

Genetic Code ◽

Sequence Data ◽

GC Content ◽

Low Frequency ◽

Computational Method ◽

Nucleotide Sequence Data ◽

Genetic Codes ◽

Evolutionary Trajectories ◽

Sense Codon

The genetic code has been proposed to be a "frozen accident", but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force which likely helped drive these codons to low frequency and enable their reassignment.

Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform

Applied and Environmental Microbiology ◽

10.1128/aem.01043-13 ◽

2013 ◽

Vol 79 (17) ◽

pp. 5112-5120 ◽

Cited By ~ 3102

Author(s):

James J. Kozich ◽

Sarah L. Westcott ◽

Nielson T. Baxter ◽

Sarah K. Highlander ◽

Patrick D. Schloss

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Sequence Data ◽

Error Rates ◽

rRNA Gene ◽

Variable Regions ◽

Sequencing Platform ◽

Sequencing Technologies ◽

Sequencing Strategy ◽

The 16S rRNA Gene

Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to the sequencing of millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as that generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
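
The abstract above mentions a quality score-based approach for resolving disagreements between paired reads but does not give the rule. The Python sketch below shows one plausible form of such a rule, under stated assumptions: the two reads are already aligned over their overlap, the reverse read has been reverse-complemented, and a mismatch is resolved in favour of the higher-quality base only when the quality difference reaches a threshold. The function name and the threshold value `min_delta` are hypothetical and are not taken from the paper.

```python
def merge_overlap(fwd_bases, fwd_quals, rev_bases, rev_quals, min_delta=6):
    """Build a consensus over an aligned overlap of two reads.

    Where the reads agree, keep the base. Where they disagree, keep the
    base with the higher quality if the difference is at least
    `min_delta`; otherwise emit the ambiguity code N.
    """
    lengths = {len(fwd_bases), len(fwd_quals), len(rev_bases), len(rev_quals)}
    if len(lengths) != 1:
        raise ValueError("bases and qualities must all have the same length")

    consensus = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rev_bases, rev_quals):
        if fb == rb:
            consensus.append(fb)
        elif abs(fq - rq) >= min_delta:
            consensus.append(fb if fq > rq else rb)
        else:
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical overlap with two disagreements at the last two positions.
    print(merge_overlap("ACGT", [30, 30, 30, 10], "ACTA", [30, 30, 12, 38]))
```

Positions that end up as N can then be filtered or handled downstream, so low-confidence disagreements are not silently turned into base calls.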

Increased yields of duplex sequencing data by a series of quality control tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab002 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Gundula Povysil ◽

Monika Heinzl ◽

Renato Salazar ◽

Nicholas Stoler ◽

Anton Nekrutenko ◽

...

Keyword(s):

Low Frequency ◽

Variant Calling ◽

Data Loss ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Consensus Sequences ◽

Sequencing Errors ◽

Data Output ◽

Reverse Strand ◽

Duplex Sequencing

Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorize the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
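
To make the family and consensus terminology in the abstract above concrete, here is a deliberately simplified Python sketch of duplex consensus calling. It is not the authors' toolset. It assumes that reads are already trimmed to equal length and oriented the same way, that both strands of a molecule carry the same tag string, and that the strand labels "ab" and "ba" are given. The family-size and agreement thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def majority_consensus(seqs, min_fraction=0.6):
    """Per-position majority base; N where agreement is below min_fraction."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads, min_family_size=3):
    """Group reads into families and build consensus sequences.

    reads: iterable of (tag, strand, sequence) with strand "ab" or "ba".
    Returns (sscs, dcs): single-strand consensus per (tag, strand) and
    duplex consensus per tag for tags that have both strands.
    """
    families = defaultdict(list)
    for tag, strand, seq in reads:
        families[(tag, strand)].append(seq)

    # Families that are too small yield no consensus; their reads are lost.
    sscs = {
        key: majority_consensus(seqs)
        for key, seqs in families.items()
        if len(seqs) >= min_family_size
    }

    dcs = {}
    for (tag, strand), fwd in sscs.items():
        if strand != "ab":
            continue
        rev = sscs.get((tag, "ba"))
        if rev is None:
            continue  # only one strand observed: no duplex consensus
        dcs[tag] = "".join(a if a == b else "N" for a, b in zip(fwd, rev))
    return sscs, dcs
```

Even in this toy version, reads in small or single-strand families never reach a duplex consensus, which is the kind of data loss the abstract describes.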

Occurrence and Origin of Supernumerary Chromosomes in Partamona (Hymenoptera: Apidae: Meliponini)

Cytogenetic and Genome Research ◽

10.1159/000452290 ◽

2016 ◽

Vol 150 (1) ◽

pp. 68-75 ◽

Cited By ~ 1

Author(s):

Diana P. Machado ◽

Elder A. Miranda ◽

Mariana C. Dessi ◽

Camila P. Sabadini ◽

Marco A. Del Lama

Keyword(s):

High Frequency ◽

SCAR Marker ◽

Sequence Data ◽

Low Frequency ◽

B Chromosomes ◽

First Report ◽

Supernumerary Chromosomes

Samples from 861 colonies of 12 Partamona species from 125 Brazilian localities were analysed for a SCAR marker specific to the B chromosomes of P. helleri. We identified the SCAR marker in 6 of the 12 species analysed, including 2 (P. gregaria and P. chapadicola) from the pearsoni clade. This is the first report on the presence of this marker in Partamona species that are not included in the cupira clade, which indicates that the B chromosomes probably are more widespread in this genus than previously thought. The analysis revealed a high frequency of the SCAR marker in the samples of P. helleri (0.47), P. cupira (0.46), and P. rustica (0.29), and a low frequency in P. aff. helleri (0.06). The frequency of the marker in P. helleri was correlated with the latitude of the sampling locality, decreasing from north to south. Sequence data on the SCAR marker from 50 individuals of the 6 species in which the presence of this marker was shown revealed a new scenario for the origin of the B chromosomes in Partamona.

Sequencing and Computational Approaches to Identification and Characterization of Microbial Organisms

Biomedical Engineering and Computational Biology ◽

10.4137/becb.s10886 ◽

2013 ◽

Vol 5 ◽

pp. BECB.S10886 ◽

Cited By ~ 2

Author(s):

Brijesh Singh Yadav ◽

Venkateswarlu Ronda ◽

Dinesh P. Vashista ◽

Bhaskar Sharma

Keyword(s):

Sequence Data ◽

Microbial Interactions ◽

Microbial Pathogens ◽

Nucleotide Sequence Data ◽

Computational Approaches ◽

Microbial Detection ◽

Sequencing Technologies ◽

Sequencing Platforms ◽

Identification And Characterization

Recent advances in sequencing technologies and computational approaches are propelling scientists ever closer towards a complete understanding of human-microbial interactions. Powerful sequencing platforms are rapidly producing huge amounts of nucleotide sequence data, which are compiled into large databases. These sequence data can be retrieved, assembled, and analyzed for the identification of microbial pathogens and the diagnosis of diseases. In this article, we present a commentary on how metagenomics, combined with microarrays and new sequencing techniques, is helping microbial detection and characterization.
