Cytosine Variant Calling with High-throughput Nanopore Sequencing

Mapping Intimacies ◽

10.1101/047134 ◽

2016 ◽

Cited By ~ 7

Author(s):

Arthur C. Rand ◽

Miten Jain ◽

Jordan Eizenga ◽

Audrey Musselman-Brown ◽

Hugh E. Olsen ◽

...

Keyword(s):

Single Molecule ◽

Dirichlet Process ◽

Genomic Dna ◽

Probabilistic Method ◽

Simultaneous Detection ◽

Ionic Current ◽

Variant Calling ◽

Variable Order ◽

Chemical Modifications ◽

Oxford Nanopore

Abstract: Chemical modifications to DNA regulate cellular state and function. The Oxford Nanopore MinION is a portable single-molecule DNA sequencer that can sequence long fragments of genomic DNA. Here we show that the MinION can be used to detect and map two chemical modifications of cytosine, 5-methylcytosine and 5-hydroxymethylcytosine. We present a probabilistic method that enables expansion of the nucleotide alphabet to include bases containing chemical modifications. Our results on synthetic DNA show that individual cytosine base modifications can be classified with accuracy up to 95% in a three-way comparison and 98% in a two-way comparison.

Statement of Significance: Nanopore-based sequencing technology can produce long reads from unamplified genomic DNA, potentially allowing the characterization of chemical modifications and non-canonical DNA nucleotides as they occur in the cell. As the throughput of nanopore sequencers improves, simultaneous detection of multiple epigenetic modifications to cytosines will become an important capability of these devices. Here we present a statistical model that allows the Oxford Nanopore Technologies MinION to be used for detecting chemical modifications to cytosine using standard DNA preparation and sequencing techniques. Our method is based on modeling the ionic current due to DNA k-mers with a variable-order hidden Markov model where the emissions are distributed according to a hierarchical Dirichlet process mixture of normal distributions. This method provides a principled way to expand the nucleotide alphabet to allow for variant calling of modified bases.
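The idea of classifying a cytosine variant from ionic current can be illustrated with a much simpler model than the one the abstract describes. The sketch below assumes a single normal emission distribution per variant (not a hierarchical Dirichlet process mixture, and with no hidden Markov model over k-mers) and applies Bayes' rule to a handful of current measurements. The function names, the current means and standard deviations, and the observed values are all hypothetical and chosen only for illustration.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def classify_cytosine(currents, models, priors=None):
    """Posterior probability of each cytosine variant given current samples.

    currents: ionic current measurements (pA) attributed to one k-mer.
    models:   {label: (mean, sd)} -- one normal per variant (a simplification).
    priors:   optional {label: prior probability}; uniform if omitted.
    """
    labels = list(models)
    if priors is None:
        priors = {label: 1.0 / len(labels) for label in labels}

    log_post = {}
    for label in labels:
        mean, sd = models[label]
        log_lik = sum(normal_logpdf(x, mean, sd) for x in currents)
        log_post[label] = math.log(priors[label]) + log_lik

    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {label: math.exp(v - top) / total for label, v in log_post.items()}


# Hypothetical emission parameters (mean, sd) in picoamperes; not measured values.
HYPOTHETICAL_MODELS = {"C": (60.0, 1.5), "5mC": (62.0, 1.5), "5hmC": (58.5, 1.5)}

posterior = classify_cytosine([61.8, 62.4, 61.9], HYPOTHETICAL_MODELS)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this toy setting the three-way comparison corresponds to choosing among C, 5mC and 5hmC, and a two-way comparison would simply pass a two-entry `models` dictionary.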


Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

10.1101/310458 ◽

2018 ◽

Cited By ~ 9

Author(s):

Ruibang Luo ◽

Fritz J. Sedlazeck ◽

Tak-Wah Lam ◽

Michael C. Schatz

Keyword(s):

Neural Network ◽

Single Molecule ◽

Variant Calling ◽

Accurate Identification ◽

Whole Genome Analysis ◽

Single Molecule Sequencing ◽

Oxford Nanopore ◽

Indel Length ◽

Human Sample ◽

Dna Sequence Variants

Abstract: The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57%, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
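The precision and F1-score figures quoted above are standard set-comparison metrics between called variants and a truth set. The following is a minimal sketch of how such metrics are commonly computed; it is not Clairvoyante's evaluation code, and the function name and the example variants (chromosome, position, reference allele, alternative allele) are hypothetical.

```python
def precision_recall_f1(called, truth):
    """Compare a set of called variants against a truth set.

    Each variant is any hashable key, e.g. (chrom, pos, ref, alt).
    Returns (precision, recall, F1), each in [0, 1].
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: two calls match the truth set, one is a false positive,
# and one true variant is missed.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "GA")}
called = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "GA"), ("chr2", 900, "T", "C")}

precision, recall, f1 = precision_recall_f1(called, truth)
print(precision, recall, f1)
```

F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two values.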


Longshot: accurate variant calling in diploid genomes using single-molecule long read sequencing

10.1101/564443 ◽

2019 ◽

Cited By ~ 1

Author(s):

Peter Edge ◽

Vikas Bansal

Keyword(s):

Single Molecule ◽

Variant Calling ◽

Whole Genome ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Short Reads ◽

Pacific Biosciences ◽

Sequencing Technologies ◽

Accurate Detection ◽

Oxford Nanopore

Abstract: Short-read sequencing technologies such as Illumina enable the accurate detection of single nucleotide variants (SNVs) and short insertion/deletion variants in human genomes but are unable to provide information about haplotypes and variants in repetitive regions of the genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads (≥ 10 kb in length) that can potentially address these limitations of short reads. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, that leverages the haplotype information present in SMS reads to enable the accurate detection and phasing of single nucleotide variants in diploid genomes. Using whole-genome Pacific Biosciences data for multiple human individuals, we demonstrate that Longshot achieves very high accuracy for SNV detection (precision ≥0.992 and recall ≥0.96) that is significantly better than existing variant calling methods. Longshot can also call SNVs with good accuracy using whole-genome Oxford Nanopore data. Finally, we demonstrate that it enables the discovery of variants in duplicated regions of the genome that cannot be mapped using short reads. Longshot is freely available at https://github.com/pjedge/longshot.
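To see why haplotype information can help with error-prone reads, consider a toy likelihood calculation at a single biallelic site. This sketch is not Longshot's actual model: it assumes every read has already been assigned to one of two haplotypes, that each observed allele is wrong with a fixed probability, and that reads are independent. The function names, the error rate of 0.15 and the read counts are hypothetical.

```python
import math


def read_prob(observed, true_allele, error_rate):
    """Probability of observing an allele given the true allele on that haplotype."""
    return 1.0 - error_rate if observed == true_allele else error_rate


def phased_genotype_log_likelihoods(observations, error_rate=0.15):
    """Log-likelihood of each phased genotype at one biallelic site.

    observations: list of (haplotype, allele) pairs, with haplotype in {0, 1}
                  and allele in {"ref", "alt"}.
    """
    genotypes = {
        "ref|ref": ("ref", "ref"),
        "ref|alt": ("ref", "alt"),
        "alt|ref": ("alt", "ref"),
        "alt|alt": ("alt", "alt"),
    }
    return {
        name: sum(
            math.log(read_prob(allele, alleles[hap], error_rate))
            for hap, allele in observations
        )
        for name, alleles in genotypes.items()
    }


# Hypothetical pileup: haplotype 0 reads mostly show ref, haplotype 1 reads mostly alt,
# with one erroneous observation on each haplotype.
observations = (
    [(0, "ref")] * 5 + [(0, "alt")] * 1 + [(1, "alt")] * 5 + [(1, "ref")] * 1
)

scores = phased_genotype_log_likelihoods(observations)
best_genotype = max(scores, key=scores.get)
print(best_genotype, scores)
```

Because each read is scored against the allele expected on its own haplotype, the two discordant reads are explained as errors, and the heterozygous phased genotype is clearly preferred over the homozygous alternatives.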


Analysis of HLA-G long-read genomic sequences in mother–offspring pairs with preeclampsia

Scientific Reports ◽

10.1038/s41598-020-77081-3 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ayako Nishizawa ◽

Kazuki Kumada ◽

Keiko Tateno ◽

Maiko Wagata ◽

Sakae Saito ◽

...

Keyword(s):

Single Molecule ◽

Gene Polymorphisms ◽

Genomic Dna ◽

Genomic Sequences ◽

Genomic Sequencing ◽

Public Database ◽

Coding Sequences ◽

Pacbio Rs Ii ◽

Potential Association ◽

Long Read

Abstract: Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.


Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract: Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


Multiple and Simultaneous Detection of Specific Bacteria in Enriched Bacterial Communities Using a DNA Microarray Chip with Randomly Generated Genomic DNA Probes

Analytical Chemistry ◽

10.1021/ac048703c ◽

2005 ◽

Vol 77 (8) ◽

pp. 2311-2317 ◽

Cited By ~ 19

Author(s):

Byoung Chan Kim ◽

Ji Hyun Park ◽

Man Bock Gu

Keyword(s):

Dna Microarray ◽

Bacterial Communities ◽

Genomic Dna ◽

Simultaneous Detection ◽

Dna Probes ◽

Microarray Chip


Resolving the Dynamic Non-Covalent Interaction inside Membrane Protein Channel by Single-Molecule Interaction Spectrum

10.26434/chemrxiv.7251683 ◽

2018 ◽

Author(s):

Meng-Yin Li ◽

Yi-Lun Ying ◽

Xi-Xin Fu ◽

Jie Yu ◽

Shao-Chuang Liu ◽

...

Keyword(s):

Membrane Protein ◽

Single Molecule ◽

Ionic Current ◽

Md Simulations ◽

Energy Spectra ◽

Membrane Channels ◽

Non Covalent Interactions ◽

Protein Channels ◽

Covalent Interactions ◽

Molecule Interaction

Millions of years of evolution have produced membrane protein channels capable of efficiently moving ions across the cell membrane. The fundamental mechanisms underlying this transport are largely attributed to weak non-covalent interactions. However, uncovering these dynamic interactions and their synergistic network effects remains challenging for both experimental techniques and molecular dynamics (MD) simulations. Here, we present a rational strategy that combines MD simulations and frequency-energy spectroscopy to identify and quantify the role of non-covalent interactions in carrier transport through membrane protein channels, as encoded in traditional single-channel recordings of ionic current. We used wild-type aerolysin transporting methylcytosine and cytosine as a model to explore the dynamic ionic signatures with non-stationary and non-linear frequency analysis. Our data show that methylcytosine experiences stronger non-covalent interactions than cytosine with the aerolysin nanopore at Region 1 around R220, which produces characteristic frequency-energy spectra. Furthermore, we experimentally validated the hypothesis obtained from the frequency-energy spectra by designing the single-site mutation K238G, which creates significantly enhanced non-covalent interactions for the recognition of methylcytosine. The frequency-energy spectrum of ions flowing inside membrane channels constitutes a single-molecule interaction spectrum, which bridges the gap between traditional ionic current recording and MD simulations, facilitating the qualitative and quantitative description of the non-covalent interactions inside membrane channels.


A Transposon Story: From TE Content to TE Dynamic Invasion of Drosophila Genomes Using the Single-Molecule Sequencing Technology from Oxford Nanopore

Cells ◽

10.3390/cells9081776 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1776

Author(s):

Mourdas Mohamed ◽

Nguyet Thi-Minh Dang ◽

Yuki Ogyama ◽

Nelly Burlet ◽

Bruno Mugat ◽

...

Keyword(s):

Single Molecule ◽

Wild Type ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

In The Wild ◽

Successive Generations ◽

Type Strains

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.


Cytosine-5 methylation-directed construction of a Au nanoparticle-based nanosensor for simultaneous detection of multiple DNA methyltransferases at the single-molecule level

Chemical Science ◽

10.1039/d0sc03240a ◽

2020 ◽

Vol 11 (35) ◽

pp. 9675-9684

Author(s):

Li-Juan Wang ◽

Xiao Han ◽

Jian-Ge Qiu ◽

BingHua Jiang ◽

Chun-Yang Zhang

Keyword(s):

Single Molecule ◽

Simultaneous Detection ◽

Dna Methyltransferases ◽

Sensitive Detection ◽

Au Nanoparticle ◽

Single Molecule Level

Cytosine-5 methylation-directed construction of Au nanoparticle-based nanosensors enables specific and sensitive detection of multiple DNA methyltransferases.


Assessment of Viral Targeted Sequence Capture Using Nanopore Sequencing Directly from Clinical Samples

Viruses ◽

10.3390/v12121358 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1358

Author(s):

Leonard Schuele ◽

Hayley Cassidy ◽

Erley Lizarazo ◽

Katrin Strutzberg-Minder ◽

Sabine Schuetze ◽

...

Keyword(s):

Influenza A ◽

Simultaneous Detection ◽

Clinical Samples ◽

Metagenomic Sequencing ◽

Genome Coverage ◽

Sequence Capture ◽

Shotgun Metagenomic Sequencing ◽

Oxford Nanopore ◽

The One ◽

Targeted Sequence Capture

Shotgun metagenomic sequencing (SMg) enables the simultaneous detection and characterization of viruses in human, animal and environmental samples. However, lack of sensitivity still poses a challenge and may lead to poor detection and data acquisition for detailed analysis. To improve sensitivity, we assessed a broad scope targeted sequence capture (TSC) panel (ViroCap) in both human and animal samples. Moreover, we adjusted TSC for the Oxford Nanopore MinION and compared the performance to an SMg approach. TSC on the Illumina NextSeq served as the gold standard. Overall, TSC increased the viral read count significantly in challenging human samples, with the highest genome coverage achieved using the TSC on the MinION. TSC also improved the genome coverage and sequencing depth in clinically relevant viruses in the animal samples, such as influenza A virus. However, SMg was shown to be adequate for characterizing a highly diverse animal virome. TSC on the MinION was comparable to the NextSeq and can provide a valuable alternative, offering longer reads, portability and lower initial cost. Developing new viral enrichment approaches to detect and characterize significant human and animal viruses is essential for the One Health Initiative.
