Cytosine Variant Calling with High-throughput Nanopore Sequencing

2016 ◽  
Author(s):  
Arthur C. Rand ◽  
Miten Jain ◽  
Jordan Eizenga ◽  
Audrey Musselman-Brown ◽  
Hugh E. Olsen ◽  
...  

Abstract: Chemical modifications to DNA regulate cellular state and function. The Oxford Nanopore MinION is a portable single-molecule DNA sequencer that can sequence long fragments of genomic DNA. Here we show that the MinION can be used to detect and map two chemical modifications of cytosine, 5-methylcytosine and 5-hydroxymethylcytosine. We present a probabilistic method that enables expansion of the nucleotide alphabet to include bases containing chemical modifications. Our results on synthetic DNA show that individual cytosine base modifications can be classified with accuracy up to 95% in a three-way comparison and 98% in a two-way comparison.

Statement of Significance: Nanopore-based sequencing technology can produce long reads from unamplified genomic DNA, potentially allowing the characterization of chemical modifications and non-canonical DNA nucleotides as they occur in the cell. As the throughput of nanopore sequencers improves, simultaneous detection of multiple epigenetic modifications to cytosines will become an important capability of these devices. Here we present a statistical model that allows the Oxford Nanopore Technologies MinION to be used for detecting chemical modifications to cytosine using standard DNA preparation and sequencing techniques. Our method is based on modeling the ionic current due to DNA k-mers with a variable-order hidden Markov model where the emissions are distributed according to a hierarchical Dirichlet process mixture of normal distributions. This method provides a principled way to expand the nucleotide alphabet to allow for variant calling of modified bases.
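The idea of calling a cytosine variant from ionic current can be illustrated with a deliberately simplified sketch. It is not the authors' model: a single normal distribution per variant stands in for the hierarchical Dirichlet process mixture, there is no hidden Markov model, and the current levels in `EMISSIONS` are invented purely for illustration. All function and variable names are hypothetical.

```python
import math

# ASSUMPTION: made-up emission parameters (mean pA, std pA) for one k-mer
# context. These values are illustrative and are not taken from the paper.
EMISSIONS = {
    "C": (60.0, 1.5),
    "5-mC": (62.0, 1.5),
    "5-hmC": (64.5, 1.5),
}


def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def posterior(currents, emissions=EMISSIONS):
    """Posterior over variants given independent current readings.

    Uses a uniform prior over the variants (an assumption of this sketch).
    """
    log_liks = {
        variant: sum(log_normal_pdf(x, mean, std) for x in currents)
        for variant, (mean, std) in emissions.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_liks.values())
    weights = {variant: math.exp(ll - peak) for variant, ll in log_liks.items()}
    total = sum(weights.values())
    return {variant: w / total for variant, w in weights.items()}


def call_variant(currents, emissions=EMISSIONS):
    """Return the variant with the highest posterior probability."""
    post = posterior(currents, emissions)
    return max(post, key=post.get)
```

Calling `call_variant([61.9, 62.3, 61.7])` picks the variant whose assumed mean current lies closest to the readings. In the method described above, a mixture of normals learned from data replaces each single normal, and the hidden Markov model aligns current events to k-mers before any such comparison is made.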

2018 ◽  
Author(s):  
Ruibang Luo ◽  
Fritz J. Sedlazeck ◽  
Tak-Wah Lam ◽  
Michael C. Schatz

Abstract: The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single-molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. To meet this challenge, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57% and 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.


2019 ◽  
Author(s):  
Peter Edge ◽  
Vikas Bansal

Abstract: Short-read sequencing technologies such as Illumina enable the accurate detection of single nucleotide variants (SNVs) and short insertion/deletion variants in human genomes but are unable to provide information about haplotypes and variants in repetitive regions of the genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads (≥ 10 kb in length) that can potentially address these limitations of short reads. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, that leverages the haplotype information present in SMS reads to enable the accurate detection and phasing of single nucleotide variants in diploid genomes. Using whole-genome Pacific Biosciences data for multiple human individuals, we demonstrate that Longshot achieves very high accuracy for SNV detection (precision ≥0.992 and recall ≥0.96) that is significantly better than existing variant calling methods. Longshot can also call SNVs with good accuracy using whole-genome Oxford Nanopore data. Finally, we demonstrate that it enables the discovery of variants in duplicated regions of the genome that cannot be mapped using short reads. Longshot is freely available at https://github.com/pjedge/longshot.
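The precision, recall and F1-score figures quoted in these abstracts are related by simple arithmetic, sketched below. The helper names are hypothetical and are not part of Longshot or Clairvoyante. The example values are only the lower bounds stated above (precision 0.992, recall 0.96), not a reported F1.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from variant-call counts."""
    called = true_pos + false_pos   # all variants the caller reported
    actual = true_pos + false_neg   # all variants in the truth set
    precision = true_pos / called if called else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lower bounds quoted in the abstract above. F1 increases with both
# inputs, so this value is a lower bound on the corresponding F1.
f1_lower_bound = f1_score(0.992, 0.96)
```

With those bounds the F1-score comes out at roughly 0.976. Because F1 is a harmonic mean, it sits closer to the smaller of the two inputs, which here is recall.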


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ayako Nishizawa ◽  
Kazuki Kumada ◽  
Keiko Tateno ◽  
Maiko Wagata ◽  
Sakae Saito ◽  
...  

Abstract: Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract: Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads. Generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2018 ◽  
Author(s):  
Meng-Yin Li ◽  
Yi-Lun Ying ◽  
Xi-Xin Fu ◽  
Jie Yu ◽  
Shao-Chuang Liu ◽  
...  

Millions of years of evolution have produced membrane protein channels capable of efficiently moving ions across the cell membrane. The fundamental mechanisms that facilitate these actions depend largely on weak non-covalent interactions. However, uncovering these dynamic interactions and their synergistic network effects remains challenging for both experimental techniques and molecular dynamics (MD) simulations. Here, we present a rational strategy that combines MD simulations and frequency-energy spectroscopy to identify and quantify the role of non-covalent interactions in carrier transport through membrane protein channels, as encoded in traditional single-channel recordings of ionic current. We employed wild-type aerolysin transporting methylcytosine and cytosine as a model to explore the dynamic ionic signatures with non-stationary and non-linear frequency analysis. Our data show that methylcytosine experiences stronger non-covalent interactions than cytosine with the aerolysin nanopore at Region 1 around R220, which produces characteristic frequency-energy spectra. Furthermore, we experimentally validate the hypothesis obtained from the frequency-energy spectra by designing the single-site mutation K238G, which creates significantly enhanced non-covalent interactions for the recognition of methylcytosine. The frequency-energy spectrum of ions flowing inside membrane channels constitutes a single-molecule interaction spectrum, which bridges the gap between traditional ionic current recording and MD simulations, facilitating the qualitative and quantitative description of the non-covalent interactions inside membrane channels.


Cells ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1776
Author(s):  
Mourdas Mohamed ◽  
Nguyet Thi-Minh Dang ◽  
Yuki Ogyama ◽  
Nelly Burlet ◽  
Bruno Mugat ◽  
...  

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.


2020 ◽  
Vol 11 (35) ◽  
pp. 9675-9684
Author(s):  
Li-Juan Wang ◽  
Xiao Han ◽  
Jian-Ge Qiu ◽  
BingHua Jiang ◽  
Chun-Yang Zhang

Cytosine-5 methylation-directed construction of Au nanoparticle-based nanosensors enables specific and sensitive detection of multiple DNA methyltransferases.


Viruses ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1358
Author(s):  
Leonard Schuele ◽  
Hayley Cassidy ◽  
Erley Lizarazo ◽  
Katrin Strutzberg-Minder ◽  
Sabine Schuetze ◽  
...  

Shotgun metagenomic sequencing (SMg) enables the simultaneous detection and characterization of viruses in human, animal and environmental samples. However, lack of sensitivity still poses a challenge and may lead to poor detection and data acquisition for detailed analysis. To improve sensitivity, we assessed a broad scope targeted sequence capture (TSC) panel (ViroCap) in both human and animal samples. Moreover, we adjusted TSC for the Oxford Nanopore MinION and compared the performance to an SMg approach. TSC on the Illumina NextSeq served as the gold standard. Overall, TSC increased the viral read count significantly in challenging human samples, with the highest genome coverage achieved using the TSC on the MinION. TSC also improved the genome coverage and sequencing depth in clinically relevant viruses in the animal samples, such as influenza A virus. However, SMg was shown to be adequate for characterizing a highly diverse animal virome. TSC on the MinION was comparable to the NextSeq and can provide a valuable alternative, offering longer reads, portability and lower initial cost. Developing new viral enrichment approaches to detect and characterize significant human and animal viruses is essential for the One Health Initiative.

