Nanopore base-calling from a perspective of instance segmentation

2019
Author(s): Yao-zhong Zhang, Arda Akdemir, Georg Tremmel, Seiya Imoto, Satoru Miyano, ...

Abstract
Background: Nanopore sequencing is a rapidly developing third-generation sequencing technology that can generate long nucleotide reads in real time on a portable device. Genotypes are determined by detecting changes in the ionic current signal as a DNA/RNA fragment passes through a nanopore. Currently, nanopore base-calling has a higher error rate than short-read base-calling. Using deep neural networks, state-of-the-art nanopore base-callers achieve base-calling accuracies ranging from 85% to 95%.
Results: In this work, we proposed a novel base-calling approach from the perspective of instance segmentation. Unlike previous sequence-labeling approaches, we formulated the base-calling problem as a multi-label segmentation task. We also proposed a refined U-net model, which we call UR-net, that can model sequential dependencies in a one-dimensional segmentation task. The experimental results show that the proposed base-caller, URnano, achieves results competitive with the recently proposed CTC-based base-caller Chiron when trained and tested on the same amount of data in an in-domain evaluation. Our results show that formulating base-calling as a one-dimensional segmentation task is a promising approach.
Availability: The source code and data are available at https://github.com/yaozhong/
Supplementary information: Supplementary data are available online as an attachment.
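The abstract contrasts segmentation-based base-calling with CTC-style sequence labeling. The minimal Python sketch below shows how base calls could be read off each kind of output. It assumes a hypothetical per-sample label scheme (one base label per signal sample, plus a flag marking the first sample of each base segment); this is not necessarily URnano's actual output format, and `BLANK`, `ctc_greedy_decode` and `segmentation_decode` are illustrative names only.

```python
from collections import Counter
from itertools import groupby

# Hypothetical blank symbol used by the CTC-style decoder below.
BLANK = "-"


def ctc_greedy_decode(frame_labels):
    """CTC-style greedy decoding: collapse runs of repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != BLANK)


def segmentation_decode(base_labels, is_segment_start):
    """Segmentation-style decoding under a hypothetical label scheme.

    base_labels[i] is the base predicted for signal sample i, and
    is_segment_start[i] is True on the first sample of a new base segment.
    The first sample always opens the first segment. Each segment yields one
    base, chosen by majority vote over its samples, so homopolymers such as
    "AA" need no blank separator.
    """
    if len(base_labels) != len(is_segment_start):
        raise ValueError("label and boundary sequences must have the same length")
    bases, segment = [], []
    for base, starts_new in zip(base_labels, is_segment_start):
        if starts_new and segment:
            # Close the current segment before opening a new one.
            bases.append(Counter(segment).most_common(1)[0][0])
            segment = []
        segment.append(base)
    if segment:
        bases.append(Counter(segment).most_common(1)[0][0])
    return "".join(bases)


if __name__ == "__main__":
    # CTC needs a blank between the two A's to emit "AA".
    print(ctc_greedy_decode(["A", "A", BLANK, "A", "C", "C"]))  # AAC
    # The segmentation view marks the homopolymer boundary explicitly instead.
    print(segmentation_decode(list("AAAACC"), [True, False, True, False, True, False]))  # AAC
```

Majority voting inside a segment is one simple way to absorb per-sample label noise; it is a design choice of this sketch, not a detail taken from the paper.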

2020, Vol 36 (12), pp. 3669-3679
Author(s): Can Firtina, Jeremie S Kim, Mohammed Alser, Damla Senol Cali, A Ercument Cicek, ...

Abstract
Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of the base pairs in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing, or fixing, errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can polish an assembly only with reads from a certain sequencing technology, or only if the assembly is small. Because of these technology and assembly-size dependencies, researchers must (i) run multiple polishing algorithms to use all available read sets and (ii) split a large genome into small chunks to polish it.
Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.
Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo.
Supplementary information: Supplementary data are available at Bioinformatics online.
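Apollo's pipeline rests on two classic HMM algorithms: Forward-Backward (to compute the state posteriors used in training) and Viterbi (to decode the most probable state path). The Python sketch below illustrates both on a hypothetical two-state toy HMM over DNA bases. It is not Apollo's profile HMM, which has match, insertion and deletion states tied to assembly positions, and it omits the Baum-Welch parameter re-estimation step; `STATES`, `START_P`, `TRANS_P`, `EMIT_P` and the function names are illustrative assumptions.

```python
import math

# Hypothetical toy HMM: two hidden states emitting DNA bases (not Apollo's pHMM).
STATES = ("GC_rich", "AT_rich")
START_P = {"GC_rich": 0.5, "AT_rich": 0.5}
TRANS_P = {
    "GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
    "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9},
}
EMIT_P = {
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def forward(obs, states, start_p, trans_p, emit_p):
    """alpha[t][s] = P(obs[0..t], state at t = s)."""
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, len(obs)):
        alpha.append({
            s: emit_p[s][obs[t]] * sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
            for s in states
        })
    return alpha


def backward(obs, states, trans_p, emit_p):
    """beta[t][s] = P(obs[t+1..] | state at t = s)."""
    beta = [{s: 1.0 for s in states}]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, {
            s: sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * nxt[r] for r in states)
            for s in states
        })
    return beta


def state_posteriors(obs, states, start_p, trans_p, emit_p):
    """Forward-Backward: returns P(obs) and P(state at t = s | obs) for every t.

    Plain probabilities underflow on long sequences; real implementations
    rescale each step or work in log space.
    """
    alpha = forward(obs, states, start_p, trans_p, emit_p)
    beta = backward(obs, states, trans_p, emit_p)
    likelihood = sum(alpha[-1][s] for s in states)
    posteriors = [
        {s: alpha[t][s] * beta[t][s] / likelihood for s in states}
        for t in range(len(obs))
    ]
    return likelihood, posteriors


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path, scored in log space (all probabilities must be > 0)."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for landing in s at time t.
            prev = max(states, key=lambda r: score[t - 1][r] + math.log(trans_p[r][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path


if __name__ == "__main__":
    obs = "GGCAATT"
    print(viterbi(obs, STATES, START_P, TRANS_P, EMIT_P))
    likelihood, post = state_posteriors(obs, STATES, START_P, TRANS_P, EMIT_P)
    print(likelihood, post[0])
```

For the toy sequence `GGCAATT`, Viterbi assigns the first three bases to `GC_rich` and the last four to `AT_rich`. In a polishing setting, the decoded path through a trained pHMM would instead spell out the corrected consensus sequence.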


2021, Vol 118 (8), pp. e2007192118
Author(s): Wan-Chen Li, Chia-Yi Lee, Wei-Hsuan Lan, Tai-Ting Woo, Hou-Cheng Liu, ...

Most eukaryotes possess two RecA-like recombinases (ubiquitous Rad51 and meiosis-specific Dmc1) to promote interhomolog recombination during meiosis. However, some eukaryotes have lost Dmc1. Given that mammalian and yeast Saccharomyces cerevisiae (Sc) Dmc1 have been shown to stabilize recombination intermediates containing mismatches better than Rad51, we used the Pezizomycotina filamentous fungus Trichoderma reesei to address whether and how Rad51-only eukaryotes conduct interhomolog recombination in zygotes with high sequence heterogeneity. We applied multidisciplinary approaches (next- and third-generation sequencing technology, genetics, cytology, bioinformatics, biochemistry, and single-molecule biophysics) to show that T. reesei Rad51 (TrRad51) is indispensable for interhomolog recombination during meiosis and, like ScDmc1, TrRad51 possesses better mismatch tolerance than ScRad51 during homologous recombination. Our results also indicate that the ancestral TrRad51 evolved to acquire ScDmc1-like properties by creating multiple structural variations, including variations in amino acid residues of the L1 and L2 DNA-binding loops.


Author(s): E. S. Gribchenko

The transcriptome profiles of cv. Frisson mycorrhizal roots and inoculated nitrogen-fixing nodules were investigated using Oxford Nanopore sequencing technology. A database of gene isoforms and their expression levels has been created.


2021
Author(s): Fawaz Dabbaghie, Jana Ebler, Tobias Marschall

Abstract
Motivation: With the fast development of third-generation sequencing machines, de novo genome assembly is becoming routine even for larger genomes. Graph-based representations of genomes arise both as part of the assembly process and in the context of pangenomes representing a population. In both cases, polymorphic loci lead to bubble structures in such graphs. Detecting bubbles is hence an important task when working with genomic variants in the context of genome graphs.
Results: Here, we present a fast general-purpose tool, called BubbleGun, for detecting bubbles and superbubbles in genome graphs. Furthermore, BubbleGun detects and outputs runs of linearly connected bubbles and superbubbles, which we call bubble chains. We showcase its utility on de Bruijn graphs and compare our results to vg's snarl detection. We show that BubbleGun is considerably faster than vg, especially in bigger graphs, where it reports all bubbles in less than 30 minutes on a human sample de Bruijn graph of around 2 million nodes.
Availability: BubbleGun is available and documented at https://github.com/fawaz-dabbaghieh/bubble_gun under the MIT license.
Supplementary information: Supplementary data are available at Bioinformatics online.
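To make the notions of bubbles and bubble chains concrete, here is a minimal Python sketch on a plain directed graph given as an edge list. It only detects the simplest two-branch case (source, two single-node branches, sink), such as the structure a SNP creates; BubbleGun itself also detects superbubbles with arbitrary internal structure, which this sketch does not attempt. The functions `find_simple_bubbles` and `bubble_chains` are hypothetical names, not BubbleGun's API.

```python
from collections import defaultdict


def find_simple_bubbles(edges):
    """Find two-branch bubbles source -> {a, b} -> sink in a directed graph.

    Simplified notion for illustration: each branch is a single node whose
    only predecessor is the source and whose only successor is the sink, and
    the sink's predecessors are exactly the two branch nodes.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    bubbles = []
    for source in sorted(succ):
        branches = sorted(succ[source])
        if len(branches) != 2:
            continue
        if any(pred.get(b) != {source} or len(succ.get(b, ())) != 1 for b in branches):
            continue
        sinks = set().union(*(succ[b] for b in branches))
        if len(sinks) != 1:
            continue  # the two branches do not reconverge on one node
        sink = sinks.pop()
        if pred.get(sink) == set(branches):
            bubbles.append((source, sink, tuple(branches)))
    return bubbles


def bubble_chains(bubbles):
    """Group bubbles into chains in which each bubble's sink is the next bubble's source.

    Bubbles forming a pure cycle have no chain start and are skipped.
    """
    by_source = {bubble[0]: bubble for bubble in bubbles}
    sinks = {bubble[1] for bubble in bubbles}
    chains = []
    for bubble in bubbles:
        if bubble[0] in sinks:
            continue  # continues an earlier bubble, so it is not a chain start
        chain = [bubble]
        while chain[-1][1] in by_source:
            chain.append(by_source[chain[-1][1]])
        chains.append(chain)
    return chains


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 4), (3, 4),      # bubble 1 -> 4
             (4, 5), (4, 6), (5, 7), (6, 7),      # bubble 4 -> 7, chained to the first
             (7, 8),
             (8, 9), (8, 10), (9, 11), (10, 11)]  # separate bubble 8 -> 11
    for chain in bubble_chains(find_simple_bubbles(edges)):
        print([(source, sink) for source, sink, _ in chain])
    # [(1, 4), (4, 7)]
    # [(8, 11)]
```

Chaining by shared source/sink nodes mirrors the abstract's description of bubble chains as runs of linearly connected bubbles; a full implementation would also need to handle superbubbles nested inside chains.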


2021, Vol 19, pp. 205873922110026
Author(s): Yi-Yan Wang, Qiong Huang, Yang Cheng

Endophthalmitis caused by Streptococcus suis (S. suis) is a rare infectious disease. Traditionally, S. suis is detected by culturing the pathogenic microorganism, a method with a low positive rate and a high false-negative rate. Nanopore sequencing (NS), a third-generation sequencing technology, has several advantages over the traditional method; in particular, it is cost- and time-effective and has high throughput. In this report, a case of infectious endophthalmitis caused by trauma is examined. The NS results suggest that the infection in question was a mixed infection caused by S. suis and Clostridium perfringens. This case report provides evidence that NS can quickly identify pathogens, which is of great significance for clinical diagnosis and treatment.


2021, Vol 12 (6)
Author(s): Yanfei Li, Yueling Jin, Jianming Zhang, Haoying Pan, Lan Wu, ...

Abstract
Human gut microbiota modulates normal physiological functions, such as maintenance of barrier homeostasis and modulation of metabolism, as well as various chronic diseases including type 2 diabetes and gastrointestinal cancer. Despite decades of research, the composition of the gut microbiota remains poorly understood. Here, we established an effective extraction method to obtain high-quality gut microbiota genomes and analyzed them with third-generation sequencing technology. We acquired a large quantity of data from each sample and assembled large numbers of reliable contigs. With this approach, we constructed tens of complete bacterial genomes, including those of several new bacterial species. We also identified a new conditional pathogen, Enterococcus tongjius, a member of the genus Enterococcus. This work provides a novel and reliable approach to recover gut microbiota genomes, facilitating the discovery of new bacterial species and furthering our understanding of the microbiome that underlies human health and disease.


2020
Author(s): Yanfei Li, Yueling Jin, Haoying Pan, Jianming Zhang, Lan Wu, ...

Abstract
Background: Human gut microbiota modulates normal physiological functions, such as the maintenance of barrier homeostasis and the modulation of metabolism, and various chronic diseases including type 2 diabetes and gastrointestinal cancer. Despite decades of research, the composition of the gut microbiota remains unexplored and unidentified.
Results: Here we established an effective extraction method to obtain high-quality gut microbiota genomic DNA and sequenced the samples with third-generation sequencing technology. We acquired a large amount of data from each sample and assembled many reliable contigs. We identified an enormous number of unknown genes as well as several new bacterial subspecies or species.
Conclusions: This work provides a novel and reliable framework for recovering gut microbiota genomes, facilitating the understanding of the roles the microbiome plays in human health and disease.

