A single chromosome assembly of Bacteroides fragilis strain BE1 from Illumina and MinION nanopore sequencing data

2015 ◽  
Author(s):  
Judith Risse ◽  
Marian Thomson ◽  
Garry Blakely ◽  
Georgios Koutsovoulos ◽  
Mark Blaxter ◽  
...  

Background: Second- and third-generation sequencing technologies have revolutionised bacterial genomics. Short Illumina reads result in cheap but fragmented assemblies, whereas longer reads are more expensive but result in more complete genomes. The Oxford Nanopore MinION device is a revolutionary mobile sequencer that can produce thousands of long, single-molecule reads. Results: We sequenced Bacteroides fragilis strain BE1 using both the Illumina MiSeq and Oxford Nanopore MinION platforms. We were able to assemble a single chromosome of 5.18 Mb, with no gaps, using publicly available software and commodity computing hardware. We identified gene rearrangements and the state of invertible promoters in the strain. Conclusions: The single chromosome assembly of Bacteroides fragilis strain BE1 was achieved using only modest amounts of data, publicly available software and commodity computing hardware. This combination of technologies offers the possibility of ultra-cheap, high-quality, finished bacterial genomes.

Cells ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1776
Author(s):  
Mourdas Mohamed ◽  
Nguyet Thi-Minh Dang ◽  
Yuki Ogyama ◽  
Nelly Burlet ◽  
Bruno Mugat ◽  
...  

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.


2017 ◽  
Author(s):  
Krešimir Križanović ◽  
Ivan Sović ◽  
Ivan Krpelnik ◽  
Mile Šikić

Abstract Next-generation sequencing (NGS) technologies have made RNA sequencing widely accessible and applicable in many areas of research. In recent years, third-generation sequencing technologies have matured and are slowly replacing NGS for DNA sequencing. This paper presents a novel tool for RNA mapping guided by gene annotations. The tool is an adapted version of a previously developed DNA mapper, GraphMap, tailored for third-generation sequencing data, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies devices. It uses gene annotations to generate a transcriptome, uses a DNA mapping algorithm to map reads to the transcriptome, and finally transforms the mappings back to genome coordinates. The modified version of GraphMap is compared on several synthetic datasets with state-of-the-art RNA-seq mappers that can work with third-generation sequencing data. The results show that our tool outperforms the other tools in general mapping quality.
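The last step described in the abstract above, transforming a transcriptome mapping back to genome coordinates, can be illustrated with a minimal sketch. This is not GraphMap's implementation: the function name `transcript_to_genome`, the half-open exon representation, and the strand handling are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a half-open [start, end) interval
# in 0-based genome coordinates.
Exon = Tuple[int, int]


def transcript_to_genome(pos: int, exons: List[Exon], strand: str = "+") -> int:
    """Map a 0-based offset within a spliced transcript to a 0-based genome position.

    Exons are walked in transcription order (ascending for '+', descending
    for '-'), subtracting each exon's length until the offset falls inside one.
    """
    if pos < 0:
        raise ValueError("transcript position must be non-negative")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    ordered = sorted(exons)
    if strand == "-":
        ordered = ordered[::-1]

    remaining = pos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # On the minus strand the transcript runs from the exon's end
            # towards its start.
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


if __name__ == "__main__":
    exons = [(100, 200), (300, 400)]  # two 100-bp exons, 100-bp intron
    print(transcript_to_genome(150, exons))       # offset inside the second exon
    print(transcript_to_genome(0, exons, "-"))    # first base of a minus-strand transcript
```

A real mapper would convert whole alignments rather than single positions, splitting an alignment into several genomic blocks wherever it crosses an exon boundary.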


Author(s):  
Chen Cao ◽  
Jingni He ◽  
Lauren Mak ◽  
Deshan Perera ◽  
Devin Kwok ◽  
...  

Abstract DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or “haplotypes.” However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


2018 ◽  
Vol 5 (suppl_1) ◽  
pp. S364-S364
Author(s):  
Roby Bhattacharyya ◽  
Alejandro Pironti ◽  
Bruce J Walker ◽  
Abigail Manson ◽  
Virginia Pierce ◽  
...  

Abstract Background: Carbapenem-resistant Enterobacteriaceae (CRE) are a major public health threat. We report four clonally related Citrobacter freundii isolates harboring the blaKPC-3 carbapenemase in April–May 2017 that are nearly identical to a strain from 2014 at the same institution. Despite differing by ≤5 single nucleotide polymorphisms (SNPs), these isolates exhibited dramatic differences in carbapenemase plasmid architecture. Methods: We sequenced four carbapenem-resistant C. freundii isolates from 2017 and compared them with isolates from an ongoing CRE surveillance project at our institution. SNPs were identified from Illumina MiSeq data aligned to a reference genome using the variant caller Pilon. Plasmids were assembled from Illumina and Oxford Nanopore sequencing data using Unicycler. Results: The four 2017 isolates differed from one another by 0–5 chromosomal SNPs; two were identical. With one exception, these isolates differed by >38,000 SNPs from 25 C. freundii isolates sequenced from 2013 to 2017 at the same institution for CRE surveillance. The exception was a 2014 isolate that differed by 13–16 SNPs from each 2017 isolate, with 13 SNPs common to all four. Each C. freundii isolate harbored wild-type blaKPC-3. Despite the close relationship among the 2017 cluster, the plasmids harboring the blaKPC-3 genes differed dramatically: the carbapenemase occurred in one of two different plasmids, with rearrangements between these plasmids across isolates. The related 2014 isolate harbored both plasmids, each with a separate copy of blaKPC-3. No transmission chains were found between any of the affected patients. Conclusion: WGS confirmed clonality among four contemporaneous blaKPC-3-containing C. freundii isolates, and marked similarity with a 2014 isolate, within an institution. That only 13–16 SNPs varied between the 2014 and 2017 isolates suggests durable persistence of the blaKPC-3 gene within this lineage in a hospital ecosystem. The plasmids harboring these carbapenemase genes proved remarkably plastic, with plasmid loss and rearrangements occurring on the same time scale as two to three chromosomal point mutations. Combining short- and long-read sequencing in a case cluster uniquely revealed unexpectedly rapid dynamics of carbapenemase plasmids, providing critical insight into their manner of spread. Disclosures: M. J. Ferraro, SeLux Diagnostics: Scientific Advisor and Shareholder, Consulting fee. D. C. Hooper, SeLux Diagnostics: Scientific Advisor, Consulting fee.


2021 ◽  
Vol 12 ◽  
Author(s):  
Davide Bolognini ◽  
Alberto Magi

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
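The precision and recall of an SV callset mentioned above can be computed along the following lines. This is a minimal sketch, not the EViNCe pipeline: the `SV` record, the `precision_recall` function, and the matching rule (same chromosome, same SV type, breakpoint within an assumed tolerance of 500 bp) are hypothetical simplifications; real benchmarks typically also compare SV size and genotype.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: chromosome, breakpoint, type."""
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL", "INS", "INV"


def precision_recall(calls: List[SV], truth: List[SV], tol: int = 500) -> Tuple[float, float]:
    """Return (precision, recall) of `calls` against `truth`.

    A call is a true positive if an as-yet-unmatched truth SV has the same
    chromosome and type and a breakpoint within `tol` bp. Each truth SV can
    be matched at most once (greedy, in call order).
    """
    unmatched = list(truth)
    true_positives = 0
    for call in calls:
        for candidate in unmatched:
            if (call.chrom == candidate.chrom
                    and call.svtype == candidate.svtype
                    and abs(call.pos - candidate.pos) <= tol):
                unmatched.remove(candidate)
                true_positives += 1
                break  # stop iterating: the list was just modified
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"), SV("chr2", 200, "DEL")]
    calls = [SV("chr1", 1100, "DEL"), SV("chr1", 5000, "DEL"),
             SV("chr2", 9000, "DEL"), SV("chr1", 5200, "INS")]
    print(precision_recall(calls, truth))
```

Running the same function on callsets produced at different coverages or with different aligners would give the kind of per-condition comparison the report describes.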


2019 ◽  
Author(s):  
Luca Cozzuto ◽  
Huanle Liu ◽  
Leszek P. Pryszcz ◽  
Toni Hermoso Pulido ◽  
Julia Ponomarenko ◽  
...  

Abstract The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is in principle capable of detecting any RNA modification present in the molecule being sequenced, as well as providing polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run quality metrics, quality filtering, base-calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts, RNA modifications, and predictions of polyA tail length and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without the need to install any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome with single-molecule resolution.


2019 ◽  
Vol 21 (6) ◽  
pp. 1971-1986 ◽  
Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Ernesto Picardi ◽  
David S Horner ◽  
Graziano Pesole

Abstract A number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats over the last 5 years. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeats alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase of the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods herein tested, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.


2020 ◽  
Vol 71 (18) ◽  
pp. 5313-5322 ◽  
Author(s):  
Kathryn Dumschott ◽  
Maximilian H-W Schmidt ◽  
Harmeet Singh Chawla ◽  
Rod Snowdon ◽  
Björn Usadel

Abstract DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.

