Parallel and scalable workflow for the analysis of Oxford Nanopore direct RNA sequencing datasets

2019
Author(s):
Luca Cozzuto
Huanle Liu
Leszek P. Pryszcz
Toni Hermoso Pulido
Julia Ponomarenko
...  

The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is in principle capable of detecting any RNA modification present in the molecule being sequenced, as well as providing polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run quality metrics, quality filtering, base-calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts and to predict RNA modifications, polyA tail length and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without installing any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome at single-molecule resolution.

2021
Author(s):
Doaa Hassan
Daniel Acevedo
Swapna Vidhur Daulatabad
Quoseena Mir
Sarath Chandra Janga

Pseudouridine is one of the most abundant RNA modifications, occurring when uridines are converted by pseudouridine synthase proteins. It plays an important role in many biological processes and is also relevant to drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule being sequenced, but to our knowledge this technology has not been used to identify RNA pseudouridine sites. In this paper, we address this limitation by introducing a tool called Penguin that integrates several machine learning (ML) models (i.e., predictors) to identify RNA pseudouridine sites in Nanopore direct RNA sequencing reads. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore device and from the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which in turn can predict whether the signal is modified by the presence of pseudouridine. The predictors included in Penguin are a support vector machine (SVM), a random forest (RF), and a neural network (NN). The results on two benchmark data sets show that Penguin identifies pseudouridine sites with a high accuracy of 93.38% and 92.61% using SVM in random split testing and independent validation testing, respectively. Thus, Penguin outperforms the existing pseudouridine predictors in the literature, which achieved an accuracy of at most 76.0% in independent validation testing. The tool is available on GitHub at https://github.com/Janga-Lab/Penguin.


RNA
2021
pp. rna.078937.121
Author(s):
Felix Grünberger
Sébastien Ferreira-Cerca
Dina Grohmann

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that (PCR)-cDNA-seq offers improved yield and accuracy compared to direct RNA sequencing. Notably, (PCR)-cDNA-seq is suitable for quantitative measurements and can be readily used for simultaneous and accurate detection of transcript 5′ and 3′ boundaries, analysis of transcriptional units and transcriptional heterogeneity. In summary, based on our comprehensive study, we show Nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features. Nanopore RNA-seq thereby holds the potential to become a valuable alternative method for RNA analysis in prokaryotes.


2021
Author(s):
Felix Gruenberger
Sebastien Ferreira-Cerca
Dina Grohmann

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that cDNA-seq offers improved yield and accuracy without bias in quantification compared to direct RNA sequencing. Notably, cDNA-seq can be readily used for simultaneous transcript quantification, accurate detection of transcript 5′ and 3′ boundaries, and analysis of transcriptional units and transcriptional heterogeneity. In summary, we establish Nanopore RNA-seq as a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features, thereby advancing it to become a standard method for RNA analysis in prokaryotes.


2019
Author(s):
Adrien Leger
Paulo P. Amaral
Luca Pandolfini
Charlotte Capitanchik
Federica Capraro
...  

RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. To date, over 150 naturally occurring PTMs have been identified; however, the overwhelming majority of their functions remain elusive. In recent years, a small number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing (DRS) technology has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework to evaluate the presence of modifications in DRS data. To do so, we compare an RNA sample of interest against a non-modified control sample. Our strategy does not require a training set and allows the use of replicates to model biological variability. Here, we demonstrate the ability of Nanocompore to detect RNA modifications at single-molecule resolution in human polyA+ RNAs, as well as in targeted non-coding RNAs. Our results correlate well with orthogonal methods, confirm previous observations on the distribution of N6-methyladenosine sites and provide novel insights into the distribution of RNA modifications in the coding and non-coding transcriptomes. The latest version of Nanocompore can be obtained at https://github.com/tleonardi/nanocompore.


2020
Author(s):
Soundhar Ramasamy
Vinodh J Sahayasheela
Zutao Yu
Takuya Hidaka
Li Cai
...  

RNA modifications contribute to RNA and protein diversity in eukaryotes and lead to amino acid substitutions, deletions, and changes in gene expression levels. Several methods have been developed to profile RNA modifications; however, a less laborious method for identifying inosine and pseudouridine modifications across the whole transcriptome is still not available. Herein, we address the first step toward this goal by sequencing synthetic RNA constructs carrying inosine and pseudouridine modifications using Oxford Nanopore Technology, a direct RNA sequencing platform for rapid detection of RNA modifications in a relatively less labor-intensive manner. Our analysis of multiple nanopore parameters reveals that mismatch errors chiefly distinguish unmodified from modified nucleobases. Moreover, we show that the selective reactivity of acrylonitrile with inosine and pseudouridine generates a differential profile between the modified and treated construct. Our results offer a new methodology that combines selectively reactive chemical probe-based modification with existing direct RNA sequencing methods to profile multiple RNA modifications on a single RNA.


Author(s):  
Huan Zhong
Zongwei Cai
Zhu Yang
Yiji Xia

NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for quantification and further characterization of genes and isoforms. The pipeline also integrates some advanced functions to identify antisense or splicing, and supports data reformatting for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.


eLife
2020
Vol 9
Author(s):
Matthew T Parker
Katarzyna Knop
Anna V Sherwood
Nicholas J Schurch
Katarzyna Mackinnon
...  

Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.


Cells
2020
Vol 9 (8)
pp. 1776
Author(s):
Mourdas Mohamed
Nguyet Thi-Minh Dang
Yuki Ogyama
Nelly Burlet
Bruno Mugat
...  

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.


2020
Vol 12 (574)
pp. eabe4282
Author(s):
Ankit Bharat
Melissa Querrey
Nikolay S. Markov
Samuel Kim
Chitaru Kurihara
...  

Lung transplantation can potentially be a life-saving treatment for patients with nonresolving COVID-19–associated respiratory failure. Concerns limiting lung transplantation include recurrence of SARS-CoV-2 infection in the allograft, technical challenges imposed by viral-mediated injury to the native lung, and the potential risk for allograft infection by pathogens causing ventilator-associated pneumonia in the native lung. Additionally, the native lung might recover, resulting in long-term outcomes preferable to those of transplant. Here, we report the results of lung transplantation in three patients with nonresolving COVID-19–associated respiratory failure. We performed single-molecule fluorescence in situ hybridization (smFISH) to detect both positive and negative strands of SARS-CoV-2 RNA in explanted lung tissue from the three patients and in additional control lung tissue samples. We conducted extracellular matrix imaging and single-cell RNA sequencing on explanted lung tissue from the three patients who underwent transplantation and on warm postmortem lung biopsies from two patients who had died from COVID-19–associated pneumonia. Lungs from these five patients with prolonged COVID-19 disease were free of SARS-CoV-2 as detected by smFISH, but pathology showed extensive evidence of injury and fibrosis that resembled end-stage pulmonary fibrosis. Using machine learning, we compared single-cell RNA sequencing data from the lungs of patients with late-stage COVID-19 to that from the lungs of patients with pulmonary fibrosis and identified similarities in gene expression across cell lineages. Our findings suggest that some patients with severe COVID-19 develop fibrotic lung disease for which lung transplantation is their only option for survival.


2019
Vol 9 (1)
Author(s):
Madiha Sultan
Anastassia Kanavarioti

Protein and solid-state nanopores are used for DNA/RNA sequencing as well as for single molecule analysis. We proposed that selective labeling/tagging may improve base-to-base resolution of nucleic acids via nanopores. We have explored one specific tag, the Osmium tetroxide 2,2′-bipyridine (OsBp), which conjugates to pyrimidines and leaves purines intact. Earlier reports using OsBp-tagged oligodeoxyribonucleotides demonstrated proof-of-principle during unassisted voltage-driven translocation via either alpha-Hemolysin or a solid-state nanopore. Here we extend this work to RNA oligos and a third nanopore by employing the MinION, a commercially available device from Oxford Nanopore Technologies (ONT). Conductance measurements demonstrate that the MinION visibly discriminates oligoriboadenylates with sequence A15PyA15, where Py is an OsBp-tagged pyrimidine. Such resolution rivals traditional chromatography, suggesting that nanopore devices could be exploited for the characterization of RNA oligos and microRNAs enhanced by selective labeling. The data also reveal marked discrimination between a single pyrimidine and two consecutive pyrimidines in OsBp-tagged AnPyAn and AnPyPyAn. This observation leads to the conjecture that the MinION/OsBp platform senses a 2-nucleotide sequence, in contrast to the reported 5-nucleotide sequence with native nucleic acids. Such improvement in sensing, enabled by the presence of OsBp, may enhance base-calling accuracy in enzyme-assisted DNA/RNA sequencing.

