tailfindr: Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing

2019 ◽  
Author(s):  
Maximilian Krause ◽  
Adnan M. Niazi ◽  
Kornel Labun ◽  
Yamila N. Torres Cleuren ◽  
Florian S. Müller ◽  
...  

Polyadenylation at the 3’-end is a major regulator of messenger RNA, and poly(A) tail length is known to affect nuclear export, stability and translation, among other processes. Only recently have strategies emerged that allow genome-wide poly(A) length assessment. These methods identify the genes associated with poly(A) tail measurements indirectly, by short-read alignment to genetic 3’-ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform RNA sequencing that captures the entire poly(A) tail. However, assessing poly(A) length through basecalling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length from ONT long-read sequencing data. tailfindr operates on unaligned, basecalled data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr’s performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.

2021 ◽  
Author(s):  
Rory James Munro ◽  
Nadine Holmes ◽  
Christopher Moore ◽  
Matthew Carlile ◽  
Alex Payne ◽  
...  

Motivation: The ongoing SARS-CoV-2 pandemic has demonstrated the utility of real-time analysis of sequencing data, with a wide range of databases and resources for analysis now available. Here we show how the real-time nature of Oxford Nanopore Technologies sequencers can accelerate consensus generation and the assignment of lineage and variant status. We exploit the fact that multiplexed viral sequencing libraries quickly generate sufficient data for the majority of samples, with diminishing returns on the remaining samples as the sequencing run progresses. We demonstrate methods to determine when a sequencing run has passed this point, in order to reduce the time and cost of sequencing. Results: We extended minoTour, our real-time analysis and monitoring platform for nanopore sequencers, to provide SARS-CoV-2 analysis using ARTIC network pipelines. We additionally developed an algorithm to predict which samples will achieve sufficient coverage, automatically running the ARTIC medaka informatics pipeline once specific coverage thresholds have been reached on these samples. After testing on run data, we find that significant run time savings are possible, enabling flow cells to be used more efficiently and enabling higher-throughput data analysis. The resultant consensus genomes are assigned both PANGO lineage and variant status as defined by Public Health England. Samples from within individual runs are used to generate phylogenetic trees incorporating optional background samples, as well as summaries of individual SNPs. As minoTour uses ARTIC pipelines, new primer schemes and pathogens can be added to allow minoTour to aid in real-time analysis of pathogens in the future.
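The abstract does not describe the prediction algorithm itself, so the following is only a minimal sketch of the simpler idea of threshold-triggered analysis: deciding which barcoded samples already have enough coverage to be handed to a downstream pipeline. The function names, the 20x depth cut-off, the 90% amplicon fraction and the example depths are all assumptions made for illustration; none of them are taken from minoTour or the ARTIC pipelines.

```python
from typing import Dict, List


def fraction_covered(depths: List[int], min_depth: int) -> float:
    """Fraction of amplicons whose depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def samples_ready(
    amplicon_depths: Dict[str, List[int]],
    min_depth: int = 20,        # assumed per-amplicon depth threshold
    min_fraction: float = 0.9,  # assumed fraction of amplicons that must pass
) -> List[str]:
    """Return the samples that meet the (hypothetical) coverage criteria."""
    return sorted(
        name
        for name, depths in amplicon_depths.items()
        if fraction_covered(depths, min_depth) >= min_fraction
    )


# Hypothetical per-amplicon depths for three barcodes partway through a run.
run = {
    "barcode01": [55, 40, 32, 21, 60, 45, 38, 29, 90, 25],
    "barcode02": [55, 40, 3, 21, 0, 45, 38, 29, 90, 25],
    "barcode03": [],
}

print(samples_ready(run))  # ['barcode01']
```

In a real-time setting, a check of this kind would be repeated as reads accumulate, and a run could be stopped once no further samples are expected to cross the threshold.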


2019 ◽  
Author(s):  
Yuanxin Wang ◽  
Ruiping Wang ◽  
Shaojun Zhang ◽  
Shumei Song ◽  
Changying Jiang ◽  
...  

Abstract: Crosstalk between tumor cells and other cells within the tumor microenvironment (TME) plays a crucial role in tumor progression, metastases, and therapy resistance. We present iTALK, a computational approach to characterize and illustrate intercellular communication signals in the multicellular tumor ecosystem using single-cell RNA sequencing data. iTALK can in principle be used to dissect the complexity, diversity, and dynamics of cell-cell communication from a wide range of cellular processes.


2019 ◽  
Author(s):  
Luca Cozzuto ◽  
Huanle Liu ◽  
Leszek P. Pryszcz ◽  
Toni Hermoso Pulido ◽  
Julia Ponomarenko ◽  
...  

Abstract: The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is in principle capable of detecting virtually any RNA modification present in the molecule being sequenced, as well as providing polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run quality metrics, quality filtering, base-calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts, detect RNA modifications, and predict polyA tail length and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without the need to install any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome at single-molecule resolution.


2019 ◽  
Author(s):  
Davide Bolognini ◽  
Niccolò Bartalucci ◽  
Alessandra Mingrino ◽  
Alessandro Maria Vannucchi ◽  
Alberto Magi

Abstract: MinION and GridION X5 from Oxford Nanopore Technologies are devices for real-time DNA and RNA sequencing. On the one hand, MinION is the only real-time, low-cost and portable sequencing device and, thanks to its unique properties, is becoming more and more popular among biologists; on the other, GridION X5, mainly because of its cost, is less widespread but highly suitable for researchers with large sequencing projects. Although Oxford Nanopore Technologies’ devices have been increasingly used in the last few years, there is a lack of high-performing and user-friendly tools to handle the data output by both MinION and GridION X5 platforms. Here we present NanoR, a cross-platform R package designed to simplify and improve nanopore data visualization. NanoR is built on a few functions but exceeds the capabilities of existing tools in extracting meaningful information from MinION sequencing data; in addition, as exclusive features, NanoR can deal with GridION X5 sequencing outputs and allows comparison of MinION and GridION X5 sequencing data in one command. NanoR is released as a free package for R at https://github.com/davidebolo1993/NanoR.


2020 ◽  
Author(s):  
Steffen Klasberg ◽  
Alexander H. Schmidt ◽  
Vinzenz Lange ◽  
Gerhard Schöfl

Abstract: Background: High-resolution HLA genotyping of donors and recipients is a crucially important prerequisite for haematopoietic stem-cell transplantation and relies heavily on the quality and completeness of immuno-genetic reference sequence databases of allelic variation. Results: Here, we report on DR2S, an R package that leverages the strengths of two sequencing technologies – the accuracy of next-generation sequencing with the read length of third-generation sequencing technologies like PacBio’s SMRT sequencing or ONT’s nanopore sequencing – to reconstruct fully-phased high-quality full-length haplotype sequences. Although optimised for HLA and KIR genes, DR2S is applicable to all loci with known reference sequences provided that full-length sequencing data is available for analysis. In addition, DR2S integrates supporting tools for easy visualisation and quality control of the reconstructed haplotype to ensure suitability for submission to public allele databases. Conclusions: DR2S is a largely automated workflow designed to create high-quality fully-phased reference allele sequences for highly polymorphic gene regions such as HLA or KIR. It has been used by biologists to successfully characterise and submit more than 500 HLA alleles and more than 500 KIR alleles to the IPD-IMGT/HLA and IPD-KIR databases.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dawoon Chung ◽  
Yong Min Kwon ◽  
Youngik Yang

Abstract: Background: Trichoderma is a genus of fungi in the family Hypocreaceae and includes species known to produce enzymes with commercial use. They are largely found in soil and on terrestrial plants. Recently, Trichoderma simmonsii, isolated from decaying bark and decorticated wood, was newly identified in the Harzianum clade of Trichoderma. Due to a wide range of applications in agriculture and other industries, genomes of at least 12 Trichoderma spp. have been studied. Moreover, antifungal and enzymatic activities have been extensively characterized in Trichoderma spp. However, the genomic information and bioactivities of T. simmonsii from a marine-derived isolate remain largely unknown. While screening for asparaginase-producing fungi, we observed that the T. simmonsii GH-Sj1 strain, isolated from edible kelp, produced asparaginase. In this study, we report a draft genome of T. simmonsii GH-Sj1 generated using Illumina and Oxford Nanopore technologies. Furthermore, to facilitate biotechnological applications of this species, RNA-sequencing was performed to elucidate the transcriptional profile of T. simmonsii GH-Sj1 in response to asparaginase-rich conditions. Results: We generated ~14 Gb of sequencing data, assembled into a ~40 Mb genome. The T. simmonsii GH-Sj1 genome consisted of seven telomere-to-telomere scaffolds with no sequencing gaps, where the N50 length was 6.4 Mb. The total number of protein-coding genes was 13,120, constituting ~99% of the genome. The genome harbored 176 tRNAs, which cover the full set of 20 amino acids. In addition, it had an rRNA repeat region consisting of seven repeats of the 18S-ITS1–5.8S-ITS2–26S cluster. The T. simmonsii genome also harbored 7 putative asparaginase-encoding genes with potential medical applications. Using RNA-sequencing analysis, we found that 3 of the 7 putative genes were significantly upregulated under asparaginase-rich conditions. Conclusions: The genome and transcriptome of T. simmonsii GH-Sj1 established in the current work represent valuable resources for future comparative studies on fungal genomes and asparaginase production.
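For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the standard way it is computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The scaffold lengths in the example are invented for illustration and are not the T. simmonsii GH-Sj1 scaffold lengths, which the abstract does not list.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total assembly size is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical scaffold lengths in Mb (seven scaffolds, 35 Mb in total).
scaffolds_mb = [8, 7, 6, 5, 4, 3, 2]
print(n50(scaffolds_mb))  # 6
```

Here the three longest scaffolds sum to 21 Mb, which is the first point at which the running total passes half of 35 Mb, so the N50 is the length of the third scaffold.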


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kieran R. Campbell ◽  
Adi Steif ◽  
Emma Laks ◽  
Hans Zahn ◽  
Daniel Lai ◽  
...  

RNA ◽  
2019 ◽  
Vol 25 (10) ◽  
pp. 1229-1241 ◽  
Author(s):  
Maximilian Krause ◽  
Adnan M. Niazi ◽  
Kornel Labun ◽  
Yamila N. Torres Cleuren ◽  
Florian S. Müller ◽  
...  

2021 ◽  
Author(s):  
Kimberley J J Billingsley ◽  
Ramita Dewan ◽  
Laksh Malik ◽  
Pilar Alvarez Jerez ◽  
Stith Kiley ◽  
...  

Processing human frontal cortex brain tissue for population-scale Oxford Nanopore long-read DNA sequencing SOP

At the NIH's Center for Alzheimer's and Related Dementias (CARD; https://card.nih.gov/research-programs/long-read-sequencing), we will generate long-read sequencing data from roughly 4000 patients with Alzheimer's disease, frontotemporal dementia or Lewy body dementia, and from healthy subjects. With this research, we will build a public resource consisting of long-read genome sequencing data from a large number of people with confirmed Alzheimer's disease and related dementias and from healthy individuals. To generate this large-scale nanopore sequencing data, we have developed a protocol for processing and long-read sequencing of human frontal cortex brain tissue, targeting an N50 of ~30 kb and ~30X coverage.

†Correspondence to: Kimberley Billingsley and Cornelis Blauwendraat

Acknowledgements: We would like to thank the Nanopore team (Androo Markham & Hannah Lucio), the Circulomics Inc team (Jeffrey Burke, Michelle Kim, Duncan Kilburn & Kelvin Liu) and the whole CARD long-read team listed below. UCSC: Benedict Paten, Mikhail Kolmogorov, Miten Jain, Kishwar Shafin, Trevor Pesout; NHGRI: Adam Phillippy, Arang Rhie; Baylor: Fritz Sedlazeck; JHU: Winston Timp; NINDS: Sonja Scholz; NIA: Cornelis Blauwendraat, Kimberley Billingsley, Frank Grenn, Pilar Alvarez Jerez, Bryan Traynor, Shannon Ballard, Caroline Pantazis; CZI: Paolo Carnevali.

