Using Mixtures of Biological Samples as Process Controls for RNA-sequencing experiments

10.1101/015107 ◽

2015 ◽

Author(s):

Jerod Parsons ◽

Sarah Munro ◽

P. Scott Pine ◽

Jennifer McDaniel ◽

Michele Mehaffey ◽

...

Keyword(s):

Linear Model ◽

Rna Sequencing ◽

Messenger Rna ◽

Complex Mixture ◽

Rna Seq ◽

Total Rna ◽

Performance Benchmarking ◽

Mrna Content ◽

Genome Scale ◽

Process Controls

Genome-scale "-omics" measurements are challenging to benchmark due to the enormous variety of unique biological molecules involved. Mixtures of previously-characterized samples can be used to benchmark repeatability and reproducibility using component proportions as truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls. We apply a linear model to total RNA mixture samples in RNA-seq experiments. This model provides a context for performance benchmarking. The parameters of the model fit to experimental results can be evaluated to assess bias and variability of the measurement of a mixture. A linear model describes the behavior of mixture expression measures and provides a context for performance benchmarking. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement while identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures, which for RNA-Seq requires knowledge of the messenger RNA (mRNA) content of the individual total RNA components. We demonstrate and evaluate an experimental method suitable for use in genome-scale process control and lay out a method utilizing spike-in controls to determine mRNA content of total RNA in samples. Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The mRNA fraction accounts for differential enrichment of mRNA from varying total RNA samples. Spike-in controls can be utilized to measure this relationship between mRNA content and input total RNA. Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.
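
The linear mixture model described in this abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a two-component mixture, expression values already normalized to be comparable across samples, and hypothetical function names (`estimate_mixture_proportion`, `residuals`, `mrna_proportion`). The mRNA-fraction correction shown is one plausible way to convert a total-RNA mixing proportion into an mRNA-level proportion, and the fractions passed to it are made-up illustration values.

```python
def estimate_mixture_proportion(pure_a, pure_b, mixture):
    """Least-squares estimate of p in: mixture ~ p * pure_a + (1 - p) * pure_b.

    Each argument is a list of per-gene expression values (hypothetical,
    assumed normalized so that samples are directly comparable).
    """
    num = 0.0
    den = 0.0
    for a, b, m in zip(pure_a, pure_b, mixture):
        d = a - b                 # how strongly this gene separates A from B
        num += (m - b) * d
        den += d * d
    if den == 0.0:
        raise ValueError("pure components are identical; p is not identifiable")
    return num / den


def residuals(pure_a, pure_b, mixture, p):
    """Per-gene deviation of the observed mixture from the fitted linear model."""
    return [m - (p * a + (1 - p) * b) for a, b, m in zip(pure_a, pure_b, mixture)]


def mrna_proportion(total_rna_proportion, mrna_frac_a, mrna_frac_b):
    """Convert a total-RNA mixing proportion into an mRNA-level proportion.

    mrna_frac_a and mrna_frac_b are the assumed mRNA fractions of each
    component's total RNA (for example, as estimated from spike-in controls).
    """
    wa = total_rna_proportion * mrna_frac_a
    wb = (1 - total_rna_proportion) * mrna_frac_b
    return wa / (wa + wb)


# Toy data: four genes, mixture built with p = 0.25 and no noise.
pure_a = [100.0, 10.0, 50.0, 0.0]
pure_b = [20.0, 90.0, 50.0, 40.0]
mixture = [40.0, 70.0, 50.0, 30.0]

p_hat = estimate_mixture_proportion(pure_a, pure_b, mixture)
print(p_hat)
print(residuals(pure_a, pure_b, mixture, p_hat))
```

In this sketch, genes with large residuals would be the candidates for signals affected by interference, and no component-specific marker genes are needed because every gene contributes in proportion to how strongly it differs between the pure components.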

Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles

10.1101/701524 ◽

2019 ◽

Author(s):

Celine Everaert ◽

Hetty Helsmoortel ◽

Anneleen Decock ◽

Eva Hulstaert ◽

Ruben Van Paemel ◽

...

Keyword(s):

Rna Sequencing ◽

Extracellular Vesicles ◽

Platelet Rich Plasma ◽

Rna Seq ◽

Total Rna ◽

Rna Molecules ◽

Rna Profiling ◽

Wide Range ◽

Read Distribution ◽

Free Plasma

Abstract: RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.

RNA Sequencing in Schizophrenia

Bioinformatics and Biology Insights ◽

10.4137/bbi.s28992 ◽

2015 ◽

Vol 9s1 ◽

pp. BBI.S28992

Author(s):

Xin Li ◽

Shaolei Teng

Keyword(s):

Rna Sequencing ◽

Global Gene Expression ◽

The Novel ◽

Rna Seq ◽

Gene Expressions ◽

Splice Isoforms ◽

Heavy Burden ◽

Induced Pluripotent ◽

Genome Scale ◽

Generation Sequencing

Schizophrenia (SCZ) is a serious psychiatric disorder that affects 1% of the general population and places a heavy burden worldwide. The underlying genetic mechanism of SCZ remains unknown, but studies indicate that the disease is associated with a global gene expression disturbance across many genes. Next-generation sequencing, particularly RNA sequencing (RNA-Seq), provides a powerful genome-scale technology to investigate the pathological processes of SCZ. RNA-Seq has been used to analyze gene expression and to identify novel splice isoforms and rare transcripts associated with SCZ. This paper provides an overview of the genetics of SCZ, the advantages of RNA-Seq for transcriptome analysis, the accomplishments of RNA-Seq in SCZ cohorts, and the applications of induced pluripotent stem cells and RNA-Seq in SCZ research.

Comparing total RNA sequencing and metagenomics pipelines for multi-domain taxonomic profiling: implications for ecological assessments

ARPHA Conference Abstracts ◽

10.3897/aca.4.e64996 ◽

2021 ◽

Vol 4 ◽

Author(s):

Christopher Hempel ◽

Julia Harvie ◽

Jose Hleap Lozano ◽

Natalie Wright ◽

Sarah Adamowicz ◽

...

Keyword(s):

Microbial Communities ◽

Rna Sequencing ◽

Sample Type ◽

Rna Seq ◽

Total Rna ◽

Taxonomic Profiling ◽

Dna And Rna ◽

Accuracy And Precision ◽

The Status ◽

Unicellular Organisms

Ecological assessments are necessary to evaluate the status of our deteriorating ecosystems; however, assessment methods traditionally omit most microbes because unicellular organisms are challenging to identify. This omission is not ideal, as microbes might be better indicators for changes in environmental conditions than taxa traditionally used. DNA- and RNA-based techniques are increasingly applied for ecological assessments to overcome this challenge but require more testing and optimization. In this study, we compare metagenomics and total RNA sequencing (total RNA-Seq) for their taxonomic profiling performance for microbial communities. We applied both techniques to two sample sets: 1) a commercially available microbial mock community consisting of eight bacterial and two eukaryotic species, and 2) a display tank water sample. We processed the data using 1,532 bioinformatics pipelines and evaluated each workflow, i.e., the combination of sample type (metagenomics or total RNA-Seq) and pipeline, in terms of its accuracy and precision. This talk will showcase preliminary results and highlight differences in workflow performance. A recommended workflow to maximize taxonomic profiling accuracy of microbial communities will also be presented.

SMARTer single cell total RNA sequencing

10.1101/430090 ◽

2018 ◽

Cited By ~ 1

Author(s):

Karen Verboom ◽

Celine Everaert ◽

Nathalie Bolduc ◽

Kenneth J. Livak ◽

Nurten Yigit ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Transcript Level ◽

Cellular Heterogeneity ◽

Circular Rnas ◽

Rna Seq ◽

Sequencing Data ◽

Total Rna ◽

Sequencing Experiment

Abstract: Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3’ end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.

Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles

Scientific Reports ◽

10.1038/s41598-019-53892-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 11

Author(s):

Celine Everaert ◽

Hetty Helsmoortel ◽

Anneleen Decock ◽

Eva Hulstaert ◽

Ruben Van Paemel ◽

...

Keyword(s):

Rna Sequencing ◽

Extracellular Vesicles ◽

Platelet Rich Plasma ◽

Rna Seq ◽

Total Rna ◽

Rna Molecules ◽

Rna Profiling ◽

Wide Range ◽

Read Distribution ◽

Free Plasma

A simple, cost-effective, and robust method for rRNA depletion in RNA-sequencing studies

10.1101/2020.01.06.896837 ◽

2020 ◽

Cited By ~ 4

Author(s):

Peter H. Culviner ◽

Chantal K. Guegler ◽

Michael T. Laub

Keyword(s):

Gene Expression ◽

Rna Sequencing ◽

Ribosomal Rna ◽

Bacterial Species ◽

Cost Effective ◽

Rna Seq ◽

Robust Method ◽

Total Rna ◽

Metagenomic Sample ◽

Rrna Depletion

Abstract: The profiling of gene expression by RNA-sequencing (RNA-seq) has enabled powerful studies of global transcriptional patterns in all organisms, including bacteria. Because the vast majority of RNA in bacteria is ribosomal RNA (rRNA), it is standard practice to deplete the rRNA from a total RNA sample such that the reads in an RNA-seq experiment derive predominantly from mRNA. One of the most commonly used commercial kits for rRNA depletion, the Ribo-Zero kit from Illumina, was recently discontinued. Here, we report the development of a simple, cost-effective, and robust method for depleting rRNA that can be easily implemented by any lab or facility. We first developed an algorithm for designing biotinylated oligonucleotides that will hybridize tightly and specifically to the 23S, 16S, and 5S rRNAs from any species of interest. Precipitation of these oligonucleotides bound to rRNA by magnetic streptavidin beads then depletes rRNA from a complex, total RNA sample such that ~75-80% of reads in a typical RNA-seq experiment derive from mRNA. Importantly, we demonstrate a high correlation of RNA abundance or fold-change measurements in RNA-seq experiments between our method and the previously available Ribo-Zero kit. Complete details on the methodology are provided, including open-source software for designing oligonucleotides optimized for any bacterial species or metagenomic sample of interest. Importance: The ability to examine global patterns of gene expression in microbes through RNA-sequencing has fundamentally transformed microbiology. However, RNA-seq depends critically on the removal of ribosomal RNA from total RNA samples. Otherwise, rRNA would comprise upwards of 90% of the reads in a typical RNA-seq experiment, limiting the reads coming from messenger RNA or requiring high total read depth. A commonly used kit for rRNA subtraction from Illumina was recently discontinued. Here, we report the development of a ‘do-it-yourself’ kit for rapid, cost-effective, and robust depletion of rRNA from total RNA. We present an algorithm for designing biotinylated oligonucleotides that will hybridize to the rRNAs from a target set of species. We then demonstrate that the designed oligos enable sufficient rRNA depletion to produce RNA-seq data with 75-80% of reads coming from mRNA. The methodology presented should enable RNA-seq studies on any species or metagenomic sample of interest.

Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data

GigaScience ◽

10.1093/gigascience/giab080 ◽

2021 ◽

Vol 10 (12) ◽

Cited By ~ 1

Author(s):

Youri Hoogstrate ◽

Malgorzata A Komor ◽

René Böttcher ◽

Job van Riet ◽

Harmen J G van de Werken ◽

...

Keyword(s):

Rna Sequencing ◽

Ribosomal Rna ◽

Messenger Rna ◽

Search Space ◽

Whole Genome Sequencing Data ◽

Full Potential ◽

Rna Seq ◽

Sequencing Data ◽

Fusion Transcripts ◽

Intergenic Regions

Abstract. Background: Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. Results: We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA–minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG–positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. Conclusion: By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.

RNA-seq transcript quantification from reduced-representation data in recount2

10.1101/247346 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jack M. Fu ◽

Kai Kammers ◽

Abhinav Nellore ◽

Leonardo Collado-Torres ◽

Jeffrey T. Leek ◽

...

Keyword(s):

Linear Model ◽

Rna Sequencing ◽

Confidence Intervals ◽

Real Data ◽

Transcript Level ◽

Rna Seq ◽

Short Read ◽

Reduced Representation ◽

Transcript Quantification

Abstract: More than 70,000 short-read RNA-sequencing samples are publicly available through the recount2 project, a curated database of summary coverage data. However, no current methods can be directly applied to the reduced-representation information stored in this database to estimate transcript-level abundances. Here we present a linear model taking as input summary coverage of junctions and subdivided exons to output estimated abundances and associated uncertainty. We evaluate the performance of our model on simulated and real data, and provide a procedure to construct confidence intervals for estimates.
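
The idea of estimating transcript abundances from feature-level coverage with a linear model can be sketched as follows. This is an illustrative ordinary least-squares fit, not the method from the paper: the design matrix layout (one row per exon segment or junction, one column per transcript, 1 where the transcript contains the feature), the function name `fit_transcript_abundances`, and the toy numbers are all assumptions. It also omits the uncertainty estimates and confidence intervals that the abstract mentions.

```python
def fit_transcript_abundances(design, coverage):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    design[k][t] is 1 if transcript t contains feature k (an exon segment or
    a junction), else 0. coverage[k] is the observed summary coverage of
    feature k. Returns one estimated abundance per transcript.
    """
    n_feat = len(design)
    n_tx = len(design[0])

    # Build X^T X and X^T y.
    xtx = [[sum(design[k][i] * design[k][j] for k in range(n_feat))
            for j in range(n_tx)] for i in range(n_tx)]
    xty = [sum(design[k][i] * coverage[k] for k in range(n_feat))
           for i in range(n_tx)]

    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n_tx):
        pivot = max(range(col, n_tx), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design is rank-deficient; abundances not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_tx):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n_tx + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n_tx] / aug[i][i] for i in range(n_tx)]


# Toy gene with two transcripts and four features:
# exon 1 (both), exon 2 (transcript 1 only),
# junction skipping exon 2 (transcript 2 only), exon 3 (both).
design = [
    [1, 1],
    [1, 0],
    [0, 1],
    [1, 1],
]
coverage = [40.0, 30.0, 10.0, 40.0]  # generated from abundances 30 and 10

print(fit_transcript_abundances(design, coverage))
```

A plain least-squares fit can return negative abundances on noisy data; a non-negative solver would be a natural substitute in practice, which is a design choice of this sketch rather than something stated in the abstract.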

SMARTer single cell total RNA sequencing

Nucleic Acids Research ◽

10.1093/nar/gkz535 ◽

2019 ◽

Vol 47 (16) ◽

pp. e93-e93 ◽

Cited By ~ 13

Author(s):

Karen Verboom ◽

Celine Everaert ◽

Nathalie Bolduc ◽

Kenneth J Livak ◽

Nurten Yigit ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Lines ◽

Cancer Cell ◽

Expression Patterns ◽

Cancer Cell Lines ◽

Cellular Heterogeneity ◽

Circular Rnas ◽

Rna Seq ◽

Total Rna

Abstract: Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3′ end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation and on two other cancer cell lines sorted in microplates. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.

MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts

BMC Bioinformatics ◽

10.1186/s12859-021-04544-3 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Andrea Hita ◽

Gilles Brocart ◽

Ana Fernandez ◽

Marc Rehmsmeier ◽

Anna Alemany ◽

...

Keyword(s):

Rna Sequencing ◽

Genomic Region ◽

Simultaneous Estimation ◽

Rna Seq ◽

Protein Coding ◽

Total Rna ◽

Simultaneous Study ◽

Downstream Analysis ◽

And Function ◽

Genomic Locations

Abstract. Background: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. Results: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. Conclusions: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount.
