Using Mixtures of Biological Samples as Process Controls for RNA-sequencing experiments

2015 ◽  
Author(s):  
Jerod Parsons ◽  
Sarah Munro ◽  
P. Scott Pine ◽  
Jennifer McDaniel ◽  
Michele Mehaffey ◽  
...  

Genome-scale '-omics' measurements are challenging to benchmark due to the enormous variety of unique biological molecules involved. Mixtures of previously characterized samples can be used to benchmark repeatability and reproducibility using component proportions as truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls. We apply a linear model to total RNA mixture samples in RNA-seq experiments. This model provides a context for performance benchmarking. The parameters of the model fit to experimental results can be evaluated to assess bias and variability of the measurement of a mixture. A linear model describes the behavior of mixture expression measures and provides a context for performance benchmarking. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement while identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures, which for RNA-Seq requires knowledge of the messenger RNA (mRNA) content of the individual total RNA components. We demonstrate and evaluate an experimental method suitable for use in genome-scale process control and lay out a method utilizing spike-in controls to determine mRNA content of total RNA in samples. Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The mRNA fraction accounts for differential enrichment of mRNA from varying total RNA samples. Spike-in controls can be utilized to measure this relationship between mRNA content and input total RNA. 
Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.
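
The linear mixture model described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy expression values, and the exact weighting formula (total RNA proportion multiplied by mRNA fraction, then normalized) are assumptions made for illustration. The sketch predicts mixture expression from pure components, recovers the mixing weights by ordinary least squares when pure components are measured alongside the mixture, and returns residuals that could serve as a performance metric.

```python
import numpy as np

def expected_mixture(components, total_rna_props, mrna_fractions):
    """Predict mixture expression from pure-component expression.

    components: genes x k matrix, one column per pure component.
    total_rna_props: proportion of each component by total RNA.
    mrna_fractions: assumed mRNA fraction of each component's total RNA.
    """
    weights = np.asarray(total_rna_props, float) * np.asarray(mrna_fractions, float)
    weights = weights / weights.sum()  # mRNA-level mixing weights, summing to 1
    return components @ weights, weights

def estimate_weights(components, mixture):
    """Estimate mixing weights by least squares, clipped and renormalized."""
    coef, *_ = np.linalg.lstsq(components, mixture, rcond=None)
    coef = np.clip(coef, 0.0, None)  # negative weights are not meaningful
    return coef / coef.sum()

def residuals(components, mixture, weights):
    """Per-gene deviation of the observed mixture from the linear model."""
    return mixture - components @ weights

# Toy data (hypothetical): 4 genes, 2 pure components.
components = np.array([[10.0, 0.0], [0.0, 20.0], [5.0, 5.0], [8.0, 2.0]])
mixture, true_w = expected_mixture(components, [0.25, 0.75], [0.02, 0.04])
est_w = estimate_weights(components, mixture)
res = residuals(components, mixture, est_w)
```

In this noise-free toy case the estimated weights equal the true ones and the residuals are zero; with real data, large residuals for a gene would flag a departure from the linear response.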

2019 ◽  
Author(s):  
Celine Everaert ◽  
Hetty Helsmoortel ◽  
Anneleen Decock ◽  
Eva Hulstaert ◽  
Ruben Van Paemel ◽  
...  

Abstract RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.


2015 ◽  
Vol 9s1 ◽  
pp. BBI.S28992
Author(s):  
Xin Li ◽  
Shaolei Teng

Schizophrenia (SCZ) is a serious psychiatric disorder that affects 1% of the general population and places a heavy burden worldwide. The underlying genetic mechanism of SCZ remains unknown, but studies indicate that the disease is associated with a global gene expression disturbance across many genes. Next-generation sequencing, particularly RNA sequencing (RNA-Seq), provides a powerful genome-scale technology to investigate the pathological processes of SCZ. RNA-Seq has been used to analyze gene expression and to identify novel splice isoforms and rare transcripts associated with SCZ. This paper provides an overview of the genetics of SCZ, the advantages of RNA-Seq for transcriptome analysis, the accomplishments of RNA-Seq in SCZ cohorts, and the applications of induced pluripotent stem cells and RNA-Seq in SCZ research.


2021 ◽  
Vol 4 ◽  
Author(s):  
Christopher Hempel ◽  
Julia Harvie ◽  
Jose Hleap Lozano ◽  
Natalie Wright ◽  
Sarah Adamowicz ◽  
...  

Ecological assessments are necessary to evaluate the status of our deteriorating ecosystems; however, assessment methods traditionally omit most microbes because unicellular organisms are challenging to identify. This omission is not ideal, as microbes might be better indicators of changes in environmental conditions than the taxa traditionally used. DNA- and RNA-based techniques are increasingly applied in ecological assessments to overcome this challenge but require more testing and optimization. In this study, we compare metagenomics and total RNA sequencing (total RNA-Seq) for their taxonomic profiling performance for microbial communities. We applied both techniques to two sample sets: 1) a commercially available microbial mock community consisting of eight bacterial and two eukaryotic species, and 2) a display tank water sample. We processed the data using 1,532 bioinformatics pipelines and evaluated each workflow, i.e., the combination of sample type (metagenomics or total RNA-Seq) and pipeline, in terms of its accuracy and precision. This talk will showcase preliminary results and highlight differences in workflow performance. A recommended workflow to maximize taxonomic profiling accuracy of microbial communities will also be presented.


2018 ◽  
Author(s):  
Karen Verboom ◽  
Celine Everaert ◽  
Nathalie Bolduc ◽  
Kenneth J. Livak ◽  
Nurten Yigit ◽  
...  

Abstract Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3′ end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Celine Everaert ◽  
Hetty Helsmoortel ◽  
Anneleen Decock ◽  
Eva Hulstaert ◽  
Ruben Van Paemel ◽  
...  

Abstract RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.


Author(s):  
Peter H. Culviner ◽  
Chantal K. Guegler ◽  
Michael T. Laub

Abstract The profiling of gene expression by RNA-sequencing (RNA-seq) has enabled powerful studies of global transcriptional patterns in all organisms, including bacteria. Because the vast majority of RNA in bacteria is ribosomal RNA (rRNA), it is standard practice to deplete the rRNA from a total RNA sample such that the reads in an RNA-seq experiment derive predominantly from mRNA. One of the most commonly used commercial kits for rRNA depletion, the Ribo-Zero kit from Illumina, was recently discontinued. Here, we report the development of a simple, cost-effective, and robust method for depleting rRNA that can be easily implemented by any lab or facility. We first developed an algorithm for designing biotinylated oligonucleotides that will hybridize tightly and specifically to the 23S, 16S, and 5S rRNAs from any species of interest. Precipitation of these oligonucleotides bound to rRNA by magnetic streptavidin beads then depletes rRNA from a complex, total RNA sample such that ~75-80% of reads in a typical RNA-seq experiment derive from mRNA. Importantly, we demonstrate a high correlation of RNA abundance or fold-change measurements in RNA-seq experiments between our method and the previously available Ribo-Zero kit. Complete details on the methodology are provided, including open-source software for designing oligonucleotides optimized for any bacterial species or metagenomic sample of interest. Importance The ability to examine global patterns of gene expression in microbes through RNA-sequencing has fundamentally transformed microbiology. However, RNA-seq depends critically on the removal of ribosomal RNA from total RNA samples. Otherwise, rRNA would comprise upwards of 90% of the reads in a typical RNA-seq experiment, limiting the reads coming from messenger RNA or requiring high total read depth. A commonly used kit for rRNA subtraction from Illumina was recently discontinued. 
Here, we report the development of a ‘do-it-yourself’ kit for rapid, cost-effective, and robust depletion of rRNA from total RNA. We present an algorithm for designing biotinylated oligonucleotides that will hybridize to the rRNAs from a target set of species. We then demonstrate that the designed oligos enable sufficient rRNA depletion to produce RNA-seq data with 75-80% of reads coming from mRNA. The methodology presented should enable RNA-seq studies on any species or metagenomic sample of interest.


GigaScience ◽  
2021 ◽  
Vol 10 (12) ◽  
Author(s):  
Youri Hoogstrate ◽  
Malgorzata A Komor ◽  
René Böttcher ◽  
Job van Riet ◽  
Harmen J G van de Werken ◽  
...  

Abstract Background Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. Results We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA–minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG–positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. Conclusion By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.


2018 ◽  
Author(s):  
Jack M. Fu ◽  
Kai Kammers ◽  
Abhinav Nellore ◽  
Leonardo Collado-Torres ◽  
Jeffrey T. Leek ◽  
...  

Abstract More than 70,000 short-read RNA-sequencing samples are publicly available through the recount2 project, a curated database of summary coverage data. However, no current methods can be directly applied to the reduced-representation information stored in this database to estimate transcript-level abundances. Here we present a linear model taking as input summary coverage of junctions and subdivided exons to output estimated abundances and associated uncertainty. We evaluate the performance of our model on simulated and real data, and provide a procedure to construct confidence intervals for estimates.
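
The general idea of a linear model from feature-level coverage to transcript abundance can be sketched as follows. This is an illustrative ordinary least squares fit, not the authors' estimator: the design matrix, the coverage values, and the function name `fit_abundances` are hypothetical, and the published method may differ in constraints, weighting, and how uncertainty is derived. Each row of the design matrix is a feature (a junction or a subdivided exon) and each column a transcript, with 1 where the transcript contains the feature.

```python
import numpy as np

def fit_abundances(design, coverage):
    """Least-squares transcript abundances with standard errors.

    design: features x transcripts indicator matrix (must have full column
            rank and more features than transcripts).
    coverage: observed summary coverage per feature.
    """
    X = np.asarray(design, float)
    y = np.asarray(coverage, float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]           # residual degrees of freedom
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))

# Hypothetical gene with two transcripts and four features:
# features 1 and 4 are shared, feature 2 is unique to transcript A,
# feature 3 is unique to transcript B.
design = [[1, 1], [1, 0], [0, 1], [1, 1]]
coverage = [41.0, 29.0, 11.0, 39.0]
abundance, std_err = fit_abundances(design, coverage)
```

Under a normal approximation, an interval such as `abundance ± 1.96 * std_err` gives a rough sense of uncertainty; that interval is a textbook construction used here for illustration only.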


2019 ◽  
Vol 47 (16) ◽  
pp. e93-e93 ◽  
Author(s):  
Karen Verboom ◽  
Celine Everaert ◽  
Nathalie Bolduc ◽  
Kenneth J Livak ◽  
Nurten Yigit ◽  
...  

Abstract Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3′ end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation and on two other cancer cell lines sorted in microplates. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Andrea Hita ◽  
Gilles Brocart ◽  
Ana Fernandez ◽  
Marc Rehmsmeier ◽  
Anna Alemany ◽  
...  

Abstract Background Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. Results Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. Conclusions MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount.
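
The hierarchical assignment step mentioned in this abstract can be pictured with a small toy sketch. It does not use or reproduce MGcount's code or API: the `Feature` class, the tier names, the coordinates, and the assumption that small-RNA features are tried before long-RNA features are all hypothetical choices made for illustration. The point is only that a read overlapping both a short feature and a long host gene goes to the short feature rather than being discarded as ambiguous.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # inclusive
    end: int    # exclusive
    tier: str   # "small" or "long" (hypothetical tier labels)

def overlaps(read_start, read_end, feature):
    """True if the half-open read interval intersects the feature."""
    return read_start < feature.end and feature.start < read_end

def assign_read(read_start, read_end, features, tiers=("small", "long")):
    """Return (tier, feature names) for the first tier with any overlap."""
    for tier in tiers:
        hits = [f.name for f in features
                if f.tier == tier and overlaps(read_start, read_end, f)]
        if hits:
            return tier, hits
    return None, []

# Hypothetical locus: a snoRNA embedded in a long host gene.
features = [
    Feature("host_gene", 100, 5000, "long"),
    Feature("snoRNA_1", 1200, 1290, "small"),
]
in_snorna = assign_read(1210, 1260, features)
in_host_only = assign_read(3000, 3050, features)
unassigned = assign_read(6000, 6050, features)
```

Reads that still hit several features within one tier are returned together here; the abstract describes a separate graph-based aggregation for such multi-mapping cases, which this sketch does not attempt.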

