Current RNA-seq methodology reporting limits reproducibility

Briefings in Bioinformatics ◽

10.1093/bib/bbz124 ◽

2019 ◽

Cited By ~ 5

Author(s):

Joël Simoneau ◽

Simon Dumontier ◽

Ryan Gosselin ◽

Michelle S Scott

Keyword(s):

Ribonucleic Acid ◽

Biological Sample ◽

Rna Seq ◽

Sequencing Data ◽

Current Standard ◽

Bioinformatics Pipeline ◽

Technical Noise ◽

Rna Molecules ◽

Meaningful Role ◽

Current Standard Practice

Abstract Ribonucleic acid sequencing (RNA-seq) identifies and quantifies RNA molecules from a biological sample. Transformation from raw sequencing data to meaningful gene or isoform counts requires an in silico bioinformatics pipeline. Such pipelines are modular in nature, built using selected software and biological references. Software is usually chosen and parameterized according to the sequencing protocol and biological question. However, while biological and technical noise is alleviated through replicates, biases due to the pipeline and choice of biological references are often overlooked. Here, we show that the current standard practice prevents reproducibility in RNA-seq studies by failing to specify required methodological information. Peer-reviewed articles are intended to apply currently accepted scientific and methodological standards. Inasmuch as the bias-less and optimal RNA-seq pipeline is not perfectly defined, methodological information holds a meaningful role in defining the results. This work illustrates the need for a standardized and explicit display of methodological information in RNA-seq experiments.
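
The abstract argues that each modular pipeline step (software, version, parameters, biological references) should be reported explicitly. As a hedged illustration, the sketch below records those fields per step and flags gaps. The `PipelineStep` schema, the field names and the tool names in the usage line are hypothetical. This is not a reporting standard proposed by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineStep:
    """One modular step of an RNA-seq pipeline (hypothetical schema)."""
    name: str                                        # e.g. "alignment"
    software: str                                    # tool name
    version: str                                     # exact version string
    parameters: dict = field(default_factory=dict)   # non-default settings used
    references: list = field(default_factory=list)   # genome build, annotation release


REQUIRED = ("software", "version")


def missing_fields(step: PipelineStep) -> list:
    """Return the names of required fields left empty for this step."""
    missing = [f for f in REQUIRED if not getattr(step, f)]
    if not step.parameters:
        missing.append("parameters")
    if not step.references:
        missing.append("references")
    return missing


def report(steps) -> str:
    """Render one tab-separated line per step, flagging incomplete entries."""
    lines = []
    for s in steps:
        gaps = missing_fields(s)
        status = "OK" if not gaps else "MISSING: " + ", ".join(gaps)
        refs = "; ".join(s.references) or "none"
        lines.append(f"{s.name}\t{s.software} {s.version}\trefs={refs}\t{status}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tool names and reference labels, for illustration only.
    steps = [
        PipelineStep("alignment", "example-aligner", "1.0", {"threads": 8}, ["genome build X"]),
        PipelineStep("quantification", "example-quantifier", ""),
    ]
    print(report(steps))
```

In this sketch an empty version string or an empty reference list counts as missing information, because the abstract identifies these as the details needed to reproduce a result.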

Tools and best practices for allelic expression analysis

10.1101/016097 ◽

2015 ◽

Cited By ~ 2

Author(s):

Stephane E Castel ◽

Ami Levy-Moonshine ◽

Pejman Mohammadi ◽

Eric Banks ◽

Tuuli Lappalainen

Keyword(s):

Best Practices ◽

Read Depth ◽

Allelic Expression ◽

Rna Seq ◽

Sequencing Data ◽

Technical Noise ◽

Nonsense Mediated Decay ◽

Biological Phenomena ◽

Mapping Bias ◽

Technical Bias

Allelic expression (AE) analysis has become an important tool for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. In this paper, we systematically analyze the properties of AE read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting and filtering for such errors, and show that the resulting AE data has extremely low technical noise. Finally, we introduce novel software for high-throughput production of AE data from RNA-sequencing data, implemented in the GATK framework. These improved tools and best practices for AE analysis yield higher quality AE data by reducing technical bias. This provides a practical framework for wider adoption of AE analysis by the genomics community.
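
At its core, AE analysis compares reference and alternative allele read counts at a heterozygous site. The sketch below assumes a minimum total depth filter (the threshold of 10 is an arbitrary placeholder) and an exact two-sided binomial test against a 0.5 allelic ratio. It illustrates the basic statistic only. It does not reproduce the GATK tool introduced in the paper or the paper's other filters, such as mapping bias correction.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def allelic_test(ref_count: int, alt_count: int, min_depth: int = 10):
    """Return (reference allele ratio, p-value), or None if the site fails the depth filter."""
    n = ref_count + alt_count
    if n < min_depth:
        return None
    return ref_count / n, binom_two_sided_p(ref_count, n)


if __name__ == "__main__":
    print(allelic_test(15, 5))   # ratio 0.75 with a two-sided p-value
    print(allelic_test(3, 4))    # None: below the assumed depth threshold
```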

Assessment of single cell RNA-seq normalization methods

10.1101/064329 ◽

2016 ◽

Author(s):

Bo Ding ◽

Lina Zheng ◽

Wei Wang

Keyword(s):

Single Cell ◽

Rna Seq ◽

Technical Noise ◽

Rna Molecules ◽

Normalization Methods ◽

Using Data

Abstract We have assessed the performance of seven normalization methods for single cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods considering spike-in ERCC RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance on selecting normalization methods to remove technical noise in single cell RNA-seq data.

Identifying proximal RNA interactions from cDNA-encoded crosslinks with ShapeJumper

10.1101/2021.06.10.447916 ◽

2021 ◽

Author(s):

Thomas W Christy ◽

Catherine A Giannetti ◽

Alain Laederach ◽

Kevin M Weeks

Keyword(s):

Reverse Transcriptase ◽

Massively Parallel Sequencing ◽

Three Dimensional ◽

Hydroxyl Groups ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Rna Molecules ◽

Adapter Ligation ◽

Alignment Strategy ◽

Nucleotide Resolution

SHAPE-JuMP is a concise strategy for identifying close-in-space interactions in RNA molecules. Nucleotides in close three-dimensional proximity are crosslinked with a bi-reactive reagent that covalently links the 2'-hydroxyl groups of the ribose moieties. The identities of crosslinked nucleotides are determined using an engineered reverse transcriptase that jumps across crosslinked sites, resulting in a deletion in the cDNA that is detected using massively parallel sequencing. Here we introduce ShapeJumper, a bioinformatics pipeline to process SHAPE-JuMP sequencing data and to accurately identify through-space interactions. ShapeJumper identifies proximal interactions with near-nucleotide resolution using an alignment strategy that is optimized to tolerate the unique non-templated reverse-transcription profile of the engineered crosslink-traversing reverse-transcriptase. JuMP-inspired strategies are now poised to replace adapter-ligation for detecting RNA-RNA interactions in most crosslinking experiments.

Identifying proximal RNA interactions from cDNA-encoded crosslinks with ShapeJumper

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009632 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009632

Author(s):

Thomas W. Christy ◽

Catherine A. Giannetti ◽

Alain Laederach ◽

Kevin M. Weeks

Keyword(s):

Reverse Transcriptase ◽

Massively Parallel Sequencing ◽

Three Dimensional ◽

Hydroxyl Groups ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Rna Molecules ◽

Adapter Ligation ◽

Alignment Strategy ◽

Nucleotide Resolution

SHAPE-JuMP is a concise strategy for identifying close-in-space interactions in RNA molecules. Nucleotides in close three-dimensional proximity are crosslinked with a bi-reactive reagent that covalently links the 2’-hydroxyl groups of the ribose moieties. The identities of crosslinked nucleotides are determined using an engineered reverse transcriptase that jumps across crosslinked sites, resulting in a deletion in the cDNA that is detected using massively parallel sequencing. Here we introduce ShapeJumper, a bioinformatics pipeline to process SHAPE-JuMP sequencing data and to accurately identify through-space interactions, as observed in complex JuMP datasets. ShapeJumper identifies proximal interactions with near-nucleotide resolution using an alignment strategy that is optimized to tolerate the unique non-templated reverse-transcription profile of the engineered crosslink-traversing reverse-transcriptase. JuMP-inspired strategies are now poised to replace adapter-ligation for detecting RNA-RNA interactions in most crosslinking experiments.
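
A crosslink traversed by the reverse transcriptase appears as a deletion in the cDNA, which an aligner reports in the read's CIGAR string. The sketch below scans one CIGAR string and reports deletions longer than a threshold as candidate interaction sites. It assumes the deletion is encoded as a `D` operation (some aligners use `N` for long gaps) and uses an arbitrary minimum length of 5. It ignores the non-templated nucleotides at the jump site that ShapeJumper's alignment strategy is designed to tolerate, so it is not ShapeJumper's algorithm.

```python
import re

# SAM CIGAR operations: a length followed by one operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_from_cigar(pos: int, cigar: str, min_len: int = 5):
    """Return (start, end) reference coordinates, 1-based and inclusive, of deletions >= min_len.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    """
    ref = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_len:
            found.append((ref, ref + length - 1))
        if op in "MDN=X":  # these operations consume reference positions
            ref += length
    return found


if __name__ == "__main__":
    print(deletions_from_cigar(100, "10M20D15M"))  # one 20-nt deletion starting at 110
```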

Assessment of Single Cell RNA-Seq Normalization Methods

G3 Genes|Genome|Genetics ◽

10.1534/g3.117.040683 ◽

2017 ◽

Vol 7 (7) ◽

pp. 2039-2045 ◽

Cited By ~ 6

Author(s):

Bo Ding ◽

Lina Zheng ◽

Wei Wang

Keyword(s):

Single Cell ◽

Rna Seq ◽

Technical Noise ◽

Rna Molecules ◽

Normalization Methods ◽

Using Data

Abstract We have assessed the performance of seven normalization methods for single cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods considering spike-in External RNA Control Consortium (ERCC) RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance on selecting normalization methods to remove technical noise in single cell RNA-seq data.
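
Spike-ins help with normalization because the same amount of ERCC RNA is added to every cell. Differences in spike-in read totals therefore reflect technical capture and sequencing efficiency. The sketch below shows one simple spike-in method, global scaling by each cell's spike-in total relative to the mean across cells. It is a generic illustration and not necessarily one of the seven methods the paper compares. The dictionary-of-lists input layout is an assumption.

```python
def spikein_size_factors(counts: dict, is_spike: list) -> dict:
    """counts: cell -> per-feature counts; is_spike: one bool per feature (True = ERCC)."""
    totals = {cell: sum(c for c, s in zip(v, is_spike) if s) for cell, v in counts.items()}
    if any(t == 0 for t in totals.values()):
        raise ValueError("every cell needs nonzero spike-in counts")
    mean_total = sum(totals.values()) / len(totals)
    return {cell: t / mean_total for cell, t in totals.items()}


def normalize(counts: dict, is_spike: list) -> dict:
    """Divide endogenous counts by each cell's spike-in size factor; spike-in rows are dropped."""
    sf = spikein_size_factors(counts, is_spike)
    return {
        cell: [c / sf[cell] for c, s in zip(v, is_spike) if not s]
        for cell, v in counts.items()
    }


if __name__ == "__main__":
    counts = {"cell_a": [10, 20, 5, 5], "cell_b": [10, 20, 15, 15]}
    is_spike = [False, False, True, True]
    print(normalize(counts, is_spike))
```

In this toy input, `cell_b` captured three times more spike-in reads than `cell_a`. Its endogenous counts are therefore scaled down relative to `cell_a`.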

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

Rna Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Abstract Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and highly and lowly expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists.

Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases.

Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Increased yields of duplex sequencing data by a series of quality control tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab002 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Gundula Povysil ◽

Monika Heinzl ◽

Renato Salazar ◽

Nicholas Stoler ◽

Anton Nekrutenko ◽

...

Keyword(s):

Low Frequency ◽

Variant Calling ◽

Data Loss ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Consensus Sequences ◽

Sequencing Errors ◽

Data Output ◽

Reverse Strand ◽

Duplex Sequencing

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition with the purpose of understanding data loss and implementing modifications to maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorizes the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
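
To make the data-loss point concrete, the sketch below shows the conventional family-building step. Reads are grouped by a duplex tag whose two halves swap order between the two strands. A per-strand majority consensus is built, and a DCS is kept only where both strands agree. Several details are hypothetical simplifications: a tag split into equal halves, equal-length reads already in a common orientation, a 0.7 majority fraction and a minimum family size of 3. Families lacking one strand, including singletons, are simply dropped. These are the reads the authors' tools aim to recover.

```python
from collections import Counter, defaultdict


def family_key(tag: str):
    """Split a duplex tag into halves; return (canonical tag, strand label)."""
    half = len(tag) // 2
    a, b = tag[:half], tag[half:]
    fwd, rev = a + b, b + a
    return (fwd, "ab") if fwd <= rev else (rev, "ba")


def consensus(seqs, min_fraction: float = 0.7) -> str:
    """Per-position majority call; 'N' where the top base falls below min_fraction."""
    out = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads, min_family_size: int = 3) -> dict:
    """reads: iterable of (tag, sequence). Returns canonical tag -> DCS for two-strand families."""
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for tag, seq in reads:
        key, strand = family_key(tag)
        families[key][strand].append(seq)
    dcs = {}
    for key, fam in families.items():
        if len(fam["ab"]) < min_family_size or len(fam["ba"]) < min_family_size:
            continue  # one-sided families and singletons are discarded here
        s1, s2 = consensus(fam["ab"]), consensus(fam["ba"])
        dcs[key] = "".join(x if x == y else "N" for x, y in zip(s1, s2))
    return dcs
```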

Role of conformational heterogeneity in ligand recognition by viral RNA molecules

Physical Chemistry Chemical Physics ◽

10.1039/d1cp00679g ◽

2021 ◽

Author(s):

Lev Levintov ◽

Harish Vashisth

Keyword(s):

Conformational Changes ◽

Ribonucleic Acid ◽

Viral Rna ◽

Ligand Recognition ◽

Conformational Heterogeneity ◽

Rna Molecules ◽

Environmental Stimuli

Ribonucleic acid (RNA) molecules are known to undergo conformational changes in response to various environmental stimuli including temperature, pH, and ligands. In particular, viral RNA molecules are a key example...

Replicate sequencing libraries are important for quantification of allelic imbalance

Nature Communications ◽

10.1038/s41467-021-23544-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Asia Mendelevich ◽

Svetlana Vinogradova ◽

Saumya Gupta ◽

Andrey A. Mironov ◽

Shamil R. Sunyaev ◽

...

Keyword(s):

Allelic Imbalance ◽

False Positive Rate ◽

Error Rates ◽

Differential Analysis ◽

Rna Seq ◽

Specific Expression ◽

Technical Noise ◽

Specific Analysis ◽

Positive Rate ◽

Allele Specific

Abstract A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library are insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on a one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases the false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing the design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
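
Replicate libraries matter because the same sites can be compared across two libraries prepared from one RNA sample. Under pure binomial sampling, the difference between the two allelic ratios has variance p(1 - p)(1/n1 + 1/n2). The sketch below averages the squared standardized differences across sites. A value near 1 is consistent with binomial noise, and larger values indicate extra technical overdispersion. This is a generic illustration of why replicates are informative, not the Qllelic method. The depth threshold of 20 is an arbitrary assumption.

```python
def overdispersion_factor(rep1, rep2, min_depth: int = 20) -> float:
    """rep1, rep2: lists of (ref, alt) counts for the same sites in two replicate libraries.

    Returns the mean squared standardized difference of allelic ratios (about 1 if binomial).
    """
    stats = []
    for (r1, a1), (r2, a2) in zip(rep1, rep2):
        n1, n2 = r1 + a1, r2 + a2
        if n1 < min_depth or n2 < min_depth:
            continue
        pooled = (r1 + r2) / (n1 + n2)
        if pooled in (0.0, 1.0):
            continue  # monoallelic sites carry no variance information here
        var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        stats.append((r1 / n1 - r2 / n2) ** 2 / var)
    if not stats:
        raise ValueError("no sites pass the depth filter")
    return sum(stats) / len(stats)
```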

Small RNA-Sequencing: Approaches and Considerations for miRNA Analysis

Diagnostics ◽

10.3390/diagnostics11060964 ◽

2021 ◽

Vol 11 (6) ◽

pp. 964

Author(s):

Sarka Benesova ◽

Mikael Kubista ◽

Lukas Valihrach

Keyword(s):

Rna Sequencing ◽

Small Rna ◽

High Sensitivity ◽

Small Rna Sequencing ◽

Rna Seq ◽

Liquid Biopsies ◽

Comprehensive Overview ◽

Rna Molecules ◽

Novel Mirna ◽

The Many

MicroRNAs (miRNAs) are a class of small RNA molecules that have an important regulatory role in multiple physiological and pathological processes. Their disease-specific profiles and presence in biofluids are properties that enable miRNAs to be employed as non-invasive biomarkers. In the past decades, several methods have been developed for miRNA analysis, including small RNA sequencing (RNA-seq). Small RNA-seq enables genome-wide profiling and analysis of known, as well as novel, miRNA variants. Moreover, its high sensitivity allows for profiling of low input samples such as liquid biopsies, which have now found applications in diagnostics and prognostics. Still, due to technical bias and the limited ability to capture the true miRNA representation, its potential remains unfulfilled. The introduction of many new small RNA-seq approaches that tried to minimize this bias has led to the many small RNA-seq protocols seen today. Here, we review all current approaches to cDNA library construction used during the small RNA-seq workflow, with particular focus on their implementation in commercially available protocols. We provide an overview of each protocol and discuss their applicability. We also review recent benchmarking studies comparing each protocol’s performance and summarize the major conclusions that can be gathered from their usage. The result documents variable performance of the protocols and highlights their different applications in miRNA research. Taken together, our review provides a comprehensive overview of all the current small RNA-seq approaches, summarizes their strengths and weaknesses, and provides guidelines for their applications in miRNA research.
