Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads

Mapping Intimacies ◽

10.1101/016196 ◽

2015 ◽

Author(s):

Hung-I Harry Chen ◽

Yuanhang Liu ◽

Yi Zou ◽

Zhao Lai ◽

Devanand Sarkar ◽

...

Keyword(s):

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Background Noise ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Expression Levels ◽

Differential Expressed Genes

Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, offering high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them accounts for misaligned reads that map to non-exonic regions. We developed a novel algorithm, XBSeq, built on a statistical model that assumes the observed signal is the convolution of the true expression signal and sequencing noise. Reads mapped to non-exonic regions are treated as sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signal, assumed to follow a negative binomial distribution, can be delineated, enabling accurate detection of differentially expressed genes.

Results: We implemented XBSeq and evaluated it on simulated expression datasets generated under different conditions from a combination of negative binomial and Poisson distributions, with parameters derived from real RNA-seq data. We compared its performance with other commonly used differential expression analysis algorithms and evaluated how true and false positive rates change with the number of biological replicates, the differential fold change, and the expression level in non-exonic regions. We also tested the algorithm on a real RNA-seq dataset and reported the detection results that are shared and that differ among algorithms.

Conclusions: We propose XBSeq, a differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, XBSeq and DESeq perform nearly equivalently. However, XBSeq surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only at very low read counts did XBSeq show a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurements and thus detect differentially expressed genes even in noisy conditions.
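The signal-plus-noise idea in this abstract can be illustrated with a small method-of-moments sketch. It assumes the observed exonic count is the sum of an independent true signal and a background component, so that means and variances subtract. The function name, the example counts, and the dispersion parameterisation (variance = mu + alpha * mu^2) are illustrative assumptions, not the XBSeq implementation.

```python
from statistics import mean, variance


def estimate_true_signal(observed, background):
    """Method-of-moments estimate for the true-expression component.

    Assumes observed = true + noise with independent components, so the
    mean and variance of the true signal are obtained by subtraction.
    Returns (mean, variance, dispersion), where dispersion is the alpha in
    the negative binomial relation variance = mu + alpha * mu**2.
    """
    mu = max(mean(observed) - mean(background), 0.0)
    var = max(variance(observed) - variance(background), 0.0)
    # Clamp at zero: sampling noise can push the raw estimate negative.
    alpha = max((var - mu) / mu**2, 0.0) if mu > 0 else 0.0
    return mu, var, alpha


# Hypothetical counts for one gene across three replicates.
exonic = [120, 135, 110]      # reads mapped to the gene's exons
non_exonic = [10, 12, 8]      # reads mapped to matched non-exonic regions
mu, var, alpha = estimate_true_signal(exonic, non_exonic)
print(f"mean={mu:.2f} variance={var:.2f} dispersion={alpha:.4f}")
```

With only a few replicates these moment estimates are noisy, which is one reason real tools share information across genes when estimating dispersion.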


Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing


The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab028 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Xueyi Dong ◽

Luyi Tian ◽

Quentin Gouil ◽

Hasaru Kariyawasam ◽

Shian Su ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcriptomic Analysis ◽

Statistical Testing ◽

Rna Seq ◽

Sequencing Data ◽

Short Read ◽

Sequencing Platform ◽

Long Read

Abstract: Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequence error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results from the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with widely used short-read differential expression methods and software can yield meaningful results.


qSVA framework for RNA quality correction in differential expression analysis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1617384114 ◽

2017 ◽

Vol 114 (27) ◽

pp. 7130-7135 ◽

Cited By ~ 41

Author(s):

Andrew E. Jaffe ◽

Ran Tao ◽

Alexis L. Norris ◽

Marc Kealhofer ◽

Abhinav Nellore ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Brain Regions ◽

Rna Degradation ◽

Rna Seq ◽

Postmortem Human Brain ◽

Surrogate Variable Analysis ◽

Expression Levels ◽

Rna Quality

RNA sequencing (RNA-seq) is a powerful approach for measuring gene expression levels in cells and tissues, but it relies on high-quality RNA. We demonstrate here that statistical adjustment using existing quality measures largely fails to remove the effects of RNA degradation when RNA quality associates with the outcome of interest. Using RNA-seq data from molecular degradation experiments of human primary tissues, we introduce a method—quality surrogate variable analysis (qSVA)—as a framework for estimating and removing the confounding effect of RNA quality in differential expression analysis. We show that this approach results in greatly improved replication rates (>3×) across two large independent postmortem human brain studies of schizophrenia and also removes potential RNA quality biases in earlier published work that compared expression levels of different brain regions and other diagnostic groups. Our approach can therefore improve the interpretation of differential expression analysis of transcriptomic data from human tissue.


Comprehensive processing of high-throughput small RNA sequencing data including quality checking, normalization, and differential expression analysis using the UEA sRNA Workbench

RNA ◽

10.1261/rna.059360.116 ◽

2017 ◽

Vol 23 (6) ◽

pp. 823-835 ◽

Cited By ~ 21

Author(s):

Matthew Beckers ◽

Irina Mohorianu ◽

Matthew Stocks ◽

Christopher Applegate ◽

Tamas Dalmay ◽

...

Keyword(s):

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

High Throughput ◽

Small Rna ◽

Differential Expression Analysis ◽

Small Rna Sequencing ◽

Sequencing Data ◽

Comprehensive Processing


A relative comparison between Hidden Markov- and Log-Linear-based models for differential expression analysis in a real time course RNA sequencing data

10.1101/448886 ◽

2018 ◽

Author(s):

Fatemeh Gholizadeh ◽

Zahra Salehi ◽

Ali Mohammad Banaei-Moghaddam ◽

Abbas Rahimi Foroushani ◽

Kaveh Kavousi

Keyword(s):

Real Time ◽

Differential Expression ◽

Rna Sequencing ◽

Time Course ◽

Hidden Markov ◽

Differential Expression Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Normalization Methods ◽

Log Linear

Abstract: With the advent of next-generation sequencing technologies, RNA-seq has become an established approach for gene expression profiling. In particular, time-course RNA-seq differential expression analysis has been used in many studies to identify candidate genes. However, identifying differentially expressed genes (DEGs) efficiently in time-course studies is statistically challenging because of inherent characteristics of such data, including correlation and dependencies over time. Here we compare EBSeq-HMM, a hidden Markov-based model, with multiDE, a log-linear-based model, on a real time-course RNA sequencing dataset. For the comparison, the DEGs detected in common by edgeR, DESeq2 and Voom (referred to as Benchmark DEGs) were used as the reference. Each of the two models was evaluated with different normalization methods. The findings revealed that multiDE identified more Benchmark DEGs and showed higher agreement with them than EBSeq-HMM. Furthermore, multiDE and EBSeq-HMM performed best with the TMM and Upper-Quartile normalization methods, respectively.


Adjusted Sample Size Calculation for RNA-seq Data in the Presence of Confounding Covariates

BioMedInformatics ◽

10.3390/biomedinformatics1020004 ◽

2021 ◽

Vol 1 (2) ◽

pp. 47-63

Author(s):

Xiaohong Li ◽

Shesh N. Rai ◽

Eric C. Rouchka ◽

Timothy E. O’Toole ◽

Nigel G. F. Cooper

Keyword(s):

Sample Size ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Sample Size Calculation ◽

Size Estimation ◽

Rna Seq ◽

Simulation Based ◽

The Relationship

Sample size calculation for adequate power is critical in optimizing RNA-seq experimental design. However, directly estimating sample size becomes more complex when confounding covariates are taken into consideration. Although a number of approaches for sample size calculation have been proposed for RNA-seq data, most ignore any potential heterogeneity. In this study, we implemented a simulation-based and confounder-adjusted method to provide sample size recommendations for RNA-seq differential expression analysis. The data were generated using Monte Carlo simulation, given an underlying distribution of confounding covariates and parameters for a negative binomial distribution. The relationship between sample size, power and parameters such as dispersion, fold change and mean read counts can be visualized. We demonstrate that the adjusted sample size for a desired power and type I error rate α is usually larger when confounding covariates are taken into account. More importantly, our simulation study reveals that sample size may be underestimated by existing methods if a confounding covariate exists in RNA-seq data. Consequently, this underestimate could affect the detection power of the differential expression analysis. Therefore, we introduce confounding covariates into sample size estimation for heterogeneous RNA-seq data.
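A minimal sketch of simulation-based power estimation for negative binomial counts is given here to make the Monte Carlo idea concrete. It is deliberately simplified: it omits the confounding covariates that are the focus of the abstract, handles a single gene, and uses a large-sample z-test on log counts in place of a proper negative binomial test, so it is anti-conservative for very small groups. All function names and parameter values are illustrative assumptions rather than the authors' method.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_negative_binomial(mu, dispersion, rng):
    """Gamma-Poisson mixture: mean mu, variance mu + dispersion * mu**2."""
    lam = rng.gammavariate(1.0 / dispersion, mu * dispersion)
    return sample_poisson(lam, rng)


def estimate_power(n_per_group, base_mean, fold_change, dispersion,
                   alpha=0.05, n_sim=200, seed=0):
    """Fraction of simulated two-group experiments that reject the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        control = [math.log(sample_negative_binomial(base_mean, dispersion, rng) + 1)
                   for _ in range(n_per_group)]
        treated = [math.log(sample_negative_binomial(base_mean * fold_change,
                                                     dispersion, rng) + 1)
                   for _ in range(n_per_group)]
        se = math.sqrt((variance(control) + variance(treated)) / n_per_group)
        if se == 0:
            continue  # degenerate replicate: counted as "not rejected"
        z = (mean(treated) - mean(control)) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        rejections += p_value < alpha
    return rejections / n_sim


# Scan candidate group sizes for a hypothetical gene (mean 50, 2-fold change).
for n in (3, 5, 10, 20):
    print(n, estimate_power(n, base_mean=50, fold_change=2.0, dispersion=0.1))
```

The smallest group size whose estimated power reaches the target (for example 0.8) would be the recommended sample size under these assumptions. A confounder-adjusted version would additionally draw covariate values per sample and include them in both the simulated means and the test.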


A framework for RNA quality correction in differential expression analysis

10.1101/074245 ◽

2016 ◽

Cited By ~ 2

Author(s):

Andrew E. Jaffe ◽

Ran Tao ◽

Alexis L. Norris ◽

Marc Kealhofer ◽

Abhinav Nellore ◽

...

Keyword(s):

Human Brain ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Rna Degradation ◽

Rna Seq ◽

Human Brain Tissue ◽

Postmortem Human Brain ◽

Expression Levels ◽

Rna Quality

Abstract: RNA sequencing (RNA-seq) is a powerful approach for measuring gene expression levels in cells and tissues, but it relies on high-quality RNA. We demonstrate here that statistical adjustment employing existing quality measures largely fails to remove the effects of RNA degradation when RNA quality associates with the outcome of interest. Using RNA-seq data from a molecular degradation experiment of human brain tissue, we introduce the quality surrogate variable (qSVA) analysis framework for estimating and removing the confounding effect of RNA quality in differential expression analysis. We show this approach results in greatly improved replication rates (>3x) across two large independent postmortem human brain studies of schizophrenia. Finally, we explored public datasets to demonstrate potential RNA quality confounding when comparing expression levels of different brain regions and diagnostic groups beyond schizophrenia. Our approach can therefore improve the interpretation of differential expression analysis of transcriptomic data from the human brain.


Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA–sequencing data

10.1101/220129 ◽

2017 ◽

Cited By ~ 2

Author(s):

Alemu Takele Assefa ◽

Katrijn De Paepe ◽

Celine Everaert ◽

Pieter Mestdagh ◽

Olivier Thas ◽

...

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

Web Application ◽

Empirical Bayes ◽

Performance Metrics ◽

Differential Expression Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Normalization Methods

Background: Protein-coding RNAs (mRNA) have been the primary target of most transcriptome studies in the past, but in recent years, attention has expanded to include long non-coding RNAs (lncRNA). lncRNAs are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 14 popular tools for testing DE in RNA-seq data, along with their normalization methods, is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs.

Results: Thirteen performance metrics were used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. Non-parametric procedures were used to simulate gene expression data in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, we kept track of the results for mRNA and lncRNA separately. All statistical models exhibited inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and analyses of benchmark RNA-seq datasets. No single tool uniformly outperformed the others.

Conclusion: Overall, linear modeling with empirical Bayes moderation (limma) and the nonparametric approach (SAMSeq) showed the best performance: good control of the false discovery rate (FDR) and reasonable sensitivity. However, achieving a sensitivity of at least 50% requires more than 80 samples when studying expression levels in realistic clinical settings such as cancer research. About half of the methods showed a severe excess of false discoveries, making these methods unreliable for differential expression analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, http://statapps.ugent.be/tools/AppDGE/


zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

10.1101/157982 ◽

2017 ◽

Cited By ~ 7

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Michael I. Love ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Negative Binomial Model ◽

Binomial Model ◽

Rna Seq ◽

Zero Inflation ◽

Zero Counts

Abstract: Dropout in single cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero-inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero-inflation, boosting performance in scRNA-seq differential expression analysis.
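The notion of observation weights under a zero-inflated negative binomial (ZINB) model can be sketched as a posterior probability. Given a mixing proportion for excess zeros and a negative binomial component, the weight of a zero count is the probability that it arose from the negative binomial part rather than from dropout, while positive counts keep full weight. The function names and parameter values below are illustrative assumptions, and the sketch takes the model parameters as given instead of estimating them as a real tool would.

```python
def nb_zero_probability(mu, dispersion):
    """P(count = 0) under a negative binomial with mean mu and
    variance mu + dispersion * mu**2."""
    return (1.0 + dispersion * mu) ** (-1.0 / dispersion)


def zinb_observation_weight(count, mu, dispersion, pi_zero):
    """Posterior probability that a count belongs to the NB component.

    pi_zero is the assumed mixing proportion of excess (dropout) zeros.
    Positive counts can only come from the NB component, so they get 1.0.
    """
    if count > 0:
        return 1.0
    nb_zero = (1.0 - pi_zero) * nb_zero_probability(mu, dispersion)
    return nb_zero / (pi_zero + nb_zero)


# Hypothetical gene: mean 2, dispersion 0.5, 30% excess zeros.
counts = [0, 0, 3, 1, 0, 7]
weights = [zinb_observation_weight(c, mu=2.0, dispersion=0.5, pi_zero=0.3)
           for c in counts]
print([round(w, 3) for w in weights])
```

Weights of this kind can then be passed to a bulk RNA-seq method that accepts per-observation weights, so that likely dropout zeros contribute less to the fitted model.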
