scholarly journals How well do RNA-Seq differential gene expression tools perform in a eukaryote with a complex transcriptome?

2016 ◽  
Author(s):  
Kimon Froussios ◽  
Nick J. Schurch ◽  
Katarzyna Mackinnon ◽  
Marek Gierliński ◽  
Céline Duc ◽  
...  

AbstractRNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying distribution of gene expression. A recent highly replicated study revealed that RNA-seq gene expression measurements in yeast are best represented as being drawn from an underlying negative binomial distribution. In this paper, the statistical properties of gene expression in the higher eukaryote Arabidopsis thaliana are shown to be essentially identical to those from yeast despite the large increase in the size and complexity of the transcriptome: Gene expression measurements from this model plant species are consistent with being drawn from an underlying negative binomial or log-normal distribution and the false positive rate performance of nine widely used DGE tools is not strongly affected by the additional size and complexity of the A. thaliana transcriptome. For RNA-seq data, we therefore recommend the use of DGE tools that are based on the negative binomial distribution.

2019 ◽  
Vol 35 (18) ◽  
pp. 3372-3377 ◽  
Author(s):  
Kimon Froussios ◽  
Nick J Schurch ◽  
Katarzyna Mackinnon ◽  
Marek Gierliński ◽  
Céline Duc ◽  
...  

Abstract Motivation RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae. Results We show that, consistent with the results in S.cerevisiae, more gene expression measurements in A.thaliana are consistent with being drawn from an underlying negative binomial distribution than either a log-normal distribution or a normal distribution, and that the size and complexity of the A.thaliana transcriptome does not influence the false positive rate performance of nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Availability and implementation The raw data for the 17 WT Arabidopsis thaliana datasets is available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from github at https://github.com/bartongroup/KF_arabidopsis-GRNA. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Yanming Di ◽  
Daniel W Schafer ◽  
Jason S Cumbie ◽  
Jeff H Chang

We propose a new statistical test for assessing differential gene expression using RNA sequencing (RNA-Seq) data. Commonly used probability distributions, such as binomial or Poisson, cannot appropriately model the count variability in RNA-Seq data due to overdispersion. The small sample size that is typical in this type of data also prevents the uncritical use of tools derived from large-sample asymptotic theory. The test we propose is based on the NBP parameterization of the negative binomial distribution. It extends an exact test proposed by Robinson and Smyth (2007, 2008). In one version of Robinson and Smyth’s test, a constant dispersion parameter is used to model the count variability between biological replicates. We introduce an additional parameter to allow the dispersion parameter to depend on the mean. Our parametric method complements nonparametric regression approaches for modeling the dispersion parameter. We apply the test we propose to an Arabidopsis data set and a range of simulated data sets. The results show that the test is simple, powerful and reasonably robust against departures from model assumptions.


2019 ◽  
Vol 12 (1) ◽  
pp. 11-19 ◽  
Author(s):  
Jun-Young Shin ◽  
Sang-Heon Choi ◽  
Da-Woon Choi ◽  
Ye-Jin An ◽  
Jae-Hyuk Seo ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document