Variance component testing for identifying differentially expressed genes in RNA-seq data

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3797 ◽  
Author(s):  
Sheng Yang ◽  
Fang Shao ◽  
Weiwei Duan ◽  
Yang Zhao ◽  
Feng Chen

RNA sequencing (RNA-Seq) enables the measurement and comparison of gene expression with isoform-level quantification. Differences in the effect of each isoform may make traditional methods, which aggregate isoforms, ineffective. Here, we introduce a variance component-based test that can jointly test multiple isoforms of one gene to identify differentially expressed (DE) genes, especially those with isoforms that have differential effects. We model isoform-level expression data from RNA-Seq using a negative binomial distribution and consider the baseline abundance of isoforms and their effects as two random terms. Our approach tests the global null hypothesis of no difference in any of the isoforms. The null distribution of the derived score statistic is investigated using empirical and theoretical methods. The results of simulations suggest that the performance of the proposed set test is superior to that of traditional algorithms and almost reaches optimal power when the variance of covariates is large. This method is also applied to analyze real data. Our algorithm, as a supplement to traditional algorithms, is superior at selecting DE genes with sparse or opposite effects for isoforms.
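To illustrate why aggregating isoforms can hide differential expression, here is a minimal sketch with made-up counts. The isoform names, the count values, and the helper functions are all hypothetical and are not taken from the paper; the sketch shows only the masking effect, not the variance component test itself.

```python
import math

# Hypothetical mean counts for two isoforms of one gene (illustrative values only).
control = {"isoform_A": 100, "isoform_B": 100}
treated = {"isoform_A": 150, "isoform_B": 50}


def gene_level_total(isoform_counts):
    """Aggregate isoform counts into a single gene-level count."""
    return sum(isoform_counts.values())


def isoform_log2_fold_changes(reference, condition):
    """Per-isoform log2 fold change of `condition` relative to `reference`."""
    return {name: math.log2(condition[name] / reference[name]) for name in reference}


# The gene-level totals are identical, so an aggregate test sees no change.
gene_change = gene_level_total(treated) - gene_level_total(control)

# The isoform-level view shows one isoform going up and the other going down.
fold = isoform_log2_fold_changes(control, treated)
```

In this toy case the gene-level difference is zero while the two isoforms move in opposite directions, which is the situation the abstract describes as "opposite effects for isoforms".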

2016 ◽  
Vol 14 (06) ◽  
pp. 1650034 ◽  
Author(s):  
Naim Al Mahi ◽  
Munni Begum

One of the primary objectives of a ribonucleic acid (RNA) sequencing or RNA-Seq experiment is to identify differentially expressed (DE) genes in two or more treatment conditions. It is a common practice to assume that all read counts from RNA-Seq data follow an overdispersed (OD) Poisson or negative binomial (NB) distribution, which is sometimes misleading because within each condition, some genes may have unvarying transcription levels with no overdispersion. In such a case, it is more appropriate and logical to consider two sets of genes: OD and non-overdispersed (NOD). We propose a new two-step integrated approach to distinguish DE genes in RNA-Seq data using standard Poisson and NB models for NOD and OD genes, respectively. This is an integrated approach because this method can be merged with any other NB-based methods for detecting DE genes. We design a simulation study and analyze two real RNA-Seq data sets to evaluate the proposed strategy. We compare the performance of this new method combined with the three R software packages, namely edgeR, DESeq2, and DSS, with their default settings. For both the simulated and real data sets, integrated approaches perform better or at least equally well compared to the regular methods embedded in these R packages.
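The first step of such a two-step approach, separating OD from NOD genes, might be sketched as follows. This is only an illustration: the variance-to-mean ratio, the cutoff of 1.5, and the toy counts are assumptions made here, and the abstract does not say which criterion the authors actually use.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 or below for Poisson-like data."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return variance(counts) / m


def split_genes(count_table, threshold=1.5):
    """Split genes into overdispersed (OD) and non-overdispersed (NOD) sets.

    `threshold` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    od, nod = [], []
    for gene, counts in count_table.items():
        if dispersion_index(counts) > threshold:
            od.append(gene)
        else:
            nod.append(gene)
    return od, nod


# Toy read counts for two hypothetical genes within one condition.
counts = {
    "gene_stable": [10, 11, 9, 10, 10, 11],
    "gene_noisy": [2, 40, 5, 60, 1, 30],
}
od_genes, nod_genes = split_genes(counts)
```

Genes in the NOD set would then be modeled with a standard Poisson model and genes in the OD set with an NB-based method. A formal overdispersion test would be a more principled choice than a fixed cutoff.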


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8260 ◽  
Author(s):  
Necla Koçhan ◽  
G. Yazgi Tutuncu ◽  
Gordon K. Smyth ◽  
Luke C. Gandolfo ◽  
Göknur Giner

Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) then applying Gaussian quadratic discriminant analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available on https://github.com/goknurginer/qtQDA.
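One way a quantile transformation from a negative binomial count to a Gaussian scale could look is sketched below. The mean/dispersion parametrization (variance = mu + phi * mu^2), the mid-point convention for the discrete CDF, and all function names are assumptions made for illustration; they are not taken from the qtQDA package, and the QDA step is not shown.

```python
import math
from statistics import NormalDist


def nb_log_pmf(k, mu, phi):
    """Log-probability of count k under NB with mean mu and dispersion phi."""
    r = 1.0 / phi          # NB "size" parameter
    p = r / (r + mu)       # success probability
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log(1.0 - p))


def nb_cdf(k, mu, phi):
    """P(X <= k), by direct summation of the pmf (fine for moderate k)."""
    total = sum(math.exp(nb_log_pmf(j, mu, phi)) for j in range(k + 1))
    return min(1.0, total)


def quantile_transform(k, mu, phi):
    """Map an NB count to a standard normal score via its mid-point CDF value."""
    upper = nb_cdf(k, mu, phi)
    lower = nb_cdf(k - 1, mu, phi) if k > 0 else 0.0
    u = 0.5 * (lower + upper)             # mid-point avoids u == 1 exactly
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inside the open interval (0, 1)
    return NormalDist().inv_cdf(u)


# Counts below, near, and above the mean map to increasing normal scores.
z_low = quantile_transform(5, mu=20.0, phi=0.1)
z_mid = quantile_transform(20, mu=20.0, phi=0.1)
z_high = quantile_transform(60, mu=20.0, phi=0.1)
```

After a transformation of this kind, each gene is approximately Gaussian, so a Gaussian classifier such as QDA with a regularized covariance estimate can be applied to the transformed values.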


2019 ◽  
Author(s):  
Necla Koçhan ◽  
Gözde Y. Tütüncü ◽  
Gordon K. Smyth ◽  
Luke C. Gandolfo ◽  
Göknur Giner

Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) then applying Gaussian Quadratic Discriminant Analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available.


2015 ◽  
Author(s):  
David M Rocke ◽  
Luyao Ruan ◽  
Yilun Zhang ◽  
J. Jared Gossett ◽  
Blythe Durbin-Johnson ◽  
...  

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10⁻⁴, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
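The arithmetic behind the "about 1 gene" statement can be checked with a short sketch. It assumes that p-values from a valid test are uniformly distributed on [0, 1] when every null hypothesis is true; the function names and the random seed are illustrative choices, not part of the study.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def simulate_null_rejections(n_tests, alpha, seed=0):
    """Count rejections among uniform(0, 1) p-values, as a valid test would produce."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


# 10,000 genes tested at a cutoff of 1e-4 give an expectation of about one false positive.
expected = expected_false_positives(10_000, 1e-4)
observed = simulate_null_rejections(10_000, 1e-4)
```

A method whose observed count of null rejections is consistently far above this expectation is not controlling the false positive rate at the stated cutoff.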


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 997
Author(s):  
Shruti Kane ◽  
Himanshu Garg ◽  
Neeraja M. Krishnan ◽  
Aditya Singh ◽  
Binay Panda

RNA sequencing (RNA-seq) is a powerful technology that allows one to assess the RNA levels in a sample. Analysis of these levels can help in identifying novel transcripts (coding, non-coding and splice variants), understanding transcript structures, and estimating gene/allele expression. Biologists face specific challenges while designing RNA-seq experiments. The nature of these challenges lies in determining the total number of sequenced reads and technical replicates required for detecting marginally differentially expressed transcripts. Despite previous attempts to address these challenges, easily-accessible and biologist-friendly mobile applications do not exist. Thus, we developed RNAtor, a mobile application for Android platforms, to aid biologists in correctly designing their RNA-seq experiments. The recommendations from RNAtor are based on simulations and real data.


Biostatistics ◽  
2017 ◽  
Vol 18 (4) ◽  
pp. 637-650 ◽  
Author(s):  
Luis León-Novelo ◽  
Claudio Fuentes ◽  
Sarah Emerson

RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial-based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure.

