Variance component testing for identifying differentially expressed genes in RNA-seq data

PeerJ, Vol 5, pp. e3797 (2017). DOI: 10.7717/peerj.3797

Cited By ~ 2

Author(s): Sheng Yang, Fang Shao, Weiwei Duan, Yang Zhao, Feng Chen

Keyword(s): Variance Component, Negative Binomial, Null Distribution, Real Data, Differentially Expressed, Expression Data, RNA-Seq, Component Testing, Optimal Power, Variance Component Testing

RNA sequencing (RNA-Seq) enables the measurement and comparison of gene expression with isoform-level quantification. Because isoforms of the same gene can have different effects, traditional methods that aggregate isoforms may be ineffective. Here, we introduce a variance component-based test that jointly tests multiple isoforms of one gene to identify differentially expressed (DE) genes, especially genes whose isoforms have differential effects. We model isoform-level expression data from RNA-Seq using a negative binomial distribution and treat the baseline abundance of isoforms and their effects as two random terms. Our approach tests the global null hypothesis of no difference in any of the isoforms. The null distribution of the derived score statistic is investigated using empirical and theoretical methods. Simulation results suggest that the proposed set test outperforms traditional algorithms and almost reaches optimal power when the variance of covariates is large. The method is also applied to real data. As a supplement to traditional algorithms, our algorithm is superior at selecting DE genes whose isoforms have sparse or opposite effects.
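To illustrate why a set test over isoform-level scores can detect opposite isoform effects that an aggregate test misses, here is a minimal sketch. It is not the authors' method: it assumes a fixed per-isoform baseline mean, a known negative binomial dispersion `phi`, a log link, an identity kernel (sum of squared scores, in the style of SKAT) and an exact permutation null. All function names and the counts are hypothetical.

```python
from itertools import combinations

# Simplified sketch: fixed baseline means and known phi, not the paper's
# model with random baseline abundance and random isoform effects.


def nb_scores(counts, group1, phi):
    """Per-isoform NB score for a group effect, evaluated under the null.

    counts: one list of counts per isoform, each indexed by sample.
    group1: set of sample indices in the second condition (x_i = 1).
    phi:    assumed known NB dispersion (variance = mu + phi * mu**2).
    Returns U_j = sum_i x_i * (y_ij - mu_j) / (1 + phi * mu_j) (log link).
    """
    scores = []
    for isoform in counts:
        mu = sum(isoform) / len(isoform)  # null fit: common mean per isoform
        scores.append(sum((isoform[i] - mu) / (1.0 + phi * mu) for i in group1))
    return scores


def variance_component_stat(scores):
    # Identity-kernel variance component statistic: opposite signs do not cancel
    return sum(u * u for u in scores)


def burden_stat(scores):
    # Aggregate (burden-style) statistic: opposite signs can cancel
    return sum(scores) ** 2


def permutation_pvalue(counts, group1, phi, stat):
    """Exact permutation p-value over all relabelings with the same group size."""
    n = len(counts[0])
    observed = stat(nb_scores(counts, set(group1), phi))
    splits = list(combinations(range(n), len(group1)))
    hits = sum(
        1 for split in splits
        if stat(nb_scores(counts, set(split), phi)) >= observed - 1e-9
    )
    return hits / len(splits)


# Hypothetical gene with two isoforms changing in opposite directions
counts = [
    [10, 12, 11, 9, 30, 28, 32, 31],  # isoform A: up in samples 4-7
    [30, 31, 29, 32, 10, 11, 9, 12],  # isoform B: down in samples 4-7
]
group1 = (4, 5, 6, 7)
p_vc = permutation_pvalue(counts, group1, 0.1, variance_component_stat)
p_burden = permutation_pvalue(counts, group1, 0.1, burden_stat)
print(p_vc, p_burden)
```

With these made-up counts, the squared-score statistic reaches the smallest attainable exact p-value (2/70: the observed split and its mirror image), while the burden statistic yields a large p-value because the two isoform scores nearly cancel.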


A two-step integrated approach to detect differentially expressed genes in RNA-Seq data

Journal of Bioinformatics and Computational Biology, Vol 14 (06), pp. 1650034 (2016). DOI: 10.1142/s0219720016500347

Cited By ~ 1

Author(s): Naim Al Mahi, Munni Begum

Keyword(s): Simulation Study, Negative Binomial, Real Data, Integrated Approach, Differentially Expressed, Data Sets, RNA-Seq, Software Packages, Treatment Conditions, Integrated Approaches

One of the primary objectives of an RNA sequencing (RNA-Seq) experiment is to identify differentially expressed (DE) genes across two or more treatment conditions. It is common practice to assume that all read counts from RNA-Seq data follow an overdispersed (OD) Poisson or negative binomial (NB) distribution, which is sometimes misleading because, within each condition, some genes may have unvarying transcription levels with no overdispersion. In such cases, it is more appropriate to consider two sets of genes: OD and non-overdispersed (NOD). We propose a new two-step integrated approach to identify DE genes in RNA-Seq data that uses standard Poisson and NB models for NOD and OD genes, respectively. The approach is called integrated because it can be merged with any other NB-based method for detecting DE genes. We design a simulation study and analyze two real RNA-Seq data sets to evaluate the proposed strategy. We compare the performance of the new method combined with three R software packages, namely edgeR, DESeq2, and DSS, with their default settings. For both the simulated and real data sets, the integrated approaches perform better than, or at least as well as, the regular methods embedded in these R packages.
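The abstract does not say how genes are assigned to the OD or NOD set, so the following is only a sketch of what the first, routing step could look like under stated assumptions: each condition's replicates are checked with the classical Poisson dispersion (index-of-dispersion) test, the chi-square tail is approximated with the Wilson-Hilferty transformation, and a gene is labeled OD if any condition rejects at `alpha = 0.05`. The function names and thresholds are hypothetical.

```python
import math

# Assumed OD/NOD criterion; the paper's actual step-1 test may differ.


def dispersion_pvalue(counts):
    """Upper-tail p-value of the classical Poisson dispersion test.

    Under a Poisson model, D = sum((y - ybar)**2) / ybar is approximately
    chi-square with n - 1 degrees of freedom. The chi-square tail is
    approximated with the Wilson-Hilferty normal transformation.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two replicates per condition")
    ybar = sum(counts) / n
    if ybar == 0:
        return 1.0  # all-zero counts carry no evidence of overdispersion
    d = sum((y - ybar) ** 2 for y in counts) / ybar
    k = n - 1
    z = ((d / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for standard normal Z


def route_gene(counts_by_condition, alpha=0.05):
    """Step 1: label a gene OD if any condition shows overdispersion.

    Step 2 (not shown) would fit a standard Poisson model to NOD genes and
    hand OD genes to an NB-based method such as edgeR, DESeq2 or DSS.
    """
    pvalues = [dispersion_pvalue(c) for c in counts_by_condition]
    return "OD" if min(pvalues) < alpha else "NOD"


print(route_gene([[10, 11, 9, 10], [20, 19, 21, 20]]))  # tight replicates
print(route_gene([[2, 30, 5, 60], [10, 11, 9, 10]]))    # highly variable replicates
```

Routing by a per-condition test keeps the step independent of whichever NB package handles the OD genes afterwards, which matches the abstract's point that the approach can be merged with other NB-based methods.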


qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data

PeerJ, Vol 7, pp. e8260 (2019). DOI: 10.7717/peerj.8260

Cited By ~ 1

Author(s): Necla Koçhan, G. Yazgi Tutuncu, Gordon K. Smyth, Luke C. Gandolfo, Göknur Giner

Keyword(s): Discriminant Analysis, Negative Binomial, Real Data, R Package, Modern Medicine, Data Sets, Quadratic Discriminant Analysis, Expression Data, RNA-Seq, New Classification

Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) and then applying Gaussian quadratic discriminant analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available at https://github.com/goknurginer/qtQDA.
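The quantile transformation step can be sketched as mapping each count y to a standard normal score through the negative binomial CDF, roughly z = Φ⁻¹(F_NB(y)), after which the transformed vectors would feed a Gaussian QDA. The sketch below covers only that transformation and rests on assumptions: the mean `mu` and dispersion `phi` are given rather than estimated, discreteness is handled by taking the midpoint of the CDF step, and the function names are hypothetical. The qtQDA package may do this differently.

```python
from statistics import NormalDist

# Assumptions: NB parameters known; midpoint rule for the discrete CDF.


def nb_cdf(y, mu, phi):
    """CDF of a negative binomial with mean mu and dispersion phi
    (variance mu + phi * mu**2), summed from the pmf recursion."""
    if y < 0:
        return 0.0
    r = 1.0 / phi
    p = r / (r + mu)      # success probability in the (size, prob) form
    pmf = p ** r          # P(Y = 0)
    total = pmf
    for k in range(1, y + 1):
        pmf *= (k - 1 + r) / k * (1.0 - p)  # P(Y = k) from P(Y = k - 1)
        total += pmf
    return min(total, 1.0)


def quantile_transform(y, mu, phi, eps=1e-12):
    """Map a count to a standard normal score through the midpoint of its
    CDF step (an assumed way to handle discreteness)."""
    u = 0.5 * (nb_cdf(y - 1, mu, phi) + nb_cdf(y, mu, phi))
    u = min(max(u, eps), 1.0 - eps)  # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(u)


# Hypothetical gene with mean 20 and dispersion 0.2 in one class
print([round(quantile_transform(y, 20.0, 0.2), 2) for y in (5, 20, 60)])
```

Because the transform is monotone, small counts map to negative scores and large counts to positive ones, so the marginal NB shape is absorbed before the Gaussian covariance model captures dependence between genes.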


qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data

2019. DOI: 10.1101/751370

Cited By ~ 1

Author(s): Necla Koçhan, Gözde Y. Tütüncü, Gordon K. Smyth, Luke C. Gandolfo, Göknur Giner

Keyword(s): Discriminant Analysis, Negative Binomial, Real Data, R Package, Modern Medicine, Data Sets, Quadratic Discriminant Analysis, Expression Data, RNA-Seq, New Classification

Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) then applying Gaussian Quadratic Discriminant Analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available.


Variance Component Testing in Generalized Linear Mixed Models for Longitudinal/Clustered Data and other Related Topics

Random Effect and Latent Variable Model Selection (Lecture Notes in Statistics), pp. 19-36 (2008). DOI: 10.1007/978-0-387-76721-5_2

Cited By ~ 13

Author(s): Daowen Zhang, Xihong Lin

Keyword(s): Mixed Models, Variance Component, Clustered Data, Generalized Linear Mixed Models, Linear Mixed Models, Component Testing, Variance Component Testing


Variance Component Testing in Unbalanced Nested Designs

Journal of the American Statistical Association, Vol 69 (347), pp. 765-771 (1974). DOI: 10.1080/01621459.1974.10480202

Cited By ~ 10

Author(s): W. B. Cummings, D. W. Gaylor

Keyword(s): Variance Component, Component Testing, Nested Designs, Variance Component Testing


Variance component testing in semiparametric mixed models

Journal of Multivariate Analysis, Vol 91 (1), pp. 107-118 (2004). DOI: 10.1016/j.jmva.2004.04.012

Cited By ~ 13

Author(s): Zhongyi Zhu, Wing K Fung

Keyword(s): Mixed Models, Variance Component, Component Testing, Variance Component Testing


Variance component testing in generalised linear models with random effects

Biometrika, Vol 84 (2), pp. 309-326 (1997). DOI: 10.1093/biomet/84.2.309

Cited By ~ 205

Author(s): X Lin

Keyword(s): Random Effects, Variance Component, Linear Models, Generalised Linear Models, Component Testing, Variance Component Testing


Excess False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data

2015. DOI: 10.1101/020784

Cited By ~ 7

Author(s): David M Rocke, Luyao Ruan, Yilun Zhang, J. Jared Gossett, Blythe Durbin-Johnson, ...

Keyword(s): Linear Model, False Positive, Negative Binomial, False Positive Rate, Real Data, False Positives, P Value, Data Sets, RNA-Seq, Positive Rate

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff: if 10,000 genes are tested at a p-value cutoff of 10⁻⁴ and all the null hypotheses are true, then only about 1 gene should be declared significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and by matched negative binomial simulations.

Results: Methods we examined that rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, nor does a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in the estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
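The calibration criterion above is simple arithmetic: with m true null hypotheses and a cutoff α, about m·α genes should be called, so 10,000 × 10⁻⁴ = 1. The sketch below checks that criterion on p-values from null-only data. The uniform p-values and the squared "anticonservative" p-values are synthetic stand-ins chosen for illustration, not the authors' resampling procedure, and the function names are hypothetical.

```python
import random


def expected_false_positives(n_genes, cutoff):
    # With all nulls true and valid p-values (uniform under the null),
    # the expected number of significant calls is n_genes * cutoff.
    return n_genes * cutoff


def calibration_ratio(null_pvalues, cutoff):
    """Observed / expected number of calls among p-values from null-only data.
    A ratio well above 1 signals excess false positives."""
    observed = sum(1 for p in null_pvalues if p < cutoff)
    return observed / expected_false_positives(len(null_pvalues), cutoff)


print(expected_false_positives(10_000, 1e-4))  # about 1 gene

rng = random.Random(2015)
valid = [rng.random() for _ in range(100_000)]   # synthetic calibrated null p-values
anticonservative = [p ** 2 for p in valid]       # hypothetical miscalibrated test
print(calibration_ratio(valid, 0.05), calibration_ratio(anticonservative, 0.05))
```

A larger cutoff such as 0.05 is used for the simulated check so that the expected count is big enough for the ratio to be stable; at 10⁻⁴ the same check would need far more null genes.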


RNAtor: an Android-based application for biologists to plan RNA sequencing experiments

F1000Research, Vol 6, pp. 997 (2017). DOI: 10.12688/f1000research.11982.2

Author(s): Shruti Kane, Himanshu Garg, Neeraja M. Krishnan, Aditya Singh, Binay Panda

Keyword(s): RNA Sequencing, Mobile Applications, Mobile Application, Splice Variants, Real Data, Differentially Expressed, RNA-Seq, Sample Analysis, Allele Expression, Novel Transcripts

RNA sequencing (RNA-seq) is a powerful technology that allows one to assess the RNA levels in a sample. Analysis of these levels can help in identifying novel transcripts (coding, non-coding and splice variants), understanding transcript structures, and estimating gene/allele expression. Biologists face specific challenges while designing RNA-seq experiments, chiefly determining the total number of sequenced reads and technical replicates required for detecting marginally differentially expressed transcripts. Despite previous attempts to address these challenges, easily accessible and biologist-friendly mobile applications do not exist. Thus, we developed RNAtor, a mobile application for Android platforms, to aid biologists in correctly designing their RNA-seq experiments. The recommendations from RNAtor are based on simulations and real data.


Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data

Biostatistics, Vol 18 (4), pp. 637-650 (2017). DOI: 10.1093/biostatistics/kxx006

Cited By ~ 5

Author(s): Luis León-Novelo, Claudio Fuentes, Sarah Emerson

Keyword(s): Negative Binomial, Marginal Likelihood, Hypothesis Test, Likelihood Estimation, Real Data, RNA-Seq, Data Set, Bayesian Hierarchical, Proposed Model, Bayesian Hypothesis Test

RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure.
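For intuition about a marginal-likelihood estimate of the dispersion, here is a sketch under an assumed conjugate setup that may not match the paper's exact model: counts y₁, …, yₙ from one condition share a single NB probability p with a Beta(a, b) prior (a = b = 1 here), p is integrated out in closed form, and the NB size r (dispersion 1/r) is chosen by grid search. Library sizes are ignored, and the function names and example counts are hypothetical.

```python
import math

# Assumed conjugate setup; not necessarily the paper's exact model or estimator.


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(r, counts, a=1.0, b=1.0):
    """log p(y | r) for y_i ~ NB(size=r, prob=p) with a shared p ~ Beta(a, b).

    Conjugacy gives the closed form
    sum_i log[Gamma(y_i + r) / (Gamma(r) y_i!)] + log B(a + n r, b + sum y) - log B(a, b).
    """
    n, s = len(counts), sum(counts)
    ll = sum(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1) for y in counts)
    return ll + log_beta(a + n * r, b + s) - log_beta(a, b)


def marginal_mle_size(counts, grid=None):
    """Maximize the marginal likelihood over the NB size r by grid search."""
    if grid is None:
        grid = [10 ** (k / 20) for k in range(-60, 61)]  # r from 1e-3 to 1e3
    return max(grid, key=lambda r: log_marginal_likelihood(r, counts))


overdispersed = [2, 30, 5, 60, 1, 45]
near_poisson = [10, 11, 9, 10, 12, 8]
for counts in (overdispersed, near_poisson):
    r_hat = marginal_mle_size(counts)
    print(f"size r = {r_hat:.3g}, dispersion 1/r = {1 / r_hat:.3g}")
```

Integrating p out, instead of plugging in its estimate, adds a penalty that grows with r, which is one way to see why a marginal estimate of the dispersion can be more stable than the plain MLE on small, noisy samples.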
