Classification of RNA-Seq Data via Gaussian Copulas

Mapping Intimacies ◽

10.1101/116046 ◽

2017 ◽

Author(s):

Qingyang Zhang

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Natural Extension ◽

Real Data ◽

Discrete Distributions ◽

Gaussian Copula ◽

RNA-Seq ◽

Independence Assumption ◽

Discriminant Score

Abstract: RNA-sequencing (RNA-Seq) has become a preferred option to quantify gene expression, because it is more accurate and reliable than microarrays. In RNA-Seq experiments, the expression level of a gene is measured by the count of short reads that are mapped to the gene region. Although some normal-based statistical methods may also be applied to log-transformed read counts, they are not ideal for directly modeling RNA-Seq data. Two discrete distributions, the Poisson and the negative binomial, have been commonly used in the literature to model RNA-Seq data, where the latter is a natural extension of the former that allows for overdispersion. Due to the technical difficulty of modeling correlated counts, most existing classifiers based on discrete distributions assume that genes are independent of each other. However, as we show in this paper, the independence assumption may cause non-ignorable bias in estimating the discriminant score, making the classification inaccurate. To address this, we drop the independence assumption and explicitly model the dependence between genes using a Gaussian copula. We apply a Bayesian approach to estimate the covariance matrix and the overdispersion parameter of the negative binomial distribution. Both synthetic data and real data are used to demonstrate the advantages of our model.
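
As a rough illustration of the idea in the abstract above (not the paper's estimation procedure or classifier), the sketch below draws pairs of correlated negative binomial counts through a Gaussian copula: correlated standard normals are mapped to uniforms with the normal CDF and then to counts with the negative binomial quantile function. The function names, the mean/size parameterization (variance = mu + mu^2/size), and all numeric values are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and dispersion parameter size
    (assumed parameterization: variance = mu + mu**2 / size)."""
    log_p = (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def nb_quantile(u, mu, size, k_max=10000):
    """Smallest count k whose cumulative probability reaches u."""
    cum = 0.0
    for k in range(k_max):
        cum += nb_pmf(k, mu, size)
        if cum >= u:
            return k
    return k_max  # guard against floating-point shortfall in the far tail


def sample_copula_nb(n, rho, mus, sizes, seed=0):
    """Draw n pairs of counts with NB marginals and Gaussian-copula dependence."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, keeping them strictly below 1.
        u1 = min(phi(z1), 1.0 - 1e-12)
        u2 = min(phi(z2), 1.0 - 1e-12)
        pairs.append((nb_quantile(u1, mus[0], sizes[0]),
                      nb_quantile(u2, mus[1], sizes[1])))
    return pairs
```

For example, `sample_copula_nb(2000, 0.8, (20.0, 50.0), (5.0, 5.0))` returns 2000 count pairs whose marginals are negative binomial and whose dependence comes only from the latent normal correlation `rho`; setting `rho = 0` recovers the independence assumption that the abstract argues against.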

Multivariate Doubly-Inflated Negative Binomial Distribution Using Gaussian Copula

STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health - Modern Statistical Methods for Spatial and Multivariate Data ◽

10.1007/978-3-030-11431-2_8 ◽

2019 ◽

pp. 147-161

Author(s):

Joseph Mathews ◽

Sumen Sen ◽

Ishapathik Das

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Gaussian Copula

How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana

Bioinformatics ◽

10.1093/bioinformatics/btz089 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3372-3377 ◽

Cited By ~ 2

Author(s):

Kimon Froussios ◽

Nick J Schurch ◽

Katarzyna Mackinnon ◽

Marek Gierliński ◽

Céline Duc ◽

...

Keyword(s):

Gene Expression ◽

Arabidopsis Thaliana ◽

Normal Distribution ◽

Differential Gene Expression ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Supplementary Information ◽

RNA-Seq ◽

Differential Gene

Abstract

Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae.

Results: We show that, consistent with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than from either a log-normal distribution or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution.

Availability and implementation: The raw data for the 17 WT Arabidopsis thaliana datasets are available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from GitHub at https://github.com/bartongroup/KF_arabidopsis-GRNA.

Supplementary information: Supplementary data are available at Bioinformatics online.

The Lindley negative-binomial distribution: Properties, estimation and applications to lifetime data

Mathematica Slovaca ◽

10.1515/ms-2017-0404 ◽

2020 ◽

Vol 70 (4) ◽

pp. 917-934

Author(s):

Muhammad Mansoor ◽

Muhammad Hussain Tahir ◽

Gauss M. Cordeiro ◽

Sajid Ali ◽

Ayman Alzaatreh

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Hazard Rate ◽

Negative Binomial ◽

Real Data ◽

Moment Generating Function ◽

Estimation Methods ◽

Model Parameters ◽

Proposed Model ◽

Rate Functions

Abstract: A generalization of the Lindley distribution, namely the Lindley negative-binomial distribution, is introduced. The Lindley and the exponentiated Lindley distributions are sub-models of the proposed distribution. The proposed model has flexible density and hazard rate functions. The density function can be decreasing, right-skewed, left-skewed or approximately symmetric. The hazard rate function can take various shapes, including increasing, decreasing and bathtub. Furthermore, the survival and hazard rate functions have closed-form representations, which make the model tractable for censored data analysis. Some general properties of the proposed model are studied, such as ordinary and incomplete moments, the moment generating function, mean deviations, and the Lorenz and Bonferroni curves. Maximum likelihood and Bayesian estimation methods are used to estimate the model parameters. In addition, a small simulation study is conducted to evaluate the performance of the estimation methods. Two real data sets are used to illustrate the applicability of the proposed model.

Negative binomial improved second degree lindley distribution and its application

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.2.5 ◽

2020 ◽

pp. 569-581

Author(s):

R. Ashly ◽

C. S. Rajitha

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Estimation Method ◽

Likelihood Estimation ◽

Real Data ◽

Data Set ◽

Factorial Moments ◽

Lindley Distribution ◽

Second Degree

The objective of this paper is to introduce a new two-parameter mixed negative binomial distribution, namely the negative binomial-improved second degree Lindley (NB-ISL) distribution. This distribution is obtained by mixing the negative binomial distribution with the improved second degree Lindley distribution. Many mixed distributions have been used in the literature for modeling overdispersed count data, and they provide a better fit than the Poisson and negative binomial distributions. In addition, we present the basic statistical properties of the new distribution, such as the factorial moments, mean and variance, and discuss the behavior of the mean, variance and coefficient of variation. Parameters are estimated by maximum likelihood. The performance of the NB-ISL distribution is shown in practice by applying it to a real data set and comparing it with some well-known count distributions. The results show that the NB-ISL distribution provides a better fit than the Poisson, negative binomial and negative binomial-Lindley distributions.

A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data

IEEE Access ◽

10.1109/access.2021.3082311 ◽

2021 ◽

pp. 1-1

Author(s):

Edin Salkovic ◽

Halima Bensmail

Keyword(s):

Gene Expression ◽

Negative Binomial Distribution ◽

Count Data ◽

Binomial Distribution ◽

Negative Binomial ◽

RNA-Seq

How well do RNA-Seq differential gene expression tools perform in a eukaryote with a complex transcriptome?

10.1101/090753 ◽

2016 ◽

Cited By ~ 4

Author(s):

Kimon Froussios ◽

Nick J. Schurch ◽

Katarzyna Mackinnon ◽

Marek Gierliński ◽

Céline Duc ◽

...

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

False Positive Rate ◽

RNA-Seq ◽

Underlying Distribution ◽

Differential Gene ◽

Log Normal

Abstract: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying distribution of gene expression. A recent highly replicated study revealed that RNA-seq gene expression measurements in yeast are best represented as being drawn from an underlying negative binomial distribution. In this paper, the statistical properties of gene expression in the higher eukaryote Arabidopsis thaliana are shown to be essentially identical to those from yeast despite the large increase in the size and complexity of the transcriptome: Gene expression measurements from this model plant species are consistent with being drawn from an underlying negative binomial or log-normal distribution and the false positive rate performance of nine widely used DGE tools is not strongly affected by the additional size and complexity of the A. thaliana transcriptome. For RNA-seq data, we therefore recommend the use of DGE tools that are based on the negative binomial distribution.

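KAJIAN SIMULASI OVERDISPERSI PADA REGRESI POISSON DAN BINOMIAL NEGATIF TERBOBOTI GEOGRAFIS UNTUK DATA BALITA GIZI BURUK [A Simulation Study of Overdispersion in Geographically Weighted Poisson and Negative Binomial Regression for Data on Severely Malnourished Children Under Five]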

Indonesian Journal of Statistics and Its Applications ◽

10.29244/ijsa.v4i3.684 ◽

2020 ◽

Vol 4 (3) ◽

pp. 484-497

Author(s):

Puput Cahya Ambarwati ◽

Indahwati Indahwati ◽

Muhammad Nur Aidi

Keyword(s):

Poisson Distribution ◽

Spatial Data ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Real Data ◽

P Value ◽

Simulation Data ◽

Response Variable

Geographically weighted regression (GWR) is a regression method for spatial data. When the response variable follows a Poisson distribution, geographically weighted Poisson regression (GWPR) can be used. GWPR often fails to satisfy the equidispersion assumption. The classical approach to overdispersion is a mixture of the Poisson and gamma distributions, which yields the negative binomial distribution. When the response variable follows a negative binomial distribution, geographically weighted negative binomial regression (GWNBR) can be used. The data used in this study are simulated data and real data. In the simulation, GWPR still models the data adequately only when the overdispersion is close to 1, judged by the number of significant results and the average p-value. For the real data, GWNBR is a better model than GWPR, based on a comparison of AIC values, for the overdispersed counts of malnourished children in East Java Province in 2017.
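
The Poisson-gamma mixture mentioned in the abstract above can be checked numerically. The minimal sketch below (not the study's simulation design) draws a gamma-distributed rate for each observation and then a Poisson count given that rate; the resulting counts have a variance larger than their mean, as a negative binomial with the same mean would. The function names, the mean/size parameterization and the numeric values are assumptions for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_poisson_gamma(n, mu, size, seed=0):
    """Counts from a Poisson whose rate is Gamma(shape=size, scale=mu/size).

    Marginally this is negative binomial with mean mu and
    variance mu + mu**2 / size.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.gammavariate(size, mu / size)  # arguments are shape, scale
        draws.append(sample_poisson(lam, rng))
    return draws


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v
```

With `mu = 5.0` and `size = 2.0`, the theoretical variance is 5 + 25/2 = 17.5, so the variance-to-mean ratio is 3.5 rather than the value of 1 that a plain Poisson model assumes.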

The Inverse Burr Negative Binomial Distribution with Application to Real Data

Journal of Statistics Applications & Probability ◽

10.18576/jsap/050105 ◽

2016 ◽

Vol 5 (1) ◽

pp. 53-65 ◽

Cited By ~ 2

Author(s):

Abdullahi Yusuf ◽

Badamasi Bashir Mikail ◽

Aliyu Isah Aliyu ◽

Abdurrahaman L. Sulaiman

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Real Data

Objective Bayesian analysis of the 2 x 2 contingency table and the negative binomial distribution

10.32469/10355/66783 ◽

2018 ◽

Author(s):

John Christian Snyder

Keyword(s):

Bayesian Analysis ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Odds Ratio ◽

Negative Binomial ◽

Real Data ◽

R Package ◽

Small Sample ◽

Reference Prior ◽

Categorical Data Analysis

In Bayesian analysis, the “objective” Bayesian approach seeks to select a prior distribution not by using (often subjective) scientific belief or by mathematical convenience, but rather by deriving it under pre-specified criteria. This approach takes the decision of prior selection out of the hands of the researcher. Ideally, for a given data model, we would like to have a prior which represents a “neutral” prior belief in the phenomenon we are studying. In categorical data analysis, the odds ratio is one of several approaches to quantify how strongly the presence or absence of one property is associated with the presence or absence of another property. In this project, we present a reference prior for the odds ratio of an unrestricted 2 x 2 table. Posterior simulation can be conducted without MCMC and is implemented on a GPU via the CUDA extensions for C. Simulation results indicate that the proposed approach to this problem is far superior to the widely used frequentist approaches that dominate this area. Real data examples also typically yield much more sensible results, especially for small sample sizes or for tables that contain zeros. An R package is also presented to allow for easy implementation of this methodology. Next, we develop an approximate reference prior for the negative binomial distribution, applying this methodology to a continuous parameterization often used for modeling over-dispersed count data as well as the typical discrete case. Results indicate that the developed prior equals the performance of the MLE in estimating the mean of the distribution but is far superior when estimating the dispersion parameter.
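
For readers unfamiliar with the quantity discussed in the abstract above, the sketch below computes the plain sample odds ratio of a 2 x 2 table. It is only the textbook point estimate, not the reference-prior method the abstract describes; the function name and the cell layout [[a, b], [c, d]] are assumptions for this example.

```python
import math


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2 x 2 table [[a, b], [c, d]].

    Returns infinity when only the denominator is zero and NaN when both
    numerator and denominator are zero.
    """
    num = a * d
    den = b * c
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den
```

For the table [[10, 5], [4, 12]] this gives (10 * 12) / (5 * 4) = 6.0. A table with a zero cell, such as [[3, 0], [2, 5]], gives an infinite estimate, which shows why tables containing zeros are awkward for the simple estimate.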

Small Sample Properties of the Pareto/Negative Binomial Distribution Model

Marketing ZFP ◽

10.15358/0344-1369-2010-jrm-1-39 ◽

2010 ◽

Vol 32 (JRM 1) ◽

pp. 39-50

Author(s):

Daniel Hoppe ◽

Udo Wagner

Keyword(s):

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Small Sample ◽

Distribution Model ◽

Small Sample Properties
