Impact of Magnitude of Zero Inflation of Covariates on Statistical Inference and Model Selection

2021 ◽  
Vol 10 (2) ◽  
pp. 287-292

2009 ◽  
Vol 26 (2) ◽  
pp. 217-236 ◽  
Author(s):  
Richard Berk ◽  
Lawrence Brown ◽  
Linda Zhao

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Kwangbom Choi ◽  
Yang Chen ◽  
Daniel A. Skelly ◽  
Gary A. Churchill

Abstract Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. Results: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Conclusions: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis.


Author(s):  
Kwangbom Choi ◽  
Yang Chen ◽  
Daniel A. Skelly ◽  
Gary A. Churchill

Abstract Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-Seq data. We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-Seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Despite the existence of zero inflation in scRNA-Seq counts, we recommend the generalized linear model with negative binomial count distribution (not zero-inflated) as a suitable reference model for scRNA-Seq analysis.


1985 ◽  
Vol 18 (1) ◽  
pp. 39-44 ◽  
Author(s):  
Jaime Marquez ◽  
Janice Shack-Marquez ◽  
William L. Wascher

2014 ◽  
Vol 37 (2) ◽  
pp. 141-143 ◽  
Author(s):  
A. Martínez-Abrain ◽  
D. Conesa ◽  
A. Forte ◽  
...  

We address the handling of prior information when performing statistical inference in ecology, both in model specification and selection and in parameter estimation. We compare the perspectives on this problem from the frequentist and Bayesian schools, including objective and subjective Bayesians. We show that making use of prior information and making a priori decisions is a reality not only for Bayesians but also for frequentists. However, the latter tend to overlook this because prior information on the magnitude of the effect that is thought to be biologically relevant is often difficult to obtain. This prior information should be fed into a priori power tests when determining the sample sizes needed to couple statistical and biological significance. Ecologists should make a greater effort to use available prior information because this is their most legitimate contribution to the inferential process. Parameter estimation and model selection would benefit if this were done, allowing a more reliable accumulation of knowledge, and hence progress, in the biological sciences.


2014 ◽  
Vol 30 (1) ◽  
pp. 123-146 ◽  
Author(s):  
James O. Chipperfield

Abstract Large amounts of microdata are collected by data custodians in the form of censuses and administrative records. Often, data custodians will collect different information on the same individual. Many important questions can be answered by linking microdata collected by different data custodians. For this reason, there is very strong demand from analysts, within government, business, and universities, for linked microdata. However, many data custodians are legally obliged to ensure the risk of disclosing information about a person or organisation is acceptably low. Different authors have considered the problem of how to facilitate reliable statistical inference from analysis of linked microdata while ensuring that the risk of disclosure is acceptably low. This article considers the problem from the perspective of an Integrating Authority that, by definition, is trusted to link the microdata and to facilitate analysts’ access to the linked microdata via a remote server, which allows analysts to fit models and view the statistical output without being able to observe the underlying linked microdata. One disclosure risk that must be managed by an Integrating Authority is that one data custodian may use the microdata it supplied to the Integrating Authority and statistical output released from the remote server to disclose information about a person or organisation that was supplied by the other data custodian. This article considers analysis of only binary variables. The utility and disclosure risk of the proposed method are investigated both in a simulation and using a real example. This article shows that some popular protections against disclosure (dropping records, rounding regression coefficients or imposing restrictions on model selection) can be ineffective in the above setting.

