On Difference Equations, Probability Models and the “Generalized Event Count” Distribution

Political Analysis

10.1093/pan/6.1.175

1996

Vol 6

pp. 175-212

Cited By ~ 1

Author(s):

Timothy W. Amato

Keyword(s):

Difference Equations

Count Data

Negative Binomial

Probability Model

Dispersion Parameter

Probability Models

Statistical Structure

Count Models

Count Distribution

In this article, the mathematical and probabilistic foundations of Gary King's “generalized event count” (GEC) model for unequally dispersed event count data are explored. It is shown that the GEC model is a probability model that joins together the binomial, negative binomial, and Poisson distributions. Some aspects of the GEC's reparameterization are described and extended, and it is shown how different reparameterizations lead to different interpretations of the dispersion parameter. The common mathematical and statistical structure of “unequally dispersed” event count models is derived: they are models that require estimation of a “number of trials” parameter along with the “probability” component. Some questions pertaining to estimation of this class of models are raised for future discussion.
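One way to see the "number of trials" structure is to write all three distributions as a single generalized-binomial pmf indexed by a number of trials n and a probability p = μ/n. This is an illustrative sketch, not the article's own derivation: a positive integer n gives a binomial (variance below the mean), a negative n = −r gives a negative binomial with size r (variance above the mean, via the generalized binomial coefficient), and |n| → ∞ with μ fixed gives the Poisson. In every case the variance is μ(1 − μ/n). The helper names (`gen_binom`, `trials_pmf`, `moments`) are hypothetical.

```python
import math

def gen_binom(n: float, k: int) -> float:
    """Generalized binomial coefficient n(n-1)...(n-k+1)/k!, valid for any real n."""
    c = 1.0
    for j in range(k):
        c *= (n - j) / (j + 1)
    return c

def trials_pmf(k: int, mu: float, n: float) -> float:
    """P(Y = k) for a count with mean mu and "number of trials" n (hypothetical helper).

    n > 0 (integer): binomial(n, mu/n), variance below the mean.
    n = -r < 0:      negative binomial with size r and mean mu, variance above the mean.
    """
    p = mu / n  # "probability" component; negative when n < 0
    return gen_binom(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """Poisson pmf (the |n| -> infinity limit), computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def moments(pmf, kmax: int = 200):
    """Total mass, mean and variance of a pmf summed over 0..kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return sum(probs), mean, var

if __name__ == "__main__":
    mu = 4.0
    cases = [
        ("binomial, n = 10", lambda k: trials_pmf(k, mu, 10)),
        ("Poisson, n -> inf", lambda k: poisson_pmf(k, mu)),
        ("neg. binomial, n = -10", lambda k: trials_pmf(k, mu, -10)),
    ]
    for label, pmf in cases:
        total, mean, var = moments(pmf)
        print(f"{label:24s} mass={total:.4f} mean={mean:.4f} var/mean={var / mean:.4f}")
```

For μ = 4 the variance-to-mean ratio should come out near 0.6, 1.0 and 1.4 for n = 10, n → ∞ and n = −10, matching 1 − μ/n.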


A Stochastic Condensation Mechanism for Inducing Underdispersion in Count Models

10.20944/preprints202103.0570.v1

2021

Author(s):

Chenangnon Frédéric Tovissodé

Romain Glele Kakai

Keyword(s):

Negative Binomial Distribution

Count Data

Negative Binomial

Count Models

Original Variable

Binomial Distributions

Original Distribution

Count Distribution

Special Cases

Stochastic Mechanism

It is quite easy to stochastically distort an original count variable to obtain a new count variable with relatively more variability than the original. Many popular overdispersion models (variance greater than mean) can indeed be obtained by mixtures, compounding or randomly stopped sums. There is no analogous stochastic mechanism for constructing underdispersed count variables (variance less than mean) from an original count distribution of interest. This work proposes a generic method to stochastically distort an original count variable to obtain a new count variable with relatively less variability than the original. The proposed mechanism, termed condensation, attracts probability mass from quantiles in the tails of the original distribution and redirects it toward quantiles around the expected value. If the original distribution can be simulated, then simulating variates from a condensed distribution is straightforward. Moreover, condensed distributions have a simple mean parametrization, a characteristic useful in a count regression context. An application to the negative binomial distribution resulted in a distribution allowing under-, equi- and overdispersion. In addition to graphical insights, fields of application of special cases of the condensed Poisson and condensed negative binomial distributions were pointed out as an indication of the potential of condensation for a flexible analysis of count data.
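The "easy" direction mentioned above can be illustrated with a gamma mixture of Poissons: drawing the Poisson rate from a gamma distribution with mean μ yields a negative binomial whose variance μ + μ²/shape exceeds the mean. This sketch shows only that mixing mechanism; it does not implement the authors' condensation method. The sampler, the shape value and the seed are arbitrary assumptions.

```python
import math
import random
from statistics import mean, pvariance

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small to moderate rates."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion_index(xs) -> float:
    """Sample variance-to-mean ratio (1 for a Poisson)."""
    m = mean(xs)
    return pvariance(xs, m) / m

def simulate(n: int = 20000, mu: float = 4.0, shape: float = 2.0, seed: int = 1):
    rng = random.Random(seed)
    plain = [poisson_draw(mu, rng) for _ in range(n)]
    # Mixture: rate ~ Gamma(shape, scale = mu / shape), so E[rate] = mu;
    # the resulting counts are negative binomial with variance mu + mu**2 / shape.
    mixed = [poisson_draw(rng.gammavariate(shape, mu / shape), rng) for _ in range(n)]
    return dispersion_index(plain), dispersion_index(mixed)

if __name__ == "__main__":
    print(simulate())
```

With μ = 4 and shape 2 the theoretical dispersion index of the mixture is 1 + μ/shape = 3, while the plain Poisson sample should stay close to 1.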


A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Journal of Statistical Distributions and Applications

10.1186/s40488-021-00121-4

2021

Vol 8 (1)

Author(s):

Cindy Xin Feng

Keyword(s):

Health Services

Count Data

Goodness Of Fit

Negative Binomial

Simulation Studies

Final Choice

Hurdle Models

Count Distribution

Careful Assessment

Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros representing patients with no utilization during a follow-up period. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models remain insufficiently investigated. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
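The difference in data-generating processes can be made concrete with the two Poisson-based pmfs. In a zero-inflated Poisson (ZIP), zeros arise from two sources: a structural zero with probability π, or a Poisson draw that happens to be 0. In a hurdle Poisson, every zero comes from the binary part and positive counts come from a zero-truncated Poisson. A minimal sketch, assuming no covariates: the hurdle model reproduces any ZIP by setting π₀ = π + (1 − π)exp(−λ), but it can also produce fewer zeros than the Poisson (π₀ < exp(−λ)), which a ZIP with π ≥ 0 cannot.

```python
import math

def pois(k: int, lam: float) -> float:
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: structural zero w.p. pi, otherwise a Poisson draw (which may be 0)."""
    base = (1 - pi) * pois(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k: int, lam: float, pi0: float) -> float:
    """Hurdle Poisson: every zero comes from the binary part (prob. pi0);
    positive counts come from a zero-truncated Poisson."""
    if k == 0:
        return pi0
    return (1 - pi0) * pois(k, lam) / (1 - math.exp(-lam))

if __name__ == "__main__":
    lam, pi = 3.0, 0.4
    pi0 = pi + (1 - pi) * math.exp(-lam)  # hurdle zero probability matched to the ZIP
    for k in range(5):
        print(k, round(zip_pmf(k, lam, pi), 6), round(hurdle_pmf(k, lam, pi0), 6))
    # A hurdle model can also deflate zeros below the Poisson level exp(-lam);
    # a ZIP with pi >= 0 cannot go below it.
    print(hurdle_pmf(0, lam, 0.01), "<", pois(0, lam))
```

Once covariates enter both the zero part and the count part, the two models are no longer reparameterizations of each other, which is where the choice between them becomes substantive.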


Correcting for Non-Sum to 1 Estimated Probabilities in Applications of Discrete Probability Models to Count Data

International Journal of Statistics and Probability

10.5539/ijsp.v6n5p119

2017

Vol 6 (5)

pp. 119

Author(s):

Bayo H. Lawal

Keyword(s):

Sample Size

Count Data

Negative Binomial

Frequency Count

Frequency Data

Probability Models

Common Procedure

Discrete Probability

Excess Zeros

Expected Values

In this paper, we examine some often ignored or assumed problems relating to fitting probability models to count data exhibiting over-, equi- or underdispersion. Of particular concern are data truncated at a last category, where, most often, the expected values in this last category are collapsed together so that the expected values sum to the sample size in the data, that is, so that $\displaystyle \sum_{i=0}^{k} \hat{m}_i=n$. For illustrative purposes, we consider the following distributions: the negative binomial (NB), the inverse trinomial (IT), the hyper-Poisson (HP), the quasi-negative binomial (QNBD), the extended COM-Poisson (ECOMP) and the negative binomial-exponential (NBGE). Although we restrict our discussion to these six distributions, other distributions may also be employed, and the pattern is always the same: the sum of the estimated probabilities does not equal 1.00, and consequently the sum of the expected values is always less than the sample size in the observed data (or equal to it, in the Poisson case only). We propose a common procedure to rectify this problem for both right-truncated and non-truncated frequency count data, whether or not they exhibit excess zeros.
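The problem is easy to reproduce: if fitted probabilities are evaluated only on the observed categories 0 to k, the expected frequencies n·p̂ᵢ sum to less than n. The sketch below shows two familiar adjustments, lumping the upper tail into the last cell and rescaling to the truncated support; it does not reproduce the paper's proposed procedure. The negative binomial parameters, k and n are hypothetical values, not fitted estimates.

```python
import math

def nb_pmf(y: int, r: float, mu: float) -> float:
    """Negative binomial pmf with size r and mean mu."""
    q = mu / (mu + r)
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1 - q) + y * math.log(q))
    return math.exp(log_p)

def expected_counts(n: int, r: float, mu: float, k: int):
    """Expected frequencies for categories 0..k under three conventions."""
    raw = [n * nb_pmf(y, r, mu) for y in range(k + 1)]  # sums to less than n
    # (a) last category read as "k or more": give it all the leftover mass
    lumped = raw[:-1] + [n - sum(raw[:-1])]
    # (b) renormalize to the truncated support {0, ..., k}
    total = sum(raw)
    rescaled = [m * n / total for m in raw]
    return raw, lumped, rescaled

raw, lumped, rescaled = expected_counts(n=100, r=2.0, mu=3.0, k=6)
print(f"raw sum = {sum(raw):.3f}, lumped sum = {sum(lumped):.3f}, rescaled sum = {sum(rescaled):.3f}")
```

Option (a) keeps the fitted shape but changes the meaning of the last cell; option (b) keeps the cell meanings but treats the fitted model as truncated at k.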


Discrimination between Some Over Dispersed Count Distributions

ASM Science Journal

10.32802/asmscj.2020.503

2021

Vol 14

pp. 1-8

Author(s):

Yook-Ngor Phang

Seng-Huat Ong

Yeh-Ching Low

Keyword(s):

Count Data

Negative Binomial

Likelihood Ratio Statistic

Variance Function

Correct Selection

Inverse Gaussian

Practical Applications

Generalized Poisson

Count Distribution

Healthcare Insurance

The Poisson inverse Gaussian and generalized Poisson distributions are widely used in modelling overdispersed count data, which are commonly found in healthcare, insurance, engineering, econometrics and ecology. The inverse trinomial distribution is a relatively new count distribution arising from a one-dimensional random walk model (Shimizu & Yanagimoto, 1991). The Poisson inverse Gaussian distribution is a popular count model that has been proposed as an alternative to the negative binomial distribution. The inverse trinomial and generalized Poisson models share the characteristic of a cubic variance function, while the Poisson inverse Gaussian has a quadratic variance function. The nature of the variance function seems to be an important property in modelling overdispersed count data, so it is of interest to be able to select among the three models in practical applications. This paper considers discrimination among the three models based on the likelihood ratio statistic and computes the probability of correct selection via Monte Carlo simulation.
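The selection rule can be sketched as follows: fit each candidate by maximum likelihood, pick the model with the larger maximized log-likelihood (equivalently, the sign of the log likelihood ratio), and estimate the probability of correct selection as the share of simulated samples for which the data-generating model wins. To stay short and dependency-free, this sketch swaps in the negative binomial and Consul's generalized Poisson as the two candidates (the Poisson inverse Gaussian and inverse trinomial pmfs are not implemented here), fixes each model's mean at the sample mean and maximizes only over the dispersion parameter with a golden-section search. Sample size, parameters and seed are arbitrary assumptions.

```python
import math
import random
from collections import Counter

def nb_logpmf(y, mu, r):
    """Negative binomial, mean mu, size r (variance mu + mu**2 / r)."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def gp_logpmf(y, mu, lam):
    """Consul's generalized Poisson with theta = mu * (1 - lam), 0 <= lam < 1."""
    theta = mu * (1 - lam)
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))

def golden_max(f, lo, hi, iters=50):
    """Maximum value of a unimodal function f on [lo, hi] (golden-section search)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return max(fc, fd)

def max_loglik(data, logpmf, lo, hi, transform=lambda t: t):
    """Profile log-likelihood over the dispersion parameter, mean fixed at the sample mean."""
    mu = sum(data) / len(data)
    freq = Counter(data)
    def loglik(t):
        par = transform(t)
        return sum(c * logpmf(y, mu, par) for y, c in freq.items())
    return golden_max(loglik, lo, hi)

def nb_draw(mu, r, rng):
    """Gamma-Poisson draw; Poisson part by Knuth's method (fine for moderate rates)."""
    lam = rng.gammavariate(r, mu / r)
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def prob_correct_selection(n=100, mu=3.0, r=1.0, reps=200, seed=7):
    """Share of NB-generated samples for which NB beats GP on maximized log-likelihood."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        data = [nb_draw(mu, r, rng) for _ in range(n)]
        ll_nb = max_loglik(data, nb_logpmf, -5.0, 7.0, math.exp)  # search over log r
        ll_gp = max_loglik(data, gp_logpmf, 0.0, 0.99)
        wins += ll_nb > ll_gp  # positive log likelihood ratio selects NB
    return wins / reps

if __name__ == "__main__":
    print(prob_correct_selection())
```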


A Method to Characterize Climate, Earth or Environmental Vector Random Processes

10.21203/rs.3.rs-1026259/v1

2021

Author(s):

Manuel Cobos Budia

Pedro Otiñar Morillas

Pedro Magaña Redondo

Asunción Baquerizo Azofra

Keyword(s):

Time Series

Probability Model

Likelihood Method

Model Parameters

Probability Models

Wind Rose

Orthogonal Functions

Generalized Fourier Series

We propose a methodology to characterize a multivariate non-stationary vector random process that can be used to simulate random realizations that keep the probabilistic behavior of the original time series. The marginal probability distribution of each component process is assumed to be a piecewise function defined by several weighted parametric probability models. The weights are obtained analytically by ensuring that the probability density function is well defined and that it is continuous at the common endpoints. The probability model is assumed to vary periodically in time over a predefined time period by defining the model parameters and the common endpoints as truncated generalized Fourier series. The coefficients of the expansions are obtained with the maximum likelihood method. Three different types of orthogonal function sets are tested. The method is applied to three time series with different particularities. First, it is shown to capture well the highly variable freshwater discharges at a dam located in a semiarid zone in Andalucía (Spain), which are influenced not only by climate variability but also by management decisions. Second, for the Wolf sunspot number time series, the Schwabe cycle and time variations with periods close to 7.5 and 17 years are analyzed along a 22-year cycle. Finally, the method is applied to a bivariate (velocity and direction) wind time series observed at a location in the Atlantic Ocean. For this case, the analysis, which was combined with a vector autoregressive model, focuses on assessing how well the methodology replicates the statistical features of the original series. In particular, it is found that it reproduces the marginal and joint distributions, the wind rose, and the duration of sojourns above given thresholds.
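For a two-piece case, the weight conditions described above reduce to two linear equations. The sketch assumes, purely for illustration, a normal body below a common endpoint u and an exponential tail above it, with every parameter (including u) given by a one-harmonic trigonometric series in time; the paper's actual model families, orthogonal bases and fitted coefficients are not reproduced. Writing f₁ and F₁ for the normal pdf and cdf and λ for the tail rate, normalization gives w₁F₁(u) + w₂ = 1 and continuity gives w₁f₁(u) = w₂λ, so w₁ = 1 / (F₁(u) + f₁(u)/λ). The coefficients in `PARAMS` are hypothetical.

```python
import math

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def normal_cdf(x, m, s):
    return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))

def fourier(t, coeffs, period=1.0):
    """Truncated trigonometric series a0 + sum_j [a_j cos(2*pi*j*t/T) + b_j sin(2*pi*j*t/T)]."""
    a0, harmonics = coeffs
    w = 2 * math.pi * t / period
    return a0 + sum(a * math.cos(j * w) + b * math.sin(j * w)
                    for j, (a, b) in enumerate(harmonics, start=1))

# Hypothetical coefficients: (a0, [(a1, b1)]) for each time-varying parameter
PARAMS = {
    "m": (5.0, [(1.0, 0.5)]),    # body mean
    "s": (1.5, [(0.2, 0.0)]),    # body standard deviation
    "u": (7.0, [(0.5, 0.0)]),    # common endpoint between body and tail
    "lam": (0.8, [(0.1, 0.1)]),  # exponential tail rate
}

def piecewise_density(x, t, params=PARAMS):
    """Normal body for x <= u(t), exponential tail for x > u(t), at time t (fraction of period)."""
    m, s, u, lam = (fourier(t, params[key]) for key in ("m", "s", "u", "lam"))
    f_u, F_u = normal_pdf(u, m, s), normal_cdf(u, m, s)
    # Normalization: w1*F_u + w2 = 1;  continuity at u: w1*f_u = w2*lam
    w1 = 1.0 / (F_u + f_u / lam)
    w2 = w1 * f_u / lam
    if x <= u:
        return w1 * normal_pdf(x, m, s)
    return w2 * lam * math.exp(-lam * (x - u))
```

In a fitted model the Fourier coefficients would be estimated by maximum likelihood, as the abstract describes; here they are simply fixed.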


Microbial Sampling: Is It Better to Sample Many Times or Use Large Samples?

Water Science & Technology

10.2166/wst.1993.0314

1993

Vol 27 (3-4)

pp. 19-25

Cited By ~ 10

Author(s):

Charles N. Haas

Keyword(s):

Small Volume

Negative Binomial

Dispersion Parameter

Microbial Quality

Large Samples

Repeated Sampling

Repeated Samples

Microbial Sampling

Repeated sampling of a water (raw, finished, recreational) is often used to assess microbial quality. Microbial counts in such repeated samples have often been found to follow a negative binomial distribution. Under these conditions, it is shown that it is better to use a large number of small-volume samples than a small number of large-volume samples, provided that the negative binomial dispersion parameter remains unaffected by volume. Further research is needed to determine whether the latter assumption, which influences the proposed conclusion, is valid for various classes of microorganisms in various types of waters.
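A short variance calculation shows why the conclusion hinges on the dispersion parameter. Assume, as a sketch rather than the paper's own derivation, that a fixed total volume V is split into n independent samples of volume v = V/n, that the count in each sample is negative binomial with mean cv (c being the concentration) and a dispersion parameter k that does not depend on v, and that c is estimated by the total count divided by V. Then

$$
\operatorname{Var}(\hat c) = \frac{1}{V^2}\, n\left(cv + \frac{(cv)^2}{k}\right) = \frac{c}{V} + \frac{c^2}{nk},
$$

which decreases as n grows for fixed V. If k instead shrank in proportion to v (k = av), the second term would become c²/(aV) and no longer depend on n, which is why the assumption about k matters.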


Analysis of count data in the setting of cervical cancer detection

Journal of Investigative Medicine

10.1136/jim-2020-001381

2020

Vol 68 (6)

pp. 1196-1198

Author(s):

Christina G Bracamontes

Thelma Carrillo

Jane Montealegre

Leonid Fradkin

Michele Follen

...

Keyword(s):

Sexual Abuse

Count Data

Pap Smear

Negative Binomial

El Paso

Language Preference

Zip Model

Endocervical Canal

Women with an abnormal Pap smear are often referred for colposcopy, a procedure during which endocervical curettage (ECC) may be performed. ECC is a scraping of the lining of the endocervical canal. Our goal was to compare the performance of a naïve Poisson (NP) regression model with that of a zero-inflated Poisson (ZIP) model in identifying predictors of the number of distress/pain vocalizations made by women undergoing ECC. Data on women seen in the colposcopy clinic at a medical school in El Paso, Texas, were analyzed. The outcome was the number of pain vocalizations made by the patient during ECC. Six dichotomous predictors were evaluated. Initially, NP regression was used to model the data. Because a high proportion of patients did not make any vocalizations, a ZIP model was also fit, and relative rates (RRs) and 95% CIs were calculated. AIC was used to identify the better model (NP or ZIP). Of the 210 women, 154 (73.3%) had a value of 0 for the number of ECC vocalizations. NP identified three statistically significant predictors (language preference of the subject, sexual abuse history and length of the colposcopy), while ZIP identified one: history of sexual abuse (yes vs no; adjusted RR=2.70, 95% CI 1.47 to 4.97). ZIP was preferred over NP and performed better than NP regression. Clinicians and epidemiologists should consider using the ZIP model (or the zero-inflated negative binomial model) for zero-inflated count data.
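The NP-versus-ZIP comparison by AIC can be sketched for the intercept-only case. The code fits a ZIP with the EM algorithm (treating structural-zero membership as missing data), fits a plain Poisson by its closed-form MLE, and compares AIC = 2·(number of parameters) − 2·log-likelihood. It runs on simulated data with hypothetical parameters, not on the study data, and it omits the six covariates used in the study.

```python
import math
import random

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method (adequate for small rates)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_zip(y, iters: int = 500):
    """Intercept-only ZIP by EM; returns (pi, lam, log-likelihood)."""
    n, total = len(y), sum(y)
    n0 = sum(1 for v in y if v == 0)
    pi, lam = 0.5, max(total / n, 0.1)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is a structural zero
        z0 = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: mixing weight and Poisson mean of the non-structural part
        pi = n0 * z0 / n
        lam = total / (n - n0 * z0)
    loglik = sum(
        math.log(pi + (1 - pi) * math.exp(-lam)) if v == 0
        else math.log(1 - pi) - lam + v * math.log(lam) - math.lgamma(v + 1)
        for v in y
    )
    return pi, lam, loglik

def poisson_loglik(y) -> float:
    lam = sum(y) / len(y)  # closed-form MLE
    return sum(-lam + v * math.log(lam) - math.lgamma(v + 1) for v in y)

rng = random.Random(3)
# Simulated data with hypothetical truth: 60% structural zeros, Poisson mean 2 otherwise
y = [0 if rng.random() < 0.6 else poisson_draw(2.0, rng) for _ in range(5000)]
pi_hat, lam_hat, ll_zip = fit_zip(y)
aic_poisson = 2 * 1 - 2 * poisson_loglik(y)  # 1 parameter
aic_zip = 2 * 2 - 2 * ll_zip                 # 2 parameters
print(f"pi={pi_hat:.3f} lam={lam_hat:.3f} AIC Poisson={aic_poisson:.1f} AIC ZIP={aic_zip:.1f}")
```

With regression covariates, the same comparison applies, but each model's parameters are then fitted by numerical optimization rather than these closed-form updates.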


Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model

BMC Health Services Research

10.1186/s12913-021-06389-1

2021

Vol 21 (1)

Author(s):

Ahmed Nabil Shaaban

Bárbara Peleteiro

Maria Rosario O. Martins

Keyword(s):

Length Of Stay

Regression Model

Random Effects

Count Data

Negative Binomial

Negative Binomial Regression

Comprehensive Approach

Negative Binomial Regression Model

Hiv Patients

Binomial Regression

Background: This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal. Objective: To provide an illustration of statistical techniques for analysing count data, using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: A total of 26,505 discharges registered in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, classified under the Major Diagnostic Category (MDC) created for patients with HIV infection and with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit to the observed data. Admissions among males or admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibited a highly significant increase in length of stay. Consistent trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study.
Conclusions: This study provides a comprehensive approach to address unique problems associated with the prediction of length of stay among HIV patients in Portugal.
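A random hospital effect of this kind can be illustrated with a small simulation. The sketch assumes, hypothetically, a log-linear mean with a normal random intercept per hospital and a negative binomial count with integer size r, drawn as a sum of r geometric variables; it then compares the spread of hospital mean counts with and without the random effect. All parameter values and the name `simulate_los` are invented for illustration and are not estimates from the study.

```python
import math
import random
from statistics import mean, pvariance

def geometric_failures(q: float, rng: random.Random) -> int:
    """Failures before the first success when P(failure) = q, by inversion."""
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(q))

def nb_draw(mu: float, r: int, rng: random.Random) -> int:
    """Negative binomial (mean mu, integer size r) as a sum of r geometric draws."""
    q = mu / (mu + r)
    return sum(geometric_failures(q, rng) for _ in range(r))

def simulate_los(n_hosp=40, n_per=200, base_mean=14.0, sigma=0.5, r=2, seed=11):
    """Hospital-level mean counts; sigma is the SD of the log-scale hospital effect."""
    rng = random.Random(seed)
    hospital_means = []
    for _ in range(n_hosp):
        u = rng.gauss(0.0, sigma)         # random hospital intercept
        mu = base_mean * math.exp(u)      # log(mu) = log(base_mean) + u
        hospital_means.append(mean(nb_draw(mu, r, rng) for _ in range(n_per)))
    return hospital_means

with_re = simulate_los(sigma=0.5)
without_re = simulate_los(sigma=0.0)
print(pvariance(with_re), pvariance(without_re))
```

The much larger between-hospital variance in the first case is the within-hospital dependence that a random-effects term is meant to absorb.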


Transition models for count data: a flexible alternative to fixed distribution models

Statistical Methods & Applications

10.1007/s10260-021-00558-6

2021

Author(s):

Moritz Berger

Gerhard Tutz

Keyword(s):

Count Data

Regression Models

Negative Binomial

Real Data

Distribution Models

Explanatory Variables

Excess Zeros

Proposed Model

Transition Models

Fixed Distribution

A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and negative binomial models, as well as to more general models accounting for excess zeros that are also based on fixed distributional assumptions. The model lets the data themselves determine the distribution of the response variable but, in its basic form, uses a parametric term that specifies the effect of explanatory variables. In addition, an extended version is considered, in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real-data applications from the areas of health and social science.
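One common way to let the data determine a count distribution, which this sketch assumes rather than takes from the paper, is to model the count as a sequence of binary transitions: η_r = P(Y > r | Y ≥ r) = F(θ_r + x'β), with threshold-specific intercepts θ_r, a logistic F, and a covariate term shared across thresholds. The pmf then follows as P(Y = r) = (1 − η_r) ∏_{s<r} η_s. Because the θ_r are free, any pmf on 0 to K can be reproduced exactly, which the round trip through a Poisson pmf below illustrates. The function names and the choice to reuse the last intercept beyond K are hypothetical.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def transition_pmf(theta, x_beta, kmax):
    """P(Y = r) for r = 0..kmax with eta_r = logistic(theta_r + x_beta).

    theta[r] is the intercept for the transition out of count r; the last
    entry is reused for larger r (a hypothetical choice giving a geometric tail).
    """
    probs, at_least = [], 1.0  # at_least = P(Y >= r)
    for r in range(kmax + 1):
        eta = logistic(theta[min(r, len(theta) - 1)] + x_beta)
        probs.append(at_least * (1.0 - eta))
        at_least *= eta
    return probs

def thresholds_from_pmf(pmf):
    """Intercepts theta_r = logit(P(Y > r) / P(Y >= r)) that reproduce pmf when x_beta = 0."""
    theta, at_least = [], 1.0
    for p in pmf[:-1]:
        more = at_least - p  # P(Y > r)
        theta.append(math.log(more / p))
        at_least = more
    return theta

lam = 3.0
pois = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(16)]
theta = thresholds_from_pmf(pois)                # 15 intercepts, r = 0..14
base = transition_pmf(theta, 0.0, 14)            # reproduces the Poisson on 0..14
shifted = transition_pmf(theta, 1.0, 300)        # covariate effect raises every eta_r
mean_shifted = sum(r * p for r, p in enumerate(shifted))
print(max(abs(a - b) for a, b in zip(base, pois)), mean_shifted)
```

A positive x'β raises every transition probability, so the whole distribution shifts toward larger counts without assuming a Poisson or negative binomial shape.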


Beta-binomial models for meta-analysis with binary outcomes: Variations, extensions, and additional insights from econometrics

Research Methods in Medicine & Health Sciences

10.1177/2632084321996225

2021

pp. 263208432199622

Author(s):

Tim Mathes

Oliver Kuss

Keyword(s):

Simulation Study

Count Data

Negative Binomial

Meta Analysis

Negative Binomial Regression

Binary Outcomes

Small Scale

Panel Count Data

Count Data Models

Meta Analyses

Background: Meta-analysis of systematically reviewed studies on interventions is the cornerstone of evidence-based medicine. We introduce the common-beta beta-binomial (BB) model for meta-analysis with binary outcomes and elucidate its equivalence to panel count data models. Methods: We present a variation of the standard “common-rho” BB model (BBST) for meta-analysis, namely a “common-beta” BB model. This model has an interesting connection to fixed-effect negative binomial regression models (FE-NegBin) for panel count data. Using this equivalence, it is possible to estimate an extension of the FE-NegBin with an additional multiplicative overdispersion term (RE-NegBin) while preserving a closed-form likelihood. An advantage of the connection to econometric models is that the models can be easily implemented, because “standard” statistical software for panel count data can be used. We illustrate the methods with two real-world example datasets. Furthermore, we show the results of a small-scale simulation study that compares the new models to the BBST. The input parameters of the simulation were informed by meta-analyses that had actually been performed. Results: In both example datasets, the NegBin models, in particular the RE-NegBin, showed a smaller effect and had narrower 95% confidence intervals. In our simulation study, median bias was negligible for all methods, but the upper quartile for median bias suggested that BBST is most affected by positive bias. Regarding coverage probability, BBST and the RE-NegBin model outperformed the FE-NegBin model. Conclusion: For meta-analyses with binary outcomes, the considered common-beta BB models may be valuable extensions to the family of BB models.
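For readers less familiar with the BB distribution itself: with n trials and a Beta(α, β) success probability, the count has mean nπ with π = α/(α + β) and variance nπ(1 − π)[1 + (n − 1)ρ], where ρ = 1/(α + β + 1) is the intra-cluster correlation, presumably the ρ that the "common-rho" label refers to. The sketch only checks these textbook moments numerically; it does not implement the common-beta model or its NegBin equivalence, and the parameter values are arbitrary.

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Beta-binomial pmf: C(n, k) B(k + alpha, n - k + beta) / B(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def bb_moments(n: int, alpha: float, beta: float):
    """Total mass, mean and variance by direct summation over k = 0..n."""
    probs = [bb_pmf(k, n, alpha, beta) for k in range(n + 1)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), m, v

n, alpha, beta = 30, 2.0, 6.0
pi = alpha / (alpha + beta)
rho = 1.0 / (alpha + beta + 1.0)
total, m, v = bb_moments(n, alpha, beta)
print(total, m, n * pi, v, n * pi * (1 - pi) * (1 + (n - 1) * rho))
```

The factor 1 + (n − 1)ρ is the overdispersion relative to a binomial with the same mean; ρ = 0 recovers the binomial.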
