A Flexible Mixed Model for Clustered Count Data

Darcy Steeg Morris; Kimberly F. Sellers

doi:10.3390/stats5010004

Stats ◽

10.3390/stats5010004 ◽

2022 ◽

Vol 5 (1) ◽

pp. 52-69

Author(s):

Darcy Steeg Morris ◽

Kimberly F. Sellers

Keyword(s):

Regression Model ◽

Random Effects ◽

Count Data ◽

Mixed Models ◽

Poisson Regression ◽

Mixed Model ◽

Random Effect ◽

Model Fit ◽

Cross Sectional ◽

Special Cases

Clustered count data are commonly modeled using Poisson regression with random effects to account for the correlation induced by clustering. The Poisson mixed model allows for overdispersion via the nature of the within-cluster correlation; however, departures from equi-dispersion may also exist due to the underlying count process mechanism. We study the cross-sectional COM-Poisson regression model, a generalized regression model for count data that accommodates data dispersion, together with random effects for the analysis of clustered count data. We demonstrate the flexibility of the COM-Poisson random intercept model, including the choice of the random effect distribution, via simulated and real data examples. We find that COM-Poisson mixed models provide comparable model fit to well-known mixed models for associated special cases of clustered discrete data, and result in improved model fit for data with intermediate levels of over- or underdispersion in the count mechanism. Accordingly, the proposed models are useful for capturing dispersion not consistent with commonly used statistical models, and also serve as a practical diagnostic tool.
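To illustrate how a COM-Poisson count mechanism moves between over- and underdispersion, the following is a minimal sketch of the standard COM-Poisson probability mass function, P(Y = y) proportional to lam^y / (y!)^nu. It is not the authors' implementation and it omits the regression and random-effect parts of the model. The function names and the truncation length `max_terms` are assumptions made for this example.

```python
import math

def com_poisson_pmf_table(lam, nu, max_terms=200):
    """Return [P(Y=0), ..., P(Y=max_terms-1)] for a COM-Poisson(lam, nu) count.

    The infinite normalizing series is truncated at max_terms (an assumption
    of this sketch), which is adequate when the tail mass is negligible.
    """
    # Log of each unnormalized term: y*log(lam) - nu*log(y!)
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_terms)]
    # Log-sum-exp for a numerically stable normalizing constant
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return [math.exp(t - log_z) for t in log_terms]

def mean_and_variance(probs):
    """Mean and variance of a count distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_and_variance(com_poisson_pmf_table(3.0, nu))
        print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With nu = 1 the distribution reduces to the Poisson, so mean and variance coincide. Values of nu below 1 give a variance larger than the mean, and values above 1 give a variance smaller than the mean.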

The Impact of Misspecified Random Effect Distribution in a Weibull Regression Mixed Model

Stats ◽

10.3390/stats1010005 ◽

2018 ◽

Vol 1 (1) ◽

pp. 48-76

Author(s):

Freddy Hernández ◽

Viviana Giampaoli

Keyword(s):

Weibull Distribution ◽

Random Effects ◽

Mixed Models ◽

Fixed Effects ◽

Mixed Model ◽

Random Effect ◽

Estimation Procedure ◽

Weibull Regression ◽

Two Parameters ◽

The Impact

Mixed models are useful tools for analyzing clustered and longitudinal data. These models assume that random effects are normally distributed. However, this assumption may be unrealistic or restrictive for the data at hand. Several papers have been published to quantify the impacts of misspecification of the shape of the random effects in mixed models. Notably, these studies primarily concentrated their efforts on models with response variables that have normal, logistic and Poisson distributions, and the results were not conclusive. As such, we investigated the misspecification of the shape of the random effects in a Weibull regression mixed model with random intercepts in the two parameters of the Weibull distribution. Through an extensive simulation study considering six random effect distributions and assuming normality for the random effects in the estimation procedure, we found an impact of misspecification on the estimates of the fixed effects associated with the second parameter σ of the Weibull distribution. Additionally, the variance components of the model were also affected by the misspecification.

Rates of Glaucoma Progression Derived from Linear Mixed Models Using Varied Random Effect Distributions

10.1101/2021.06.01.21258173 ◽

2021 ◽

Author(s):

Swarup Sai Swaminathan ◽

Samuel I Berchuck ◽

Alessandro A Jammal ◽

J. Sunil Rao ◽

Felipe A Medeiros

Keyword(s):

Visual Field ◽

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

Random Effect ◽

Visual Fields ◽

Gaussian Model ◽

Model Fit ◽

Mean Deviation ◽

Automated Perimetry

Purpose: To compare the ability of linear mixed models with different random effect distributions to estimate rates of visual field loss in glaucoma patients. Design: Retrospective cohort study. Methods: Eyes with ≥5 reliable standard automated perimetry (SAP) tests were identified from the Duke Glaucoma Registry. Mean deviation (MD) values from each visual field and associated timepoints were collected. These data were modeled using ordinary least squares (OLS) regression as well as linear mixed models using either the Gaussian, Student t, or log-gamma (LG) distributions as the prior distribution for random effects. Model fit was compared using the Watanabe-Akaike information criterion (WAIC). Simulated eyes of varying initial disease severity and rates of progression were created to assess the accuracy of each Bayesian model in predicting the rate of change and likelihood of declaring progression. Results: A total of 52,900 visual fields from 6,558 eyes of 3,981 subjects were included. Mean follow-up period was 8.7±4.0 years, with an average of 8.1±3.7 visual fields per eye. The LG model produced the lowest WAIC, demonstrating optimal model fit. Compared to the Gaussian model, the LG model classified almost twice as many eyes as fast progressors. In simulations, the LG model declared progression earlier than OLS (P<0.001) and had the greatest accuracy in predicted slopes (P<0.001). In contrast, the Gaussian model significantly underestimated rates of progression among fast and catastrophic progressors. Conclusions: Linear mixed models using the LG distribution to model random effects outperformed conventional approaches for estimating rates of SAP MD loss in a population with glaucoma.

The generalizability crisis

Behavioral and Brain Sciences ◽

10.1017/s0140525x20001685 ◽

2020 ◽

pp. 1-37

Author(s):

Tal Yarkoni

Keyword(s):

Quantitative Analysis ◽

Statistical Inference ◽

Random Effects ◽

False Positive ◽

Mixed Model ◽

Linear Mixed Model ◽

Random Effect ◽

Statistical Procedures ◽

Subject Variability ◽

Replication Crisis

Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned; that is, the two must refer to roughly the same set of hypothetical observations. Here I argue that many applications of statistical inference in psychology fail to meet this basic condition. Focusing on the most widely used class of model in psychology, the linear mixed model, I explore the consequences of failing to statistically operationalize verbal hypotheses in a way that respects researchers' actual generalization intentions. I demonstrate that whereas the "random effect" formalism is used pervasively in psychology to model inter-subject variability, few researchers accord the same treatment to other variables they clearly intend to generalize over (e.g., stimuli, tasks, or research sites). The under-specification of random effects imposes far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints can dramatically inflate false positive rates, and often leads researchers to draw sweeping verbal generalizations that lack a meaningful connection to the statistical quantities they are putatively based on. I argue that failure to take the alignment between verbal and statistical expressions seriously lies at the heart of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement.

Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model

BMC Health Services Research ◽

10.1186/s12913-021-06389-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Ahmed Nabil Shaaban ◽

Bárbara Peleteiro ◽

Maria Rosario O. Martins

Keyword(s):

Length Of Stay ◽

Regression Model ◽

Random Effects ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Comprehensive Approach ◽

Negative Binomial Regression Model ◽

Hiv Patients ◽

Binomial Regression

Background: This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal. Objective: To provide an illustration of statistical techniques for analysing count data using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: A total of 26,505 discharges registered in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, classified under the Major Diagnostic Category (MDC) created for patients with HIV infection and with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit with observed data. Admissions among males or admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibit a highly significant increase in length of stay. Perfect trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study. Conclusions: This study provides a comprehensive approach to address unique problems associated with the prediction of length of stay among HIV patients in Portugal.

A multivariate Poisson regression model for count data

Journal of Applied Statistics ◽

10.1080/02664763.2021.1877637 ◽

2021 ◽

pp. 1-17

Author(s):

J. M. Muñoz-Pichardo ◽

R. Pino-Mejías ◽

J. García-Heras ◽

F. Ruiz-Muñoz ◽

M. Luz González-Regalado

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Poisson Regression Model

Response transformations for random effect and variance component models

Statistical Modelling ◽

10.1177/1471082x20966919 ◽

2020 ◽

pp. 1471082X2096691

Author(s):

Amani Almohaimeed ◽

Jochen Einbeck

Keyword(s):

Maximum Likelihood ◽

Random Effects ◽

Mixed Model ◽

Linear Mixed Model ◽

Random Effect ◽

Statistical Technique ◽

Response Distribution ◽

Level Data ◽

Variance Component Models ◽

Response Transformation

Random effect models have been widely used as a mainstream statistical technique for several decades, and the same can be said for response transformation models such as the Box–Cox transformation. The latter aims at ensuring that the assumptions of normality and of homoscedasticity of the response distribution are fulfilled, which are essential conditions for inference based on a linear model or a linear mixed model. However, methodology for response transformation and simultaneous inclusion of random effects has been developed and implemented only scarcely, and is so far restricted to Gaussian random effects. We develop such methodology without requiring parametric assumptions on the distribution of the random effects. This is achieved by extending the ‘Nonparametric Maximum Likelihood’ towards a ‘Nonparametric profile maximum likelihood’ technique, which makes it possible to deal with overdispersion as well as two-level data scenarios.
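For readers unfamiliar with the Box–Cox transformation mentioned in this abstract, the following is a minimal sketch of the transformation and its inverse for a single positive response value. It covers only the transformation itself, not the nonparametric profile maximum likelihood procedure the authors develop. The function names are assumptions made for this example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response y with power parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam tends to 0
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value z back to the original response scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        z = box_cox(4.0, lam)
        print(f"lambda={lam}: transformed={z:.4f}, back={inverse_box_cox(z, lam):.4f}")
```

In practice the power parameter is estimated from the data rather than fixed in advance; here it is supplied by hand purely for illustration.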

PARAMETER ESTIMATION ON HURDLE POISSON REGRESSION MODEL WITH CENSORED DATA

Jurnal Teknologi ◽

10.11113/jt.v57.1533 ◽

2012 ◽

Vol 57 (1) ◽

Author(s):

SEYED EHSAN SAFFAR ◽

ROBIAH ADNAN ◽

WILLIAM GREENE

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Goodness Of Fit ◽

Poisson Model ◽

Likelihood Method ◽

Poisson Regression Model ◽

Response Variable ◽

The Mean ◽

Over Dispersion

A Poisson model is typically assumed for count data. In many cases the dependent variable contains many zeros, and because of these zeros its mean and variance are no longer equal. In fact, the variance of the dependent variable is much larger than its mean, which is called over-dispersion. The Poisson model is therefore no longer suitable for this kind of data, and it is suggested to use a hurdle Poisson regression model to overcome the over-dispersion problem. Furthermore, the response variable in such cases is censored for some values. In this paper, a censored hurdle Poisson regression model is introduced for count data with many zeros. The model has one response variable and one or more explanatory variables. The estimation of regression parameters using the maximum likelihood method is discussed and the goodness-of-fit of the regression model is examined. We study the effects of right censoring on the estimated parameters and their standard errors via an example.
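As a rough illustration of the ingredients named in this abstract, the sketch below writes down a hurdle Poisson probability mass function and a log-likelihood that treats right-censored observations as "count is at least y". It uses a common textbook parameterization without covariates, which may differ from the one used in the paper. The names `p_zero`, `mu`, and the `(y, censored)` observation format are assumptions made for this example.

```python
import math

def hurdle_poisson_pmf(y, p_zero, mu):
    """P(Y = y) under a hurdle Poisson model.

    p_zero is the probability of a zero count; positive counts follow a
    zero-truncated Poisson distribution with rate mu.
    """
    if y == 0:
        return p_zero
    log_poisson = -mu + y * math.log(mu) - math.lgamma(y + 1)
    # Rescale by 1 - exp(-mu) so the positive counts sum to 1 - p_zero
    return (1.0 - p_zero) * math.exp(log_poisson) / (1.0 - math.exp(-mu))

def log_likelihood(observations, p_zero, mu):
    """Log-likelihood for a list of (y, censored) pairs.

    censored=True means the count was right-censored at y, so the
    observation contributes P(Y >= y) instead of P(Y = y).
    """
    total = 0.0
    for y, censored in observations:
        if censored:
            prob = 1.0 - sum(hurdle_poisson_pmf(k, p_zero, mu) for k in range(y))
        else:
            prob = hurdle_poisson_pmf(y, p_zero, mu)
        total += math.log(prob)
    return total

if __name__ == "__main__":
    data = [(0, False), (0, False), (2, False), (5, True), (1, False)]
    print(log_likelihood(data, p_zero=0.4, mu=2.0))
```

A regression version would replace the constants `p_zero` and `mu` with functions of the explanatory variables and maximize this log-likelihood numerically.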

Mixed Models as a Tool for Comparing Groups of Time Series in Plant Sciences

Plants ◽

10.3390/plants10020362 ◽

2021 ◽

Vol 10 (2) ◽

pp. 362

Author(s):

Ioannis Spyroglou ◽

Jan Skalák ◽

Veronika Balakhonova ◽

Zuzana Benedikty ◽

Alexandros G. Rigas ◽

...

Keyword(s):

Time Series ◽

Random Effects ◽

Mixed Models ◽

Mixed Model ◽

Repeated Measurements ◽

Time Series Models ◽

Additional Time ◽

Biotic Stresses ◽

Viable Solution ◽

Long Time

Plants adapt to continual changes in environmental conditions throughout their life spans. High-throughput phenotyping methods have been developed to noninvasively monitor the physiological responses to abiotic/biotic stresses on a scale spanning a long time, covering most of the vegetative and reproductive stages. However, some of the physiological events comprise almost immediate and very fast responses towards the changing environment which might be overlooked in long-term observations. Additionally, there are certain technical difficulties and restrictions in analyzing phenotyping data, especially when dealing with repeated measurements. In this study, a method for comparing means at different time points using generalized linear mixed models combined with classical time series models is presented. As an example, we use multiple chlorophyll time series measurements from different genotypes. The use of additional time series models as random effects is essential as the residuals of the initial mixed model may contain autocorrelations that bias the result. The nature of mixed models offers a viable solution as these can incorporate time series models for residuals as random effects. The results from analyzing chlorophyll content time series show that the autocorrelation is successfully eliminated from the residuals and incorporated into the final model. This allows the use of statistical inference.

Bayesian spatial and spatio-temporal approaches to modelling dengue fever: a systematic review

Epidemiology and Infection ◽

10.1017/s0950268818002807 ◽

2018 ◽

Vol 147 ◽

Cited By ~ 8

Author(s):

A. Aswi ◽

S. M. Cramb ◽

P. Moraga ◽

K. Mengersen

Keyword(s):

Dengue Fever ◽

Random Effects ◽

Mixed Model ◽

Linear Mixed Model ◽

Assessment Tool ◽

Temporal Dynamics ◽

Random Effect ◽

Temperature And Precipitation ◽

Generalised Linear Mixed Model ◽

Spatio Temporal

Dengue fever (DF) is one of the world's most disabling mosquito-borne diseases, with a variety of approaches available to model its spatial and temporal dynamics. This paper aims to identify and compare the different spatial and spatio-temporal Bayesian modelling methods that have been applied to DF and examine influential covariates that have been reportedly associated with the risk of DF. A systematic search was performed in December 2017, using Web of Science, Scopus, ScienceDirect, PubMed, ProQuest and Medline (via Ebscohost) electronic databases. The search was restricted to refereed journal articles published in English from January 2000 to November 2017. Thirty-one articles met the inclusion criteria. Using a modified quality assessment tool, the median quality score across studies was 14/16. The most popular Bayesian statistical approach to dengue modelling was a generalised linear mixed model with spatial random effects described by a conditional autoregressive prior. A limited number of studies included spatio-temporal random effects. Temperature and precipitation were shown to often influence the risk of dengue. Developing spatio-temporal random-effect models, considering other priors, using a dataset that covers an extended time period, and investigating other covariates would help to better understand and control DF transmission.

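PEMODELAN DENGAN GEOGRAPHICALLY WEIGHTED NEGATIVE BINOMIAL REGRESSION (Studi kasus: Banyaknya Penderita Kusta di Jawa Barat) [Modeling with Geographically Weighted Negative Binomial Regression (Case Study: The Number of Leprosy Patients in West Java)]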

Xplore Journal of Statistics ◽

10.29244/xplore.v10i3.833 ◽

2021 ◽

Vol 10 (3) ◽

pp. 226-236

Author(s):

Khusnul Khotimah ◽

Itasia Dina Sulvianti ◽

Pika Silvianti

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Kernel Weight ◽

Negative Binomial Regression Model ◽

West Java ◽

Binomial Regression ◽

Spatial Heterogeneity

The number of leprosy patients in West Java is an example of count data. The analysis commonly used for count data is Poisson regression. This research determines the variables that influence the number of leprosy patients in West Java, using data from 2019. The data exhibit overdispersion and spatial heterogeneity. To handle overdispersion, the negative binomial regression model can be employed, while spatial heterogeneity is addressed by adding an adaptive bisquare kernel weight. The resulting Geographically Weighted Negative Binomial Regression (GWNBR) model with adaptive bisquare kernel weighting classifies the regencies/cities in West Java into ten groups based on the variables that significantly influence the number of leprosy patients. In general, the percentage of households with Clean and Healthy Behavior (PHBS) has a significant effect in all regencies/cities in West Java. For Bogor Regency, Depok City, Bogor City, and Pangandaran Regency in particular, the percentage of people living in poverty does not have a significant effect on the number of leprosy patients.
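The adaptive bisquare kernel weight mentioned in this abstract can be sketched as follows, using one common definition from the geographically weighted regression literature: the bandwidth at each focal location is the distance to its k-th nearest neighbor, and weights fall smoothly to zero at that distance. The use of planar Euclidean distance, the function name, and the parameter `k` are assumptions made for this example, not details taken from the paper.

```python
import math

def adaptive_bisquare_weights(coords, focal_index, k):
    """Bisquare kernel weights of every location relative to one focal location.

    coords is a list of (x, y) pairs. The bandwidth adapts to local data
    density: it is the distance from the focal location to its k-th nearest
    neighbor (k >= 1, and that distance must be positive).
    """
    fx, fy = coords[focal_index]
    distances = [math.hypot(x - fx, y - fy) for x, y in coords]
    # Sorted position 0 is the focal location itself (distance 0),
    # so position k is the k-th nearest neighbor.
    bandwidth = sorted(distances)[k]
    return [
        (1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
        for d in distances
    ]

if __name__ == "__main__":
    locations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
    print(adaptive_bisquare_weights(locations, focal_index=0, k=2))
```

Each regency or city would get its own weight vector, which then enters the locally weighted likelihood of the negative binomial model fitted at that location.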
