On the behaviour of marginal and conditional AIC in linear mixed models

Sonja Greven; Thomas Kneib

doi:10.1093/biomet/asq042

Biometrika ◽

10.1093/biomet/asq042 ◽

2010 ◽

Vol 97 (4) ◽

pp. 773-789 ◽

Cited By ~ 127

Author(s):

Sonja Greven ◽

Thomas Kneib

Keyword(s):

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

Random Effect ◽

Computational Cost ◽

Information Criterion ◽

R Package ◽

Estimation Uncertainty ◽

Asymptotically Unbiased Estimator ◽

Selection Of

Abstract In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, AIC, have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package (R Development Core Team, 2010) is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
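
To make the two criteria in this abstract concrete, the following is a minimal Python sketch, not the authors' R implementation. It assumes a model y = Xβ + Zb + ε with b ~ N(0, τ²I) and ε ~ N(0, σ²I), and it takes the variance components `sigma2` and `tau2` as given rather than estimating them. The function name and the penalty choices (number of fixed effects plus two variance parameters for the marginal AIC, effective degrees of freedom plus one for the conventional conditional AIC) are assumptions made for illustration. The conditional penalty shown is the conventional one that treats the random effects covariance as known, which is the practice the abstract describes as biased; the corrected criterion derived in the paper is not reproduced here.

```python
import numpy as np

def marginal_and_conditional_aic(y, X, Z, sigma2, tau2):
    """Illustrative mAIC and conventional cAIC with plugged-in variances."""
    n, p = X.shape
    # Marginal covariance of y: V = sigma2 * I + tau2 * Z Z'
    V = sigma2 * np.eye(n) + tau2 * Z @ Z.T
    V_inv = np.linalg.inv(V)
    # Generalised least squares: beta_hat = A @ y
    A = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv)
    resid_marg = y - X @ (A @ y)
    _, logdet_V = np.linalg.slogdet(V)
    loglik_marg = -0.5 * (n * np.log(2 * np.pi) + logdet_V
                          + resid_marg @ V_inv @ resid_marg)
    # Assumed penalty: p fixed effects plus two variance parameters
    maic = -2.0 * loglik_marg + 2.0 * (p + 2)

    # Hat matrix mapping y to the conditional fit X beta_hat + Z b_hat,
    # where b_hat = tau2 * Z' V^{-1} (y - X beta_hat) is the BLUP
    H = X @ A + tau2 * Z @ Z.T @ V_inv @ (np.eye(n) - X @ A)
    rho = np.trace(H)  # effective degrees of freedom
    resid_cond = y - H @ y
    loglik_cond = -0.5 * (n * np.log(2 * np.pi * sigma2)
                          + resid_cond @ resid_cond / sigma2)
    # Conventional penalty that ignores covariance estimation uncertainty
    caic = -2.0 * loglik_cond + 2.0 * (rho + 1.0)
    return maic, caic, rho

# Hypothetical simulated data: 6 groups of 5 observations, random intercepts
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(6), 5)
Z = np.eye(6)[groups]
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 0.5]) + Z @ rng.normal(size=6) + rng.normal(size=30)
maic, caic, rho = marginal_and_conditional_aic(y, X, Z, sigma2=1.0, tau2=1.0)
```

In this sketch the effective degrees of freedom `rho` lie between the number of fixed effects and the number of fixed plus random effects, and shrink towards the former as `tau2` approaches zero.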

Hierarchical Selection of Fixed and Random Effects in Generalized Linear Mixed Models

Statistica Sinica ◽

10.5705/ss.202015.0329 ◽

2017 ◽

Author(s):

Francis Hui ◽

Samuel Muller ◽

Alan Welsh

Keyword(s):

Random Effects ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Fixed And Random Effects ◽

Hierarchical Selection ◽

Selection Of

lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals

10.1101/139816 ◽

2017 ◽

Cited By ~ 2

Author(s):

Andrey Ziyatdinov ◽

Miquel Vázquez-Santiago ◽

Helena Brunel ◽

Angel Martinez-Perez ◽

Hugues Aschard ◽

...

Keyword(s):

Qtl Mapping ◽

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

R Package ◽

Covariance Matrices ◽

Linkage Studies ◽

Genetic Studies ◽

Related Individuals ◽

Mapping Models

Abstract Background: Quantitative trait locus (QTL) mapping in genetic data often involves analysis of correlated observations, which need to be accounted for to avoid false association signals. This is commonly performed by modeling such correlations as random effects in linear mixed models (LMMs). The R package lme4 is a well-established tool that implements major LMM features using sparse matrix methods; however, it is not fully adapted for QTL mapping association and linkage studies. In particular, two LMM features are lacking in the base version of lme4: the definition of random effects by custom covariance matrices; and parameter constraints, which are essential in advanced QTL models. Apart from applications in linkage studies of related individuals, such functionalities are of high interest for association studies in situations where multiple covariance matrices need to be modeled, a scenario not covered by many genome-wide association study (GWAS) software. Results: To address the aforementioned limitations, we developed a new R package lme4qtl as an extension of lme4. First, lme4qtl contributes new models for genetic studies within a single tool integrated with lme4 and its companion packages. Second, lme4qtl offers a flexible framework for scenarios with multiple levels of relatedness and becomes efficient when covariance matrices are sparse. We showed the value of our package using real family-based data in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) project. Conclusions: Our software lme4qtl enables QTL mapping models with a versatile structure of random effects and efficient computation for sparse covariances. lme4qtl is available at https://github.com/variani/lme4qtl.

Rates of Glaucoma Progression Derived from Linear Mixed Models Using Varied Random Effect Distributions

10.1101/2021.06.01.21258173 ◽

2021 ◽

Author(s):

Swarup Sai Swaminathan ◽

Samuel I Berchuck ◽

Alessandro A Jammal ◽

J. Sunil Rao ◽

Felipe A Medeiros

Keyword(s):

Visual Field ◽

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

Random Effect ◽

Visual Fields ◽

Gaussian Model ◽

Model Fit ◽

Mean Deviation ◽

Automated Perimetry

Purpose: To compare the ability of linear mixed models with different random effect distributions to estimate rates of visual field loss in glaucoma patients. Design: Retrospective cohort study. Methods: Eyes with ≥5 reliable standard automated perimetry (SAP) tests were identified from the Duke Glaucoma Registry. Mean deviation (MD) values from each visual field and associated timepoints were collected. These data were modeled using ordinary least squares (OLS) regression as well as linear mixed models using either the Gaussian, Student t, or log-gamma (LG) distributions as the prior distribution for random effects. Model fit was compared using the Watanabe-Akaike information criterion (WAIC). Simulated eyes of varying initial disease severity and rates of progression were created to assess the accuracy of each Bayesian model in predicting the rate of change and likelihood of declaring progression. Results: A total of 52,900 visual fields from 6,558 eyes of 3,981 subjects were included. Mean follow-up period was 8.7±4.0 years, with an average of 8.1±3.7 visual fields per eye. The LG model produced the lowest WAIC, demonstrating optimal model fit. Compared to the Gaussian model, the LG model classified almost twice as many eyes as fast progressors. In simulations, the LG model declared progression earlier than OLS (P<0.001) and had the greatest accuracy in predicted slopes (P<0.001). In contrast, the Gaussian model significantly underestimated rates of progression among fast and catastrophic progressors. Conclusions: Linear mixed models using the LG distribution to model random effects outperformed conventional approaches for estimating rates of SAP MD loss in a population with glaucoma.
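
The model comparison in this abstract relies on WAIC, which can be computed from pointwise log-likelihood values evaluated at posterior draws. The sketch below is a generic illustration and not the study's code. It assumes a hypothetical array `log_lik` of shape (draws, observations) and uses the common deviance-scale convention, WAIC = -2 (lppd - p_waic), with the variance-based effective number of parameters; the abstract does not state which convention was used.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from a (draws x observations) array."""
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed with a max shift for numerical stability
    shift = log_lik.max(axis=0)
    lppd_i = shift + np.log(np.mean(np.exp(log_lik - shift), axis=0))
    # Effective number of parameters: posterior variance of the
    # log-likelihood, summed over observations
    p_waic_i = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd_i.sum() - p_waic_i.sum())

# Hypothetical input: 1000 posterior draws for 50 observations
rng = np.random.default_rng(0)
example_log_lik = rng.normal(loc=-1.2, scale=0.1, size=(1000, 50))
example_waic = waic(example_log_lik)
```

Under this convention lower values indicate better expected out-of-sample fit, which matches the abstract's reading of the lowest WAIC as the best-fitting model.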

Gradient boosting for linear mixed models

The International Journal of Biostatistics ◽

10.1515/ijb-2020-0136 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Colin Griesbach ◽

Benjamin Säfken ◽

Elisabeth Waldmann

Keyword(s):

Random Effects ◽

Mixed Models ◽

Selection Procedure ◽

Classification Theory ◽

Gradient Boosting ◽

Random Structure ◽

Boosting Algorithm ◽

The One ◽

Biased Estimates ◽

Selection Of

Abstract Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimation and selection of predictor effects in various regression models by adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable prediction of mixed models for longitudinal and clustered data. However, these approaches include several flaws resulting in unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand and biased estimates of the random effects on the other hand. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and in addition providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.
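
For readers unfamiliar with boosting as a selection device, the sketch below shows plain component-wise L2 boosting for fixed effects only. It is not the algorithm proposed in this paper and does not handle random effects at all. The function name, step count and step length `nu` are hypothetical choices, and the columns of `X` are assumed to be centred.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with one-variable linear base learners."""
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)  # sum of squares of each column
    for _ in range(n_steps):
        # Least-squares slope of the current residuals on each column
        slopes = X.T @ resid / col_ss
        # Reduction in residual sum of squares offered by each candidate
        rss_reduction = slopes ** 2 * col_ss
        j = int(np.argmax(rss_reduction))
        # Update only the best-fitting component, shrunk by nu
        coef[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * X[:, j]
    return intercept, coef

# Hypothetical data: only the first of three centred predictors matters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)
intercept, coef = componentwise_l2_boost(X, y)
```

Variable selection arises because a coefficient stays at exactly zero until its column is chosen in some step, so stopping early leaves weak predictors out of the model.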

The Impact of Misspecified Random Effect Distribution in a Weibull Regression Mixed Model

Stats ◽

10.3390/stats1010005 ◽

2018 ◽

Vol 1 (1) ◽

pp. 48-76

Author(s):

Freddy Hernández ◽

Viviana Giampaoli

Keyword(s):

Weibull Distribution ◽

Random Effects ◽

Mixed Models ◽

Fixed Effects ◽

Mixed Model ◽

Random Effect ◽

Estimation Procedure ◽

Weibull Regression ◽

Two Parameters ◽

The Impact

Mixed models are useful tools for analyzing clustered and longitudinal data. These models assume that random effects are normally distributed. However, this may be unrealistic or restrictive when representing information of the data. Several papers have been published to quantify the impacts of misspecification of the shape of the random effects in mixed models. Notably, these studies primarily concentrated their efforts on models with response variables that have normal, logistic and Poisson distributions, and the results were not conclusive. As such, we investigated the misspecification of the shape of the random effects in a Weibull regression mixed model with random intercepts in the two parameters of the Weibull distribution. Through an extensive simulation study considering six random effect distributions and assuming normality for the random effects in the estimation procedure, we found an impact of misspecification on the estimations of the fixed effects associated with the second parameter σ of the Weibull distribution. Additionally, the variance components of the model were also affected by the misspecification.

Restricted maximum likelihood and inference of random effects in linear mixed models

Methods and Applications of Longitudinal Data Analysis ◽

10.1016/b978-0-12-801342-7.00004-6 ◽

2016 ◽

pp. 95-131

Author(s):

Xian Liu

Keyword(s):

Maximum Likelihood ◽

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

Restricted Maximum Likelihood

Inferences in linear mixed models with skew-normal random effects

Acta Mathematica Sinica English Series ◽

10.1007/s10114-015-3326-5 ◽

2015 ◽

Vol 31 (4) ◽

pp. 576-594 ◽

Cited By ~ 13

Author(s):

Ren Dao Ye ◽

Tong Hui Wang

Keyword(s):

Random Effects ◽

Mixed Models ◽

Linear Mixed Models ◽

Skew Normal

Flexible modelling of random effects in linear mixed models—A Bayesian approach

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2007.09.005 ◽

2008 ◽

Vol 52 (3) ◽

pp. 1347-1361 ◽

Cited By ~ 9

Author(s):

Remus K.W. Ho ◽

Inchi Hu

Keyword(s):

Random Effects ◽

Bayesian Approach ◽

Mixed Models ◽

Linear Mixed Models

Basic Features of the Analysis of Germination Data with Generalized Linear Mixed Models

Data ◽

10.3390/data5010006 ◽

2020 ◽

Vol 5 (1) ◽

pp. 6 ◽

Cited By ~ 2

Author(s):

Alberto Gianinetti

Keyword(s):

Boundary Conditions ◽

Longitudinal Studies ◽

Random Effects ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure ◽

Conditional Models ◽

Error Terms ◽

Germination Indices

Germination data are discrete and binomial. Although analysis of variance (ANOVA) has long been used for the statistical analysis of these data, generalized linear mixed models (GzLMMs) provide a more consistent theoretical framework. GzLMMs are suitable for final germination percentages (FGP) as well as longitudinal studies of germination time-courses. Germination indices (i.e., single-value parameters summarizing the results of a germination assay by combining the level and rapidity of germination) and other data with a Gaussian error distribution can be analyzed too. There are, however, different kinds of GzLMMs: Conditional (i.e., random effects are modeled as deviations from the general intercept with a specific covariance structure), marginal (i.e., random effects are modeled solely as a variance/covariance structure of the error terms), and quasi-marginal (some random effects are modeled as deviations from the intercept and some are modeled as a covariance structure of the error terms) models can be applied to the same data. 
It is shown that: (a) For germination data, conditional, marginal, and quasi-marginal GzLMMs tend to converge to a similar inference; (b) conditional models are the first choice for FGP; (c) marginal or quasi-marginal models are more suited for longitudinal studies, although conditional models lead to a congruent inference; (d) in general, common random factors are better dealt with as random intercepts, whereas serial correlation is easier to model in terms of the covariance structure of the error terms; (e) germination indices are not binomial and can be easier to analyze with a marginal model; (f) in boundary conditions (when some means approach 0% or 100%), conditional models with an integral approximation of true likelihood are more appropriate; in non-boundary conditions, (g) germination data can be fitted with default pseudo-likelihood estimation techniques, on the basis of the SAS-based code templates provided here; (h) GzLMMs are remarkably good for the analysis of germination data except if some means are 0% or 100%. In this case, alternative statistical approaches may be used, such as survival analysis or linear mixed models (LMMs) with transformed data, unless an ad hoc data adjustment in estimates of limit means is considered, either experimentally or computationally. This review is intended as a basic tutorial for the application of GzLMMs, and is, therefore, of interest primarily to researchers in the agricultural sciences.

Random Effects Misspecification Can Have Severe Consequences for Random Effects Inference in Linear Mixed Models

International Statistical Review ◽

10.1111/insr.12378 ◽

2020 ◽

Cited By ~ 1

Author(s):

Francis K. C. Hui ◽

Samuel Müller ◽

Alan H. Welsh

Keyword(s):

Random Effects ◽

Mixed Models ◽

Linear Mixed Models
