Multiple Imputation in Multilevel Models. A Revision of the Current Software and Usage Examples for Researchers

The Spanish Journal of Psychology ◽

10.1017/sjp.2020.48 ◽

2020 ◽

Vol 23 ◽

Author(s):

Pablo García-Patos ◽

Ricardo Olmos

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multilevel Model ◽

Multilevel Models ◽

Technical Literature ◽

Fully Conditional Specification ◽

Estimated Parameters ◽

Promising Strategy ◽

R Packages ◽

Key Questions

Although modern approaches to dealing with missing data have been well established since the 1970s, researchers still face a challenge when they encounter this problem in multilevel models. First, there is a variety of existing software to handle missing data based on multiple imputation (MI), currently pointed out by experts as the most promising strategy. Second, the two principal paradigms of MI are joint modelling (JM) and fully conditional specification (FCS), which adds a further complication because they are not equally useful depending on the combination of multilevel model and the estimated parameters affected by missing data. The technical literature does not help reduce the number of decisions that researchers have to make. Given these inconveniences, the present paper has three objectives. (1) We present a thorough review of the most recently developed software and functions for multiple imputation in multilevel models. (2) We derive a set of suggestions, recommendations, and guidelines to help researchers handle missing data, and we list a number of key questions to consider when analyzing multilevel models. (3) Finally, based on these questions, we present two detailed examples using the recommended R packages to make it easier for researchers to apply multiple imputation in multilevel models.

Multiple Imputation of Multilevel Missing Data

SAGE Open ◽

10.1177/2158244016668220 ◽

2016 ◽

Vol 6 (4) ◽

pp. 215824401666822 ◽

Cited By ~ 17

Author(s):

Simon Grund ◽

Oliver Lüdtke ◽

Alexander Robitzsch

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multilevel Models ◽

R Package ◽

Data Sets ◽

Multilevel Data ◽

Statistical Knowledge ◽

Multilevel Research ◽

User Friendly ◽

High Degree

The treatment of missing data can be difficult in multilevel research because state-of-the-art procedures such as multiple imputation (MI) may require advanced statistical knowledge or a high degree of familiarity with certain statistical software. In the missing data literature, pan has been recommended for MI of multilevel data. In this article, we provide an introduction to MI of multilevel missing data using the R package pan, and we discuss its possibilities and limitations in accommodating typical questions in multilevel research. To make pan more accessible to applied researchers, we make use of the mitml package, which provides a user-friendly interface to the pan package and several tools for managing and analyzing multiply imputed data sets. We illustrate the use of pan and mitml with two empirical examples that represent common applications of multilevel models, and we discuss how these procedures may be used in conjunction with other software.
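As a rough illustration of what analyzing multiply imputed data sets involves, the sketch below pools a single parameter across imputed data sets using Rubin's rules. It is not taken from the article and does not reproduce the pan or mitml interfaces; the function name `pool_rubin` and the example numbers are hypothetical, and the code assumes that each imputed data set has already been analyzed to give one point estimate and one squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from three imputed data sets
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation component, inflated by the factor (1 + 1/m), to the average within-imputation variance, so the pooled standard error reflects the uncertainty due to the missing data.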

Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study

International Journal of Statistics in Medical Research ◽

10.6000/1929-6029.2015.04.03.7 ◽

2015 ◽

Vol 4 (3) ◽

pp. 287-295 ◽

Cited By ~ 105

Author(s):

Yang Liu ◽

Anindya De

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Epidemiologic Study ◽

Fully Conditional Specification ◽

Conditional Specification

Evaluating FIML And Multiple Imputation In Joint Ordinal-Continuous Measurements Models With Missing Data

10.31234/osf.io/j3b2t ◽

2021 ◽

Author(s):

Aaron Lim ◽

Mike W.-L. Cheung

Keyword(s):

Missing Data ◽

Least Squares ◽

Multiple Imputation ◽

Latent Variable ◽

Weighted Least Squares ◽

Low Frequencies ◽

Full Information Maximum Likelihood ◽

Fully Conditional Specification ◽

Conditional Specification ◽

Almost All

Missing data is a common occurrence in confirmatory factor analysis (CFA). Much work has evaluated the performance of different techniques when all observed variables are either continuous or ordinal. However, few studies have investigated these techniques when the observed variables are a mix of continuous and ordinal variables. This study investigated the performance of four approaches to handling missing data in these models: a joint ordinal-continuous full information maximum likelihood (JOC-FIML) approach and three multiple imputation approaches (fully conditional specification, fully conditional specification with latent variable formulation, and expectation-maximization with bootstrapping) combined with the weighted least squares with mean and variance adjustment (WLSMV) estimator. In a Monte Carlo simulation, the JOC-FIML approach produced unbiased estimates of factor loadings and standard errors in almost all conditions. Fully conditional specification combined with WLSMV was second best, producing accurate estimates if the sample size was large. We recommend JOC-FIML across most conditions, except when certain ordinal categories have extremely low frequencies, as it was less likely to converge. If the sample is large, fully conditional specification combined with WLSMV is recommended when the FIML approach is not feasible (e.g., non-convergence, variables that predict missingness are not of interest to the analysis).

Multiple Imputation of Missing Data for Multilevel Models

Organizational Research Methods ◽

10.1177/1094428117703686 ◽

2017 ◽

Vol 21 (1) ◽

pp. 111-149 ◽

Cited By ~ 26

Author(s):

Simon Grund ◽

Oliver Lüdtke ◽

Alexander Robitzsch

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multilevel Models

Multiple Imputation of Missing Data at Level 2: A Comparison of Fully Conditional and Joint Modeling in Multilevel Designs

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998617738087 ◽

2017 ◽

Vol 43 (3) ◽

pp. 316-353 ◽

Cited By ~ 7

Author(s):

Simon Grund ◽

Oliver Lüdtke ◽

Alexander Robitzsch

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

International Student ◽

Student Assessment ◽

Joint Modeling ◽

Computational Procedure ◽

International Student Assessment ◽

Fully Conditional Specification ◽

Using Data ◽

Level 2

Multiple imputation (MI) can be used to address missing data at Level 2 in multilevel research. In this article, we compare joint modeling (JM) and the fully conditional specification (FCS) of MI as well as different strategies for including auxiliary variables at Level 1 using either their manifest or their latent cluster means. We show with theoretical arguments and computer simulations that (a) an FCS approach that uses latent cluster means is comparable to JM and (b) using manifest cluster means provides similar results except in relatively extreme cases with unbalanced data. We outline a computational procedure for including latent cluster means in an FCS approach using plausible values and provide an example using data from the Programme for International Student Assessment 2012 study.

Multiple Imputation for Multivariate Missing Data: The Fully Conditional Specification Approach

10.1201/9780429156397-7 ◽

2021 ◽

pp. 181-208

Author(s):

Yulei He ◽

Guangyu Zhang ◽

Chiu-Hsieh Hsu

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Fully Conditional Specification ◽

Conditional Specification

Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation

American Journal of Epidemiology ◽

10.1093/aje/kwp425 ◽

2010 ◽

Vol 171 (5) ◽

pp. 624-632 ◽

Cited By ~ 346

Author(s):

K. J. Lee ◽

J. B. Carlin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multivariate Normal ◽

Fully Conditional Specification ◽

Conditional Specification ◽

Multivariate Normal Imputation

Dealing with missing information on covariates for excess mortality hazard regression models – Making the imputation model compatible with the substantive model

Statistical Methods in Medical Research ◽

10.1177/09622802211031615 ◽

2021 ◽

Vol 30 (10) ◽

pp. 2256-2268

Author(s):

Luís Antunes ◽

Denisa Mendonça ◽

Maria José Bento ◽

Edmund Njeru Njagi ◽

Aurélien Belot ◽

...

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Survival Data ◽

Regression Models ◽

Cancer Survival ◽

Population Based ◽

Hazard Regression ◽

The North ◽

Fully Conditional Specification ◽

Conditional Specification

Missing data is a common issue in epidemiological databases. Among the different ways of dealing with missing data, multiple imputation has become more available in common statistical software packages. However, incompatibility between the imputation model and the substantive model, which can arise when the associations between variables in the substantive model are not taken into account in the imputation models or when the substantive model is itself nonlinear, can lead to invalid inference. With the aim of analysing population-based cancer survival data, we extended the substantive model compatible-fully conditional specification (SMC-FCS) multiple imputation approach, proposed by Bartlett et al. in 2015, to accommodate excess hazard regression models. The proposed approach was compared with the standard fully conditional specification multiple imputation procedure and with the complete-case analysis using a simulation study. The SMC-FCS approach produced unbiased estimates in both scenarios tested, while the fully conditional specification produced biased estimates and poor empirical coverage probabilities. The SMC-FCS algorithm was then used for handling missing data in the evaluation of socioeconomic inequalities in survival among colorectal cancer patients diagnosed in the North Region of Portugal. The analysis using SMC-FCS showed a clearer trend towards higher excess hazards for patients coming from more deprived areas. The proposed algorithm was implemented in R software and is presented as Supplementary Material.

Multiple imputation of missing data in multilevel models with the R package mdmb: a flexible sequential modeling approach

Behavior Research Methods ◽

10.3758/s13428-020-01530-0 ◽

2021 ◽

Author(s):

Simon Grund ◽

Oliver Lüdtke ◽

Alexander Robitzsch

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Multilevel Models ◽

Main Idea ◽

Nonlinear Effects ◽

R Package ◽

Analysis Model ◽

Explanatory Variables ◽

Modeling Approach ◽

Sequential Modeling

Multilevel models often include nonlinear effects, such as random slopes or interaction effects. The estimation of these models can be difficult when the underlying variables contain missing data. Although several methods for handling missing data such as multiple imputation (MI) can be used with multilevel data, conventional methods for multilevel MI often do not properly take the nonlinear associations between the variables into account. In the present paper, we propose a sequential modeling approach based on Bayesian estimation techniques that can be used to handle missing data in a variety of multilevel models that involve nonlinear effects. The main idea of this approach is to decompose the joint distribution of the data into several parts that correspond to the outcome and explanatory variables in the intended analysis, thus generating imputations in a manner that is compatible with the substantive analysis model. In three simulation studies, we evaluate the sequential modeling approach and compare it with conventional as well as other substantive-model-compatible approaches to multilevel MI. We implemented the sequential modeling approach in the R package mdmb and provide a worked example to illustrate its application.

Modeling Intraindividual Variability in Three-Level Multilevel Models

Methodology ◽

10.1027/1614-2241/a000150 ◽

2018 ◽

Vol 14 (3) ◽

pp. 95-108 ◽

Cited By ~ 4

Author(s):

Steffen Nestler ◽

Katharina Geukes ◽

Mitja D. Back

Keyword(s):

Multilevel Model ◽

Multilevel Models ◽

Mixed Effects ◽

Repeated Measurements ◽

Scale Model ◽

Suggested Approach ◽

Psychological Study ◽

Level Data ◽

Ecological Momentary ◽

Momentary Assessment

The mixed-effects location scale model is an extension of a multilevel model for longitudinal data. It allows covariates to affect both the within-subject variance and the between-subject variance (i.e., the intercept variance) beyond their influence on the means. Typically, the model is applied to two-level data (e.g., the repeated measurements of persons), although researchers are often faced with three-level data (e.g., the repeated measurements of persons within specific situations). Here, we describe an extension of the two-level mixed-effects location scale model to such three-level data. Furthermore, we show how the suggested model can be estimated with Bayesian software, and we present the results of a small simulation study that was conducted to investigate the statistical properties of the suggested approach. Finally, we illustrate the approach by presenting an example from a psychological study that employed ecological momentary assessment.
