On the use of multiple imputation in handling missing values in longitudinal studies

Multiple imputation methods for handling missing values in longitudinal studies with sampling weights: Comparison of methods implemented in Stata

Biometrical Journal ◽

10.1002/bimj.201900360 ◽

2020 ◽

Author(s):

Anurika P. De Silva ◽

Alysha M. De Livera ◽

Katherine J. Lee ◽

Margarita Moreno‐Betancur ◽

Julie A. Simpson

Keyword(s):

Multiple Imputation ◽

Longitudinal Studies ◽

Missing Values ◽

Comparison Of Methods ◽

Sampling Weights ◽

Imputation Methods

Strategies for Multiple Imputation in Longitudinal Studies

American Journal of Epidemiology ◽

10.1093/aje/kwq137 ◽

2010 ◽

Vol 172 (4) ◽

pp. 478-487 ◽

Cited By ~ 200

Author(s):

M. Spratt ◽

J. Carpenter ◽

J. A. C. Sterne ◽

J. B. Carlin ◽

J. Heron ◽

...

Keyword(s):

Multiple Imputation ◽

Longitudinal Studies

Comparison of Methods for Handling Covariate Missingness in Propensity Score Estimation with a Binary Exposure

10.21203/rs.2.18726/v1 ◽

2019 ◽

Author(s):

Donna Coffman ◽

Jiangxiu Zhou ◽

Xizhen Cai

Keyword(s):

Propensity Score ◽

Multiple Imputation ◽

Missing Values ◽

Propensity Scores ◽

Causal Effect ◽

Nonparametric Approach ◽

Split Method ◽

Mean Imputation ◽

Substantial Bias ◽

Effect Estimation

Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful approaches that have not been evaluated, including single imputation (SI) + prediction error (PE), SI+PE + parameter uncertainty (PU), and Generalized Boosted Modeling (GBM), which is a nonparametric approach for estimating propensity scores in which missing values are automatically handled in the estimation using a surrogate split method. To evaluate the performance of these approaches, a simulation study was conducted. Results: Results suggested that SI+PE, SI+PE+PU, MI, and MIMP perform almost equally well and better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty of imputing the missingness. Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.
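
Note: the results above turn on how multiple imputation carries imputation uncertainty into the final estimate. As a minimal sketch, and not the authors' code, the standard pooling step (Rubin's rules) can be written as follows. The function name `pool_rubin` and the example numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)

    # Total variance: the between-imputation term is what single
    # imputation leaves out.
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is the "additional uncertainty" that MI and MIMP capture and that a single imputed dataset cannot.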

Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values

Circulation Cardiovascular Quality and Outcomes ◽

10.1161/circoutcomes.120.007071 ◽

2021 ◽

Author(s):

Byron C. Jaeger ◽

Ryan Cantor ◽

Venkata Sthanam ◽

Rongbing Xie ◽

James K. Kirklin ◽

...

Keyword(s):

Multiple Imputation ◽

Risk Prediction ◽

Random Forests ◽

Missing Values ◽

Prediction Models ◽

Model Performance ◽

Circulatory Support ◽

Risk Prediction Models ◽

Prognostic Accuracy ◽

The Mean

Background: Risk prediction models play an important role in clinical decision making. When developing risk prediction models, practitioners often impute missing values to the mean. We evaluated the impact of applying other strategies to impute missing values on the prognostic accuracy of downstream risk prediction models, that is, models fitted to the imputed data. A secondary objective was to compare the accuracy of imputation methods based on artificially induced missing values. To complete these objectives, we used data from the Interagency Registry for Mechanically Assisted Circulatory Support. Methods: We applied 12 imputation strategies in combination with 2 different modeling strategies for mortality and transplant risk prediction following surgery to receive mechanical circulatory support. Model performance was evaluated using Monte-Carlo cross-validation and measured based on outcomes 6 months following surgery using the scaled Brier score, concordance index, and calibration error. We used Bayesian hierarchical models to compare model performance. Results: Multiple imputation with random forests emerged as a robust strategy to impute missing values, increasing model concordance by 0.0030 (25th–75th percentile: 0.0008–0.0052) compared with imputation to the mean for mortality risk prediction using a downstream proportional hazards model. The posterior probability that single and multiple imputation using random forests would improve concordance versus mean imputation was 0.464 and >0.999, respectively. Conclusions: Selecting an optimal strategy to impute missing values such as random forests and applying multiple imputation can improve the prognostic accuracy of downstream risk prediction models.

Enhancing the Human Health Status Prediction: the ATHLOS Project

10.1101/2021.01.19.21250076 ◽

2021 ◽

Author(s):

Panagiotis Anagnostou ◽

Sotiris Tasoulis ◽

Aristidis G. Vrahatis ◽

Spiros Georgakopoulos ◽

Matthew Prina ◽

...

Keyword(s):

Longitudinal Studies ◽

Regression Models ◽

Missing Values ◽

Data Imputation ◽

High Complexity ◽

Preventive Healthcare ◽

Horizon 2020 ◽

Research And Innovation ◽

The Impact ◽

Learning Data

Abstract: Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originating from such studies are characterized by high complexity, huge volume and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized as part of solving the aforementioned challenges, respectively. Towards this direction, we focus on the development of a complete methodology for the ATHLOS (Ageing Trajectories of Health: Longitudinal Opportunities and Synergies) Project, funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we particularly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models on the predictive power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.

Copy Mean: A New Method to Impute Intermittent Missing Values in Longitudinal Studies

Open Journal of Statistics ◽

10.4236/ojs.2013.34a004 ◽

2013 ◽

Vol 03 (04) ◽

pp. 26-40 ◽

Cited By ~ 10

Author(s):

Christophe Genolini ◽

René Écochard ◽

Hélène Jacqmin-Gadda

Keyword(s):

Longitudinal Studies ◽

Missing Values ◽

New Method

Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168375 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8375

Author(s):

Thelma Dede Baddoo ◽

Zhijia Li ◽

Samuel Nii Odai ◽

Kenneth Rodolphe Chabi Boni ◽

Isaac Kwesi Nooni ◽

...

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Real World ◽

Missing Values ◽

Total Error ◽

Extensive Study ◽

Error Measurement ◽

Missing Data Imputation ◽

Single Station ◽

Real World Datasets

Reconstructing missing streamflow data can be challenging when additional data are not available, and studies of missing data imputation on real-world datasets that investigate how to ascertain the accuracy of imputation algorithms for these datasets are lacking. This study investigated the necessary complexity of missing data reconstruction schemes to obtain the relevant results for a real-world single station streamflow observation to facilitate its further use. This investigation was implemented by applying different missing data mechanisms ranging from univariate algorithms to multiple imputation methods accustomed to multivariate data taking time as an explicit variable. The performance accuracy of these schemes was assessed using the total error measurement (TEM) and a recommended localized error measurement (LEM) in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but the ones which provide the best results are usually time and computationally intensive. Also, multiple imputation algorithms which consider the surrounding observed values and/or which can understand the characteristics of the data provide similar results to the univariate missing data algorithms and, in some cases, perform better without the added time and computational downsides when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values of real-world hydroclimatic datasets depends on imputing and extensive study of the particular dataset to be imputed.

Missing data in longitudinal studies: Comparison of multiple imputation methods in a real clinical setting

Journal of Evaluation in Clinical Practice ◽

10.1111/jep.13376 ◽

2020 ◽

Author(s):

Rosalba Rosato ◽

Eva Pagano ◽

Silvia Testa ◽

Paolo Zola ◽

Daniela di Cuonzo

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Longitudinal Studies ◽

Clinical Setting ◽

Imputation Methods

Multiple Imputation of Missing Values: Further Update of Ice, with an Emphasis on Interval Censoring

The Stata Journal: Promoting communications on statistics and Stata ◽

10.1177/1536867x0800700401 ◽

2007 ◽

Vol 7 (4) ◽

pp. 445-464 ◽

Cited By ~ 148

Author(s):

Patrick Royston

Keyword(s):

Multiple Imputation ◽

Missing Values ◽

Interval Censoring

The efficiency of multiple imputation and maximum likelihood methods for estimating missing values

Indian Journal of Science and Technology ◽

10.17485/ijst/2018/v11i16/118701 ◽

2018 ◽

Vol 11 (16) ◽

pp. 1-11

Author(s):

Tlhalitshi Volition Montshiwa ◽

Ntebo Moroke ◽

Elias Munapo ◽

...

Keyword(s):

Maximum Likelihood ◽

Multiple Imputation ◽

Missing Values ◽

Likelihood Methods ◽

Maximum Likelihood Methods
