The performance of multiple imputation for missing covariate data within the context of regression relative survival analysis

2008, Vol 27 (30), pp. 6310-6331
Author(s): Roch Giorgi, Aurélien Belot, Jean Gaudart, Guy Launoy

2010, Vol 29 (29), pp. 3004-3016
Author(s): Meredith L. Wallace, Stewart J. Anderson, Sati Mazumdar

2013, Vol 2013, pp. 1-6
Author(s): Christopher Vinnard, E. Paul Wileyto, Gregory P. Bisson, Carla A. Winston

Aims. The purpose of this study was to compare methods for handling missing data in analysis of the National Tuberculosis Surveillance System of the Centers for Disease Control and Prevention. Because of the high rate of missing human immunodeficiency virus (HIV) infection status in this dataset, we used multiple imputation methods to minimize the bias that may result from less sophisticated methods. Methods. We compared analysis based on multiple imputation methods with analysis based on deleting subjects with missing covariate data from regression analysis (case exclusion), and determined whether the use of increasing numbers of imputed datasets would lead to changes in the estimated association between isoniazid resistance and death. Results. Following multiple imputation, the odds ratio for initial isoniazid resistance and death was 2.07 (95% CI 1.30, 3.29); with case exclusion, this odds ratio decreased to 1.53 (95% CI 0.83, 2.83). The use of more than 5 imputed datasets did not substantively change the results. Conclusions. Our experience with the National Tuberculosis Surveillance System dataset supports the use of multiple imputation methods in epidemiologic analysis, but also demonstrates that close attention should be paid to the potential impact of missing covariates at each step of the analysis.
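
The abstract above does not state how the estimates from the separate imputed datasets were combined. A common choice in multiple imputation is Rubin's rules, and the sketch below assumes that choice. The function name `pool_rubin` and the input numbers are hypothetical illustrations, not values from the study, and the confidence interval uses a normal approximation rather than the t-based reference distribution that Rubin's rules normally call for.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates (e.g. log odds ratios) with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    pooled = sum(estimates) / m                 # mean of the estimates
    within = sum(variances) / m                 # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between      # Rubin's total variance
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from 5 imputed datasets.
log_ors = [0.70, 0.75, 0.72, 0.68, 0.74]
sq_ses = [0.055, 0.057, 0.056, 0.054, 0.058]

pooled, total_var = pool_rubin(log_ors, sq_ses)
se = math.sqrt(total_var)

# Normal-approximation 95% interval on the odds ratio scale (a simplification).
ci_low = math.exp(pooled - 1.96 * se)
ci_high = math.exp(pooled + 1.96 * se)
print(f"pooled OR = {math.exp(pooled):.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

Re-running such a pooling step with progressively more imputed datasets is one way to check, as the authors did, whether the pooled estimate stabilizes.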


2018, Vol 7 (1)
Author(s): Bas B.L. Penning de Vries, Maarten van Smeden, Rolf H.H. Groenwold

Abstract. Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the fitting of a logistic regression on all subjects, CART is appealing in part because some implementations allow for incomplete records to be incorporated in the tree fitting and provide propensity score estimates for all subjects. Based on theoretical considerations, we argue that the automatic handling of missing data by CART may however not be appropriate. Using a series of simulation experiments, we examined the performance of different approaches to handling missing covariate data: (i) applying the CART algorithm directly to the (partially) incomplete data, (ii) complete case analysis, and (iii) multiple imputation. Performance was assessed in terms of bias in estimating exposure-outcome effects among the exposed, standard error, mean squared error and coverage. Applying the CART algorithm directly to incomplete data resulted in bias, even in scenarios where data were missing completely at random. Overall, multiple imputation followed by CART resulted in the best performance. Our study showed that automatic handling of missing data in CART can cause serious bias and does not outperform multiple imputation as a means to account for missing data.

