Missing data in longitudinal studies: Comparison of multiple imputation methods in a real clinical setting

Author(s):  
Rosalba Rosato ◽  
Eva Pagano ◽  
Silvia Testa ◽  
Paolo Zola ◽  
Daniela di Cuonzo
2018 ◽  
Vol 18 (1) ◽  
Author(s):  
Md Hamidul Huque ◽  
John B. Carlin ◽  
Julie A. Simpson ◽  
Katherine J. Lee

2014 ◽  
Vol 134 ◽  
pp. 23-33 ◽  
Author(s):  
M.P. Gómez-Carracedo ◽  
J.M. Andrade ◽  
P. López-Mahía ◽  
S. Muniategui ◽  
D. Prada

2021 ◽  
Author(s):  
Adrienne D. Woods ◽  
Pamela Davis-Kean ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Jessica A. R. Logan ◽  
...  

A common challenge in developmental research is the amount of incomplete and missing data that occurs from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although this technique is highly effective, it has not been widely adopted by developmental scientists, owing to barriers such as a lack of training and misconceptions about imputation methods; instead, researchers often fall back on software defaults such as listwise deletion. This manuscript is intended to provide practical guidelines for developmental researchers to follow when examining their data for missingness, making decisions about how to handle that missingness, and reporting the extent of missing data biases and specific multiple imputation procedures in publications.
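The workflow the guidelines describe — quantify missingness, create several completed datasets, and pool the estimates — can be sketched in miniature. The following is a hedged illustration only (the variable names and the MAR pattern are hypothetical, and stochastic regression imputation stands in for a full MICE procedure); it shows why the pooled estimate corrects the bias of complete-case analysis and how Rubin's rules combine the imputations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical developmental data: outcome y correlates with covariate x;
# y is missing more often for high-x respondents (a MAR pattern).
n = 500
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 1 / (1 + np.exp(-x))
y_obs = np.where(miss, np.nan, y)
obs = ~np.isnan(y_obs)

# m completed datasets via stochastic regression imputation of y from x.
b1, b0 = np.polyfit(x[obs], y_obs[obs], 1)       # slope, intercept
resid_sd = np.std(y_obs[obs] - (b0 + b1 * x[obs]))
m = 20
means, within = [], []
for _ in range(m):
    y_imp = y_obs.copy()
    y_imp[~obs] = b0 + b1 * x[~obs] + rng.normal(scale=resid_sd, size=(~obs).sum())
    means.append(y_imp.mean())
    within.append(y_imp.var(ddof=1) / n)          # variance of the mean, this dataset

# Rubin's rules: pooled estimate, between- and total variance.
q_bar = np.mean(means)
b_var = np.var(means, ddof=1)
t_var = np.mean(within) + (1 + 1 / m) * b_var
print(f"complete-case mean: {y_obs[obs].mean():.3f}")
print(f"pooled mean:        {q_bar:.3f} (full-data mean {y.mean():.3f})")
```

Because missingness depends only on the observed covariate, the regression fitted on complete cases remains valid, and the pooled mean lands near the full-data mean while the complete-case mean is visibly biased.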


2019 ◽  
Vol 6 (339) ◽  
pp. 73-98
Author(s):  
Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not confined to any particular scientific domain; it arises in economics, sociology, education, the behavioural sciences and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete them. However, this leads to ineffective and biased analysis results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In the paper, selected multiple imputation methods were taken into account. Special attention was paid to using principal components analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA‑based imputations as compared to two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data using 10 complete data sets from the UCI repository of machine learning databases. Then, missing values were imputed with the use of MICE, missForest and the PCA‑based method (MIPCA). The normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the conducted analyses, missForest can be recommended as a multiple imputation method providing the lowest rates of imputation errors for all types of missingness. PCA‑based imputation does not perform well in terms of accuracy.
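The accuracy measure used in the comparison, NRMSE, is straightforward to compute once true and imputed values are both known. A minimal sketch (mean imputation stands in as a baseline here, since MICE, missForest and MIPCA require dedicated packages; note that normalisation conventions vary, and this version divides by the standard deviation of the true values):

```python
import numpy as np

rng = np.random.default_rng(1)

def nrmse(x_true, x_imp, mask):
    """Normalised RMSE over the imputed entries only."""
    err = x_true[mask] - x_imp[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(x_true[mask])

# Complete data, then 30% MCAR missingness.
x = rng.normal(loc=10, scale=2, size=1000)
mask = rng.random(x.size) < 0.30

# Simplest possible imputation: fill with the observed mean.
x_imp = x.copy()
x_imp[mask] = x[~mask].mean()

print(f"NRMSE of mean imputation: {nrmse(x, x_imp, mask):.3f}")
```

Mean imputation scores an NRMSE near 1.0 by construction, which is why it serves as the reference point that methods like missForest are expected to beat.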


2010 ◽  
Vol 7 (1) ◽  
Author(s):  
Claudio Quintano ◽  
Rosalia Castellano ◽  
Antonella Rocca

In the field of data quality, imputation is the most widely used method for handling missing data. The performance of imputation techniques is influenced by various factors, especially when the data represent only a sample of the population, for example through the survey design characteristics. In this paper, we compare the results of different multiple imputation methods in terms of final estimates when outliers occur in a dataset. To evaluate the influence of outliers on the performance of these methods, the procedure is applied both before and after identifying and removing them. For this purpose, missing data were simulated on data from the ISTAT annual sample survey on Small and Medium Enterprises. A MAR mechanism is assumed for the missing data. The methods are based on multiple imputation through Markov Chain Monte Carlo (MCMC), the propensity score and mixture models. The results highlight the strong influence of data characteristics on the final estimates.
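The pre-processing step the comparison hinges on — identifying and removing outliers before imputing — can be illustrated with a simple IQR fence. The abstract does not specify the detection rule actually used on the ISTAT data, so the 1.5×IQR rule and the revenue-like variable below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Revenue-like skewed variable with a few gross outliers injected.
x = rng.lognormal(mean=3.0, sigma=0.5, size=500)
x[:5] *= 50  # five contaminated records

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
outlier = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
print(f"flagged {outlier.sum()} outliers")

# Estimates before / after removal show the influence on a final estimate.
print(f"mean with outliers:    {x.mean():.1f}")
print(f"mean without outliers: {x[~outlier].mean():.1f}")
```

The gap between the two means is the kind of sensitivity the paper measures: any imputation model fitted to the contaminated data inherits that distortion.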


2017 ◽  
Author(s):  
Brett K. Beaulieu-Jones ◽  
Daniel R. Lavage ◽  
John W. Snyder ◽  
Jason H. Moore ◽  
Sarah A Pendergrass ◽  
...  

Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR) based analyses. Failure to appropriately consider missing data can lead to biased results. Here, we provide detailed procedures for when and how to conduct imputation of EHR data. We demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. We analyzed clinical lab measures from 602,366 patients in the Geisinger Health System EHR. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness. Our results show that several methods including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute consistently imputed missing values with low error; however, only a subset of the MICE methods were suitable for multiple imputation. The analyses described provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs and all of our methods and code are publicly available.


Author(s):  
Marc J. Lajeunesse

This chapter discusses possible solutions for dealing with partial information and missing data from published studies. These solutions can improve the amount of information extracted from individual studies, and increase the representation of data for meta-analysis. It begins with a description of the mechanisms that generate missing information within studies, followed by a discussion of how gaps of information can influence meta-analysis and the way studies are quantitatively reviewed. It then suggests some practical solutions for recovering missing statistics from published studies. These include statistical acrobatics to convert available information (e.g., a t-test) into forms more useful for computing effect sizes, as well as heuristic approaches that impute (fill in) missing information when pooling effect sizes. Finally, the chapter discusses multiple-imputation methods that account for the uncertainty associated with filling gaps of information when performing meta-analysis.
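One of the "statistical acrobatics" mentioned — converting a reported t-test into an effect size — has a standard closed form: for an independent-samples t with group sizes n1 and n2, Cohen's d = t·sqrt(1/n1 + 1/n2). A minimal sketch, including the conventional large-sample approximation for the variance of d used to weight effect sizes in meta-analysis:

```python
import math

def t_to_cohens_d(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_variance(d, n1, n2):
    """Approximate sampling variance of d, used to weight effect sizes."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

# Example: a study reports t = 2.5 with 30 participants per group.
d = t_to_cohens_d(t=2.5, n1=30, n2=30)
print(f"d = {d:.3f}, var = {d_variance(d, 30, 30):.4f}")
```

This recovers an effect size from studies that report only a test statistic and sample sizes, which is exactly the situation the chapter's recovery strategies target.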

