simple imputation
Recently Published Documents

TOTAL DOCUMENTS: 16 (FIVE YEARS: 2)
H-INDEX: 5 (FIVE YEARS: 0)

2021 · Vol 4 (3) · pp. 62
Author(s): Sergio Arciniegas-Alarcón, Marisol García-Peña, Camilo Rengifo, Wojtek J. Krzanowski

We describe imputation strategies resistant to outliers, obtained through modifications of the simple imputation method proposed by Krzanowski, and assess their performance. The strategies use a robust singular value decomposition, do not depend on distributional or structural assumptions, and place no restrictions on the pattern or mechanism of the missing data. They are tested through simulation of contamination and unbalance, both in artificially generated matrices and in a matrix of real data from an experiment with genotype-by-environment interaction. Performance is assessed by means of prediction errors, the squared cosine between matrices, and a coefficient measuring the quality of fit between imputations and true values. For small matrices, the best results are obtained by applying the robust decomposition directly, while for larger matrices the highest quality is obtained by eliminating the singular values from the imputation equation.
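The SVD-based imputation these strategies build on can be illustrated with a generic iterative low-rank sketch. This is the standard alternating fill-and-decompose iteration, not the authors' robust variant; the function name, rank choice, and toy matrix are ours:

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=200):
    """Generic iterative low-rank SVD imputation: fill missing cells with
    column means, then alternate between a rank-k SVD approximation and
    re-filling the gaps from that approximation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)            # initial guess: column means
    filled[missing] = col_means[np.where(missing)[1]]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-k reconstruction
        filled[missing] = approx[missing]        # update only the gaps
    return filled

# Example: a rank-1 matrix with one value removed should be recovered.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
X_miss = X.copy()
X_miss[2, 1] = np.nan
print(round(svd_impute(X_miss, rank=1)[2, 1], 3))
```

Observed entries are never overwritten; only the gaps move toward the low-rank structure at each step.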


RMD Open · 2021 · Vol 7 (2) · pp. e001708
Author(s): Nasim A Khan, Karina D Torralba, Fawad Aslam

Objectives: To analyse the amount, reporting and handling of missing data, the approach to applying the intention-to-treat (ITT) principle, and the use of sensitivity analysis in randomised clinical trials (RCTs) of rheumatoid arthritis (RA), and to assess the trend in such reporting across two time points 10 years apart (2006 and 2016). Methods: Parallel-group drug therapy RA RCTs with a clinical primary endpoint. Results: 176 studies enrolling a median of 160 (IQR 62–339) patients were eligible. In terms of actual analysis, 81 (46%) RCTs conducted ITT, 42 (23.9%) conducted modified ITT, and 53 (30.1%) conducted non-ITT analysis. Only 58 of 97 (59.8%) RCTs reporting an ITT analysis actually performed it. The median (IQR) proportions of participants completing the trial and included in the primary outcome analysis were 86% (74%–91%) and 100% (97.1%–100%), respectively. 53 (32.7%) and 65 (40.1%) RCTs had >20% and 10%–20% missing primary outcome data, respectively. Missing data handling was unreported by 58 of 171 (33.9%) RCTs; when reported, the vast majority used simple imputation methods. No significant trend towards improved reporting was seen between 2006 and 2016. Sensitivity analysis utilisation improved numerically from 2006 to 2016 (14.7% vs 21.4%). Conclusions: There is a significant discrepancy between the reported and the actually performed analysis in RA drug therapy RCTs. Nearly one-third of RCTs had >20% missing data. The reporting and methods of missing data handling remain inadequate, with high usage of non-preferred simple imputation methods. Sensitivity analysis utilisation was low, and no trend towards better missing data reporting and handling was seen.


2019 · Vol 135 (1) · pp. 124-131
Author(s): Sumedha Gupta, Alex Cohen, Evan M. Lowder, Bradley R. Ray

Objectives: Understanding the scope of the current opioid epidemic requires accurate counts of the number of opioid-involved drug overdose deaths. Given known errors and limitations in the reporting of these deaths, several studies have used statistical methods to develop estimates of the true number of opioid-involved overdose deaths. This study validates these procedures using a detailed county-level database of linked toxicology and vital records data. Methods: We extracted and linked toxicology and vital records data from Marion County, Indiana (Indianapolis), during a 6-year period (2011-2016). Using toxicology data as a criterion measure, we tested the validity of multiple imputation procedures, including the Ruhm regression-based imputation approach for correcting the number of opioid-involved overdose deaths. Results: Estimates deviated from true opioid-involved overdose deaths by 3% and increased in accuracy during the study period (2011-2016). For example, in 2016, 231 opioid-involved overdose deaths were noted in the toxicology data, whereas the corresponding imputed estimate was 233 opioid-involved overdose deaths. A simple imputation approach, based on the share of opioid-involved overdose deaths among all drug overdose deaths for which the death certificate specified ≥1 drug, deviated from true opioid-involved overdose deaths by ±5%. Conclusions: Commonly used imputation procedures produced estimates of the number of opioid-involved overdose deaths that are similar to the true number of opioid-involved overdose deaths obtained from toxicology data. Although future studies should examine whether these results extend beyond the geographic area covered in our data set, our findings support the continued use of these imputation procedures to quantify the extent of the opioid epidemic.
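The simple imputation approach described above amounts to a one-line proportional correction: compute the opioid share among deaths whose certificate names at least one drug, then attribute that same share of the unspecified deaths to opioids. A minimal sketch, with the function name and all counts invented for illustration:

```python
def impute_opioid_deaths(opioid_specified, other_specified, unspecified):
    """Return a corrected opioid-involved overdose death count by
    applying the opioid share among drug-specified deaths to the
    deaths with no drug specified on the certificate."""
    specified = opioid_specified + other_specified
    share = opioid_specified / specified        # opioid share among specified
    return opioid_specified + share * unspecified

# 150 opioid and 50 non-opioid deaths named a drug; 40 named none.
print(impute_opioid_deaths(150, 50, 40))        # 150 + 0.75 * 40 = 180.0
```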


2018
Author(s): Jean Gaudart, Pascal Adalian, George Leonetti

Introduction: In many studies, covariates are not fully observed because of the missing-data process. Usually, subjects with missing data are excluded from the analysis, but when the number of removed subjects is high the number of covariates can exceed the sample size. Subjective selection or imputation procedures are then used, but this leads to biased or powerless models. The aim of our study was to develop a method based on selecting the covariate nearest to the centroid of a homogeneous cluster of covariates. We applied this method to a forensic medicine data set to estimate the age of aborted fetuses. Methods: We measured 46 biometric covariates on 50 aborted fetuses, but the covariates were complete for only 18 fetuses. First, we used a hierarchical cluster analysis to obtain homogeneous clusters of covariates. Second, for each cluster we selected the covariate nearest to its centroid, i.e. the one maximizing the sum of correlations with the cluster's other members (the centroid criterion). Third, with the covariates selected this way, the sample size was sufficient to fit a classical linear regression model. We proved the almost sure convergence of the centroid criterion, and simulations were performed to build its empirical distribution. We compared our method to a subjective deletion method, two simple imputation methods, and the multiple imputation method. Results: The hierarchical cluster analysis built 2 clusters of covariates, with 6 covariates left unclustered. After selecting the covariate nearest to the centroid of each cluster, we fitted a stepwise linear regression model. The model was adequate (R² = 90.02%) and cross-validation showed low prediction errors (2.23 × 10⁻³). The empirical distribution of the criterion had an empirical mean (31.91) and median (32.07) close to the theoretical value (32.03). The comparisons showed that the deletion and simple imputation methods produced models of lower quality than the multiple imputation method and the centroid method. Conclusion: When the number of continuous covariates exceeds the sample size because of the missing-data process, the usual procedures are biased. Our selection procedure based on the centroid criterion is a valid alternative for composing a set of predictors.
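The centroid criterion itself, picking the covariate that maximises the sum of correlations with the other members of its cluster, can be sketched briefly. The function name and the simulated data are ours, and the cluster is assumed to be given as a list of column indices:

```python
import numpy as np

def centroid_covariate(X, cols):
    """Within one cluster of covariates (columns of X given by `cols`),
    return the index of the covariate maximising the sum of correlations
    with the cluster's other members -- a sketch of the centroid criterion."""
    R = np.corrcoef(X[:, cols], rowvar=False)   # correlation matrix of the cluster
    scores = R.sum(axis=0) - 1.0                # drop each self-correlation of 1
    return cols[int(np.argmax(scores))]

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
# Three covariates driven by the same latent variable z, with growing noise:
X = np.column_stack([z + rng.normal(scale=s, size=2000) for s in (0.1, 0.8, 1.2)])
print(centroid_covariate(X, [0, 1, 2]))         # the least-noisy proxy wins
```

Intuitively, the least-noisy column correlates best with its siblings, so it is the most representative single predictor the cluster can contribute.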


2017 · Vol 10 (04) · pp. 773-779
Author(s): V.B. Kamble, S.N. Deshmukh

The presence of missing values in a dataset makes data analysis difficult in data mining tasks. In this work, a student dataset containing marks for four subjects at an engineering college is used. Mean, mode, and median imputation were applied to deal with the incomplete data. Mean squared error (MSE) and root mean squared error (RMSE) were computed on the dataset for the proposed method and for simple imputation by the mean, mode, and median, and accuracy was also measured for the proposed method combined with each imputation technique. Experimentally, it was found that MSE and RMSE gradually decrease as database size increases under the proposed method, whereas they gradually increase with database size under the simple imputation techniques. Accuracy also increases with database size.
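The three simple fills compared above are easy to state precisely. A minimal sketch on an invented marks column, with the absolute error standing in for RMSE since only one value is missing:

```python
import numpy as np

def impute(values, how="mean"):
    """Fill NaNs in a 1-D array with the mean, median, or mode of the
    observed values."""
    v = np.asarray(values, dtype=float)
    obs = v[~np.isnan(v)]
    if how == "mean":
        fill = obs.mean()
    elif how == "median":
        fill = np.median(obs)
    else:                                       # mode: most frequent observed value
        vals, counts = np.unique(obs, return_counts=True)
        fill = vals[np.argmax(counts)]
    out = v.copy()
    out[np.isnan(out)] = fill
    return out

marks = [55, 60, 60, 70, np.nan]               # invented marks, true value 65
true_value = 65
for how in ("mean", "median", "mode"):
    filled = impute(marks, how)[-1]
    print(how, filled, abs(filled - true_value))  # single gap: RMSE = |error|
```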


2016 · Vol 78 (2) · pp. 319-342
Author(s): Stefanie A. Wind, Yogendra J. Patil

Recent research has explored the use of models adapted from Mokken scale analysis as a nonparametric approach to evaluating rating quality in educational performance assessments. A potential limiting factor to the widespread use of these techniques is the requirement for complete data, as practical constraints in operational assessment systems often limit the use of complete rating designs. In order to address this challenge, this study explores the use of missing data imputation techniques and their impact on Mokken-based rating quality indicators related to rater monotonicity, rater scalability, and invariant rater ordering. Simulated data and real data from a rater-mediated writing assessment were modified to reflect varying levels of missingness, and four imputation techniques were used to impute missing ratings. Overall, the results indicated that simple imputation techniques based on rater and student means result in generally accurate recovery of rater monotonicity indices and rater scalability coefficients. However, discrepancies between violations of invariant rater ordering in the original and imputed data are somewhat unpredictable across imputation methods. Implications for research and practice are discussed.
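The rater-mean and student-mean fills evaluated above can be sketched on a small raters-by-students score matrix. The matrix and scores are invented; the study's actual imputation procedures may differ in detail:

```python
import numpy as np

# Raters in rows, students in columns; NaN marks an unassigned rating.
R = np.array([[3.0, 4.0, np.nan],
              [2.0, np.nan, 4.0],
              [3.0, 5.0, 5.0]])

rater_means = np.nanmean(R, axis=1)            # one mean per rater (row)
student_means = np.nanmean(R, axis=0)          # one mean per student (column)

rows, cols = np.where(np.isnan(R))
by_rater, by_student = R.copy(), R.copy()
by_rater[rows, cols] = rater_means[rows]       # fill each gap with its rater's mean
by_student[rows, cols] = student_means[cols]   # or with its student's mean
print(by_rater[0, 2], by_student[0, 2])
```

The same gap gets a different value under each scheme, which is why the two fills can recover some Mokken-based indices well while diverging on others.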


2014 · Vol 51 (2) · pp. 75-88
Author(s): Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Janusz Krzanowski, Carlos Tadeu dos Santos Dias

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between the estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
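Of the three quality measures listed above, the normalised root mean squared error over the deleted cells is the simplest to sketch. The normalisation used in the paper may differ; here, as an assumption of this sketch, we divide by the standard deviation of the true deleted values:

```python
import numpy as np

def nrmse(imputed, truth, mask):
    """Normalised RMSE between imputed and true values over the deleted
    cells (mask); normalised here by the std of the true deleted values."""
    err = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])

# Toy check: three cells deleted and re-imputed with unit errors.
truth = np.arange(12, dtype=float).reshape(3, 4)
mask = np.zeros_like(truth, dtype=bool)
mask[0, 2] = mask[1, 3] = mask[2, 1] = True    # deleted cells hold 2, 7, 9
imputed = truth.copy()
imputed[mask] += np.array([1.0, -1.0, 1.0])
print(round(nrmse(imputed, truth, mask), 3))
```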


ISRN Nursing · 2011 · Vol 2011 · pp. 1-4
Author(s): Wilma M. Hopman, Margaret B. Harrison, Meg Carley, Elizabeth G. VanDenKerkhof

Background. Missing data are a significant problem in health-related quality of life (HRQOL) research. We evaluated two imputation approaches: missing data estimation (MDE) and assignment of mean score (AMS). Methods. HRQOL data were collected using the Medical Outcomes Trust SF-12. Missing data were estimated using both approaches, summary statistics were produced for both, and results were compared using intraclass correlations (ICC). Results. Missing data were imputed for 21 participants. Mean values were similar, with ICC >.99 within both the Physical Component Summary and the Mental Component Summary when comparing the two methodologies. When imputed data were added into the full study sample, mean scores were identical regardless of methodology. Conclusion. Results support the use of a practical and simple imputation strategy of replacing missing values with the mean of the sample in cross-sectional studies when less than half of the required items of the SF-12 components are missing.
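The "replace missing values with the mean when fewer than half the items are missing" rule supported above can be sketched generically. This is not the official Medical Outcomes Trust SF-12 scoring algorithm, only an illustration of the half-rule idea, with the function name and item responses invented:

```python
import numpy as np

def half_rule_mean(items):
    """Replace missing item responses with the mean of the answered
    items, but only when fewer than half the items are missing;
    otherwise return None (score not computable under the half rule)."""
    v = np.asarray(items, dtype=float)
    missing = np.isnan(v)
    if missing.sum() * 2 >= v.size:             # half or more missing: give up
        return None
    out = v.copy()
    out[missing] = v[~missing].mean()
    return out

print(half_rule_mean([3, 4, np.nan, 5]))        # one of four missing: imputable
print(half_rule_mean([3, np.nan, np.nan, 5]))   # half missing: None
```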


2011 · Vol 44 (2) · pp. 516-531
Author(s): Damazo T. Kadengye, Wilfried Cools, Eva Ceulemans, Wim Van den Noortgate
