simple imputation
Recently Published Documents

TOTAL DOCUMENTS: 16 (FIVE YEARS: 2)
H-INDEX: 5 (FIVE YEARS: 0)

2021 · Vol 4 (3) · pp. 62
Author(s): Sergio Arciniegas-Alarcón, Marisol García-Peña, Camilo Rengifo, Wojtek J. Krzanowski

We describe imputation strategies resistant to outliers, obtained through modifications of the simple imputation method proposed by Krzanowski, and assess their performance. The strategies use a robust singular value decomposition, do not depend on distributional or structural assumptions, and place no restrictions on the pattern or mechanism of the missing data. They are tested through simulation of contamination and unbalance, both in artificially generated matrices and in a matrix of real data from an experiment with genotype-by-environment interaction. Performance is assessed by means of prediction errors, the squared cosine between matrices, and a coefficient measuring the quality of fit between imputations and true values. For small matrices, the best results are obtained by applying the robust decomposition directly, while for larger matrices the highest quality is obtained by eliminating the singular values from the imputation equation.
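The SVD-based imputation these strategies build on can be illustrated with a generic iterative low-rank sketch. This is the standard alternating fill-and-decompose iteration, not the authors' robust variant; the function name, rank choice, and toy matrix are ours:

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=200):
    """Generic iterative low-rank SVD imputation: fill missing cells with
    column means, then alternate between a rank-k SVD approximation and
    re-filling the gaps from that approximation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)            # initial guess: column means
    filled[missing] = col_means[np.where(missing)[1]]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-k reconstruction
        filled[missing] = approx[missing]        # update only the gaps
    return filled

# Example: a rank-1 matrix with one value removed should be recovered.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
X_miss = X.copy()
X_miss[2, 1] = np.nan
print(round(svd_impute(X_miss, rank=1)[2, 1], 3))
```

Observed entries are never overwritten; only the gaps move toward the low-rank structure at each step.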


RMD Open · 2021 · Vol 7 (2) · pp. e001708
Author(s): Nasim A Khan, Karina D Torralba, Fawad Aslam

Objectives: To analyse the amount, reporting and handling of missing data, the approach to applying the intention-to-treat (ITT) principle, and the use of sensitivity analysis in randomised clinical trials (RCTs) of rheumatoid arthritis (RA), and to assess the trend in such reporting across two time points 10 years apart (2006 and 2016). Methods: Parallel-group drug therapy RA RCTs with a clinical primary endpoint. Results: 176 studies enrolling a median of 160 (IQR 62–339) patients were eligible. In terms of actual analysis, 81 (46%) RCTs conducted ITT, 42 (23.9%) conducted modified ITT, and 53 (30.1%) conducted non-ITT analysis. Only 58 of 97 (59.8%) RCTs reporting an ITT analysis actually performed it. The median (IQR) proportions of participants completing the trial and included in the primary outcome analysis were 86% (74%–91%) and 100% (97.1%–100%), respectively. 53 (32.7%) and 65 (40.1%) RCTs had >20% and 10%–20% missing primary outcome data, respectively. Missing data handling was unreported by 58 of 171 (33.9%) RCTs; when reported, the vast majority used simple imputation methods. No significant trend towards improved reporting was seen between 2006 and 2016. Sensitivity analysis utilisation improved numerically from 2006 to 2016 (14.7% vs 21.4%). Conclusions: There is a significant discrepancy between the reported and the actually performed analysis in RA drug therapy RCTs. Nearly one-third of RCTs had >20% missing data. The reporting and methods of missing data handling remain inadequate, with high usage of non-preferred simple imputation methods. Sensitivity analysis utilisation was low, and no trend towards better missing data reporting and handling was seen.


2019 · Vol 135 (1) · pp. 124-131
Author(s): Sumedha Gupta, Alex Cohen, Evan M. Lowder, Bradley R. Ray

Objectives: Understanding the scope of the current opioid epidemic requires accurate counts of the number of opioid-involved drug overdose deaths. Given known errors and limitations in the reporting of these deaths, several studies have used statistical methods to develop estimates of the true number of opioid-involved overdose deaths. This study validates these procedures using a detailed county-level database of linked toxicology and vital records data. Methods: We extracted and linked toxicology and vital records data from Marion County, Indiana (Indianapolis), during a 6-year period (2011-2016). Using toxicology data as a criterion measure, we tested the validity of multiple imputation procedures, including the Ruhm regression-based imputation approach for correcting the number of opioid-involved overdose deaths. Results: Estimates deviated from true opioid-involved overdose deaths by 3% and increased in accuracy during the study period (2011-2016). For example, in 2016, 231 opioid-involved overdose deaths were noted in the toxicology data, whereas the corresponding imputed estimate was 233 opioid-involved overdose deaths. A simple imputation approach, based on the share of opioid-involved overdose deaths among all drug overdose deaths for which the death certificate specified ≥1 drug, deviated from true opioid-involved overdose deaths by ±5%. Conclusions: Commonly used imputation procedures produced estimates of the number of opioid-involved overdose deaths that are similar to the true number of opioid-involved overdose deaths obtained from toxicology data. Although future studies should examine whether these results extend beyond the geographic area covered in our data set, our findings support the continued use of these imputation procedures to quantify the extent of the opioid epidemic.
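The simple imputation approach described above amounts to a one-line proportional correction: compute the opioid share among deaths whose certificate names at least one drug, then attribute that same share of the unspecified deaths to opioids. A minimal sketch, with the function name and all counts invented for illustration:

```python
def impute_opioid_deaths(opioid_specified, other_specified, unspecified):
    """Return a corrected opioid-involved overdose death count by
    applying the opioid share among drug-specified deaths to the
    deaths with no drug specified on the certificate."""
    specified = opioid_specified + other_specified
    share = opioid_specified / specified        # opioid share among specified
    return opioid_specified + share * unspecified

# 150 opioid and 50 non-opioid deaths named a drug; 40 named none.
print(impute_opioid_deaths(150, 50, 40))        # 150 + 0.75 * 40 = 180.0
```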


2018
Author(s): Jean Gaudart, Pascal Adalian, George Leonetti

Introduction: In many studies, covariates are not fully observed because of the missing-data process. Usually, subjects with missing data are excluded from the analysis, but when the number of removed subjects is high the number of covariates can exceed the sample size. Subjective selection or imputation procedures are then used, but this leads to biased or powerless models. The aim of our study was to develop a method based on selecting the covariate nearest to the centroid of a homogeneous cluster of covariates. We applied this method to a forensic medicine data set to estimate the age of aborted fetuses. Methods: We measured 46 biometric covariates on 50 aborted fetuses, but the covariates were complete for only 18 fetuses. First, we used a hierarchical cluster analysis to obtain homogeneous clusters of covariates. Second, for each cluster we selected the covariate nearest to its centroid, i.e. the one maximizing the sum of correlations with the cluster's other members (the centroid criterion). Third, with the covariates selected this way, the sample size was sufficient to fit a classical linear regression model. We proved the almost sure convergence of the centroid criterion, and simulations were performed to build its empirical distribution. We compared our method to a subjective deletion method, two simple imputation methods, and the multiple imputation method. Results: The hierarchical cluster analysis built 2 clusters of covariates, with 6 covariates left unclustered. After selecting the covariate nearest to the centroid of each cluster, we fitted a stepwise linear regression model. The model was adequate (R² = 90.02%) and cross-validation showed low prediction errors (2.23 × 10⁻³). The empirical distribution of the criterion had an empirical mean (31.91) and median (32.07) close to the theoretical value (32.03). The comparisons showed that the deletion and simple imputation methods produced models of lower quality than the multiple imputation method and the centroid method. Conclusion: When the number of continuous covariates exceeds the sample size because of the missing-data process, the usual procedures are biased. Our selection procedure based on the centroid criterion is a valid alternative for composing a set of predictors.
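The centroid criterion itself, picking the covariate that maximises the sum of correlations with the other members of its cluster, can be sketched briefly. The function name and the simulated data are ours, and the cluster is assumed to be given as a list of column indices:

```python
import numpy as np

def centroid_covariate(X, cols):
    """Within one cluster of covariates (columns of X given by `cols`),
    return the index of the covariate maximising the sum of correlations
    with the cluster's other members -- a sketch of the centroid criterion."""
    R = np.corrcoef(X[:, cols], rowvar=False)   # correlation matrix of the cluster
    scores = R.sum(axis=0) - 1.0                # drop each self-correlation of 1
    return cols[int(np.argmax(scores))]

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
# Three covariates driven by the same latent variable z, with growing noise:
X = np.column_stack([z + rng.normal(scale=s, size=2000) for s in (0.1, 0.8, 1.2)])
print(centroid_covariate(X, [0, 1, 2]))         # the least-noisy proxy wins
```

Intuitively, the least-noisy column correlates best with its siblings, so it is the most representative single predictor the cluster can contribute.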


2017 · Vol 10 (04) · pp. 773-779
Author(s): V.B. Kamble, S.N. Deshmukh

The presence of missing values in a dataset makes data analysis difficult in data mining tasks. In this work, a student dataset containing marks for four subjects at an engineering college is used. Mean, mode, and median imputation were applied to deal with the incomplete data. Mean squared error (MSE) and root mean squared error (RMSE) were computed on the dataset for the proposed method and for simple imputation by the mean, mode, and median, and accuracy was also measured for the proposed method combined with each imputation technique. Experimentally, it was found that MSE and RMSE gradually decrease as database size increases under the proposed method, whereas they gradually increase with database size under the simple imputation techniques. Accuracy also increases with database size.
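The three simple fills compared above are easy to state precisely. A minimal sketch on an invented marks column, with the absolute error standing in for RMSE since only one value is missing:

```python
import numpy as np

def impute(values, how="mean"):
    """Fill NaNs in a 1-D array with the mean, median, or mode of the
    observed values."""
    v = np.asarray(values, dtype=float)
    obs = v[~np.isnan(v)]
    if how == "mean":
        fill = obs.mean()
    elif how == "median":
        fill = np.median(obs)
    else:                                       # mode: most frequent observed value
        vals, counts = np.unique(obs, return_counts=True)
        fill = vals[np.argmax(counts)]
    out = v.copy()
    out[np.isnan(out)] = fill
    return out

marks = [55, 60, 60, 70, np.nan]               # invented marks, true value 65
true_value = 65
for how in ("mean", "median", "mode"):
    filled = impute(marks, how)[-1]
    print(how, filled, abs(filled - true_value))  # single gap: RMSE = |error|
```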


2016 · Vol 78 (2) · pp. 319-342
Author(s): Stefanie A. Wind, Yogendra J. Patil

Recent research has explored the use of models adapted from Mokken scale analysis as a nonparametric approach to evaluating rating quality in educational performance assessments. A potential limiting factor to the widespread use of these techniques is the requirement for complete data, as practical constraints in operational assessment systems often limit the use of complete rating designs. In order to address this challenge, this study explores the use of missing data imputation techniques and their impact on Mokken-based rating quality indicators related to rater monotonicity, rater scalability, and invariant rater ordering. Simulated data and real data from a rater-mediated writing assessment were modified to reflect varying levels of missingness, and four imputation techniques were used to impute missing ratings. Overall, the results indicated that simple imputation techniques based on rater and student means result in generally accurate recovery of rater monotonicity indices and rater scalability coefficients. However, discrepancies between violations of invariant rater ordering in the original and imputed data are somewhat unpredictable across imputation methods. Implications for research and practice are discussed.
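The rater-mean and student-mean fills evaluated above can be sketched on a small raters-by-students score matrix. The matrix and scores are invented; the study's actual imputation procedures may differ in detail:

```python
import numpy as np

# Raters in rows, students in columns; NaN marks an unassigned rating.
R = np.array([[3.0, 4.0, np.nan],
              [2.0, np.nan, 4.0],
              [3.0, 5.0, 5.0]])

rater_means = np.nanmean(R, axis=1)            # one mean per rater (row)
student_means = np.nanmean(R, axis=0)          # one mean per student (column)

rows, cols = np.where(np.isnan(R))
by_rater, by_student = R.copy(), R.copy()
by_rater[rows, cols] = rater_means[rows]       # fill each gap with its rater's mean
by_student[rows, cols] = student_means[cols]   # or with its student's mean
print(by_rater[0, 2], by_student[0, 2])
```

The same gap gets a different value under each scheme, which is why the two fills can recover some Mokken-based indices well while diverging on others.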


2014 · Vol 51 (2) · pp. 75-88
Author(s): Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Janusz Krzanowski, Carlos Tadeu dos Santos Dias

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between the estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
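Of the three quality measures listed above, the normalised root mean squared error over the deleted cells is the simplest to sketch. The normalisation used in the paper may differ; here, as an assumption of this sketch, we divide by the standard deviation of the true deleted values:

```python
import numpy as np

def nrmse(imputed, truth, mask):
    """Normalised RMSE between imputed and true values over the deleted
    cells (mask); normalised here by the std of the true deleted values."""
    err = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])

# Toy check: three cells deleted and re-imputed with unit errors.
truth = np.arange(12, dtype=float).reshape(3, 4)
mask = np.zeros_like(truth, dtype=bool)
mask[0, 2] = mask[1, 3] = mask[2, 1] = True    # deleted cells hold 2, 7, 9
imputed = truth.copy()
imputed[mask] += np.array([1.0, -1.0, 1.0])
print(round(nrmse(imputed, truth, mask), 3))
```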


ISRN Nursing · 2011 · Vol 2011 · pp. 1-4
Author(s): Wilma M. Hopman, Margaret B. Harrison, Meg Carley, Elizabeth G. VanDenKerkhof

Background. Missing data are a significant problem in health-related quality of life (HRQOL) research. We evaluated two imputation approaches: missing data estimation (MDE) and assignment of mean score (AMS). Methods. HRQOL data were collected using the Medical Outcomes Trust SF-12. Missing data were estimated using both approaches, summary statistics were produced for both, and results were compared using intraclass correlations (ICC). Results. Missing data were imputed for 21 participants. Mean values were similar, with ICC >.99 within both the Physical Component Summary and the Mental Component Summary when comparing the two methodologies. When imputed data were added into the full study sample, mean scores were identical regardless of methodology. Conclusion. Results support the use of a practical and simple imputation strategy of replacing missing values with the mean of the sample in cross-sectional studies when less than half of the required items of the SF-12 components are missing.
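The "replace missing values with the mean when fewer than half the items are missing" rule supported above can be sketched generically. This is not the official Medical Outcomes Trust SF-12 scoring algorithm, only an illustration of the half-rule idea, with the function name and item responses invented:

```python
import numpy as np

def half_rule_mean(items):
    """Replace missing item responses with the mean of the answered
    items, but only when fewer than half the items are missing;
    otherwise return None (score not computable under the half rule)."""
    v = np.asarray(items, dtype=float)
    missing = np.isnan(v)
    if missing.sum() * 2 >= v.size:             # half or more missing: give up
        return None
    out = v.copy()
    out[missing] = v[~missing].mean()
    return out

print(half_rule_mean([3, 4, np.nan, 5]))        # one of four missing: imputable
print(half_rule_mean([3, np.nan, np.nan, 5]))   # half missing: None
```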


2011 · Vol 44 (2) · pp. 516-531
Author(s): Damazo T. Kadengye, Wilfried Cools, Eva Ceulemans, Wim Van den Noortgate
