Average effect estimation with dichotomized events when the missing data mechanism is not missing at random

2012 ◽  
pp. 85
Author(s):  
Amy Kwon ◽  
Ren
2016 ◽  
Vol 27 (2) ◽  
pp. 352-363 ◽  
Author(s):  
James C Doidge

Population-based cohort studies are invaluable to health research because of the breadth of data collection over time, and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents an opportunity for participants’ responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data such as multiple imputation and maximum likelihood can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement, but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
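For readers unfamiliar with inverse probability-weighting, the basic idea can be sketched in a few lines. This is not either of the two non-monotone methods the paper describes. It is a minimal single-variable illustration under assumptions of our own: one outcome `y`, one fully observed covariate `x`, a response probability that depends on `x` only, and true response probabilities that are known because the data are simulated. In practice those probabilities would have to be estimated, for example with a logistic regression. The function name `ipw_mean` and all parameter values are hypothetical.

```python
import numpy as np

def ipw_mean(y, observed, p_observed):
    """Normalised inverse-probability-weighted mean using observed rows only."""
    weights = 1.0 / p_observed[observed]   # each respondent stands in for 1/p people
    return np.sum(weights * y[observed]) / np.sum(weights)

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                     # fully observed covariate
y = 2.0 + x + rng.normal(size=n)           # outcome; its true mean is 2.0
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))   # assumed: response depends on x only
observed = rng.random(n) < p_obs           # True where y was actually recorded

cc_mean = y[observed].mean()               # complete-case mean, biased upwards here
ipw = ipw_mean(y, observed, p_obs)         # weighted mean, close to 2.0

print(f"complete-case mean: {cc_mean:.3f}")
print(f"IPW mean:           {ipw:.3f}")
```

Respondents with a low probability of responding receive large weights, so the weighted sample resembles the full cohort again. The sketch only works because missingness depends on an observed variable; it does not by itself address data that are not missing at random.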


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


Author(s):  
Jin Hyuk Lee ◽  
J. Charles Huber Jr.

Background: Multiple Imputation (MI) is known as an effective method for handling missing data in public health research. However, it is not clear that the method will be effective when the data contain a high percentage of missing observations on a variable. Methods: Using data from the “Predictive Study of Coronary Heart Disease” study, this study examined the effectiveness of multiple imputation in data with 20% to 80% missing observations, using the absolute bias (|bias|) and Root Mean Square Error (RMSE) of MI measured under Missing Completely at Random (MCAR), Missing at Random (MAR), and Not Missing at Random (NMAR) assumptions. Results: The |bias| and RMSE of MI were much smaller than those of CCA under all missing mechanisms, especially with a high percentage of missing observations. In addition, the |bias| and RMSE of MI were consistent as the number of imputations increased from M=10 to M=50. Moreover, when comparing imputation methods, the MCMC method had universally smaller |bias| and RMSE than the Regression method and the Predictive Mean Matching method under all missing mechanisms. Conclusion: As missing percentages become higher, using MI is recommended, because MI produced less biased estimates under all missing mechanisms. However, when large proportions of data are missing, other things need to be considered for proper imputation, such as the number of imputations, imputation methods, and missing data mechanisms.
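The two performance measures named in this abstract can be made concrete with a short sketch. The abstract does not give its exact formulas, so the definitions below are the conventional ones and should be read as an assumption: |bias| is the absolute difference between the average estimate across replicates and the true value, and RMSE is the square root of the mean squared difference from the true value. The function name `abs_bias_and_rmse` and the toy numbers are hypothetical.

```python
import math

def abs_bias_and_rmse(estimates, true_value):
    """Return (|bias|, RMSE) of a list of estimates against a known true value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    abs_bias = abs(mean_estimate - true_value)
    # RMSE penalises both systematic error and replicate-to-replicate spread
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return abs_bias, rmse

# Toy example: three replicate estimates of a quantity whose true value is 1.0
abs_bias, rmse = abs_bias_and_rmse([1.0, 2.0, 3.0], true_value=1.0)
print(abs_bias, rmse)
```

RMSE is never smaller than |bias|, because it also reflects the variability of the estimates around their own mean.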


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. It does not introduce bias in missing (completely) at random scenarios, while reducing bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself. The incorporation of a missing indicator can reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be because ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.


2012 ◽  
Vol 3 (1) ◽  
Author(s):  
Christopher H. Rhoads

Policy analysts involved in quantitative research have many options for handling missing data. The method chosen will often greatly influence the substantive policy conclusions that will be drawn from the data. The most frequent methods for handling missing data assume that the data are missing at random (MAR). The current paper notes that an omnibus, nonparametric test of the MAR assumption is impossible using the observed data alone. Nonetheless, various purported tests of the missingness mechanism (including tests of MAR) appear in the literature. The current paper clarifies that all of these tests rely on some assumption that cannot be tested from the data. The paper notes that tests of the missingness mechanism are frequently misinterpreted and it clarifies the appropriate interpretation of such tests. Policy analysts are encouraged not to develop the false impression that modern procedures for handling missing data in conjunction with tests of the missingness mechanism provide protection against the ill effects of missing data. Any justification for a particular approach to handling missing data must come from substantive knowledge of the missingness process, not from the data.


2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S388-S389
Author(s):  
J Chen ◽  
S Hunter ◽  
K Kisfalvi ◽  
R A Lirio

Abstract Background: Missing data are common in IBD trials. Depending on the volume and nature of the missing data, they can reduce statistical power for detecting a treatment difference, introduce potential bias and invalidate conclusions. Non-responder imputation (NRI), where patients with missing data are considered treatment failures, is widely used to handle missing data for dichotomous efficacy endpoints in IBD trials. However, it does not consider the mechanisms leading to missing data and can potentially underestimate the treatment effect. We proposed a hybrid (HI) approach combining NRI and multiple imputation (MI) as an alternative to NRI in the analyses of two phase 3 trials of vedolizumab (VDZ) in patients with moderate-to-severe UC: VISIBLE 1 [1] and VARSITY [2]. Methods: VISIBLE 1 and VARSITY assessed efficacy using dichotomous endpoints based on the complete Mayo score. Full methodologies have been reported previously [1,2]. Our proposed HI approach is aimed at imputing missing Mayo scores, instead of imputing the missing dichotomous efficacy endpoint. To assess the impact of dropouts for different missing data mechanisms (categorised as ‘missing not at random [MNAR]’ and ‘missing at random [MAR]’), HI was implemented as a potential sensitivity analysis, where dropouts owing to safety or lack of efficacy were imputed using NRI (assuming MNAR) and other missing data were imputed using MI (assuming MAR). For MI, each component of the Mayo score was imputed via a multivariate stepwise approach using a fully conditional specification ordinal logistic method. Missing baseline scores were imputed using baseline characteristics data. Missing scores from each subsequent visit were imputed using all previous visits in a stepwise fashion. Fifty imputed datasets were generated for each component of the Mayo score. The complete Mayo score and relevant efficacy endpoints were derived subsequently. The analysis was performed within each imputed dataset to determine the treatment difference, 95% CI and p-value, which were then combined via Rubin’s rules [3]. Results: Tables 1 and 2 show a comparison of efficacy in the two studies using the primary NRI analysis vs. the alternative HI approach for handling missing data. Conclusion: HI and NRI approaches can provide consistent efficacy analyses in IBD trials. The HI approach can serve as a useful sensitivity analysis to assess the impact of dropouts under different missing data mechanisms and evaluate the robustness of efficacy conclusions.
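The pooling step mentioned in this abstract, Rubin's rules, can be sketched as follows. This is an illustration of the standard rules, not the trial's actual analysis code: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the three toy treatment differences are hypothetical, and the sketch stops at the pooled variance without deriving the degrees of freedom, confidence interval or p-value.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)                         # number of imputed datasets (m >= 2)
    pooled = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total

# Toy example: treatment differences from three imputed datasets,
# each with squared standard error 0.004
pooled, total_var = pool_rubin([0.10, 0.12, 0.14], [0.004, 0.004, 0.004])
print(pooled, total_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing data; analysing a single imputed dataset would omit it and overstate precision.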

