Average effect estimation with dichotomized events when the missing data mechanism is not missing at random

Population-based cohort studies are invaluable to health research because of the breadth of data collection over time and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents an opportunity for participants’ responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data, such as multiple imputation and maximum likelihood, can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement, but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
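
As a rough illustration of the weighting idea in this abstract, the sketch below computes an inverse-probability-weighted mean in which each observed value is weighted by the inverse of the estimated response probability in its stratum. It is not the paper's method: the function names, the toy data, and the use of a two-level "responsiveness" label as the stratifying variable are all assumptions made for illustration, and the example only covers the simple case where missingness is random within strata.

```python
from collections import defaultdict


def ipw_mean(values, strata):
    """Inverse-probability-weighted mean of an incompletely observed variable.

    values: list of floats, with None where the value is missing.
    strata: list of labels observed for everyone (here a hypothetical
            responsiveness category).
    A stratum with no observed values contributes nothing to the estimate.
    """
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for value, stratum in zip(values, strata):
        n_total[stratum] += 1
        if value is not None:
            n_observed[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for value, stratum in zip(values, strata):
        if value is None:
            continue
        # Weight = 1 / (estimated probability of responding in this stratum).
        weight = n_total[stratum] / n_observed[stratum]
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def complete_case_mean(values):
    """Unweighted mean of the observed values only."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


# Toy data (assumed): highly responsive participants are all observed,
# while only one in four of the less responsive participants is observed.
values = [1.0, 1.0, 1.0, 1.0, 5.0, None, None, None]
strata = ["high"] * 4 + ["low"] * 4

print(complete_case_mean(values))  # pulled towards the well-observed stratum
print(ipw_mean(values, strata))    # up-weights the under-observed stratum
```

In this toy case the single observed "low" participant stands in for all four members of that stratum, which is why the weighted mean moves away from the complete-case mean.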

Multiple Imputation with Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v3 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Causal Effect ◽

Missing At Random ◽

Directed Acyclic Graphs ◽

Missing Not At Random ◽

Routinely Collected Health Data ◽

Effect Estimation ◽

Minimal Bias

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular, the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.
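
The missing indicator referred to in this abstract is a 0/1 variable recording whether a value was absent. A minimal sketch of constructing one is shown below; the imputation step itself is not shown, and the function name, the `_missing` suffix, and the example records are hypothetical rather than taken from the study.

```python
def add_missing_indicator(rows, variable):
    """Return copies of the rows with a 0/1 flag for `variable` being missing.

    A value counts as missing if the key is absent or its value is None.
    The flag is stored under `<variable>_missing` (an assumed naming choice).
    """
    flag = variable + "_missing"
    flagged_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so the input records are left unchanged
        new_row[flag] = 1 if row.get(variable) is None else 0
        flagged_rows.append(new_row)
    return flagged_rows


# Hypothetical records with an incompletely recorded confounder.
records = [
    {"exposure": 1, "confounder": 2.3},
    {"exposure": 0, "confounder": None},
    {"exposure": 1},
]
flagged = add_missing_indicator(records, "confounder")
print([row["confounder_missing"] for row in flagged])
```

In the approach the abstract describes, a flag of this kind would be carried into the analysis alongside the multiply imputed values, rather than used on its own.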

Missing Data in the American College of Surgeons National Surgical Quality Improvement Program Are Not Missing at Random: Implications and Potential Impact on Quality Assessments

Journal of the American College of Surgeons ◽

10.1016/j.jamcollsurg.2009.10.021 ◽

2010 ◽

Vol 210 (2) ◽

pp. 125-139.e2 ◽

Cited By ~ 83

Author(s):

Barton H. Hamilton ◽

Clifford Y. Ko ◽

Karen Richards ◽

Bruce Lee Hall

Keyword(s):

Quality Improvement ◽

Missing Data ◽

Missing At Random ◽

Quality Improvement Program ◽

Improvement Program ◽

Surgical Quality ◽

American College Of Surgeons ◽

Potential Impact ◽

Not Missing At Random ◽

Quality Assessments

Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much?

Iranian Journal of Public Health ◽

10.18502/ijph.v50i7.6626 ◽

2021 ◽

Author(s):

Jin Hyuk Lee ◽

J. Charles Huber Jr.

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Missing At Random ◽

Public Health Research ◽

Missing Observations ◽

Predictive Mean Matching ◽

Absolute Bias ◽

Using Data ◽

Disease Study ◽

Not Missing At Random

Background: Multiple Imputation (MI) is known as an effective method for handling missing data in public health research. However, it is not clear that the method will be effective when the data contain a high percentage of missing observations on a variable. Methods: Using data from the “Predictive Study of Coronary Heart Disease”, this study examined the effectiveness of multiple imputation in data with 20% to 80% missing observations, using the absolute bias (|bias|) and Root Mean Square Error (RMSE) of MI measured under Missing Completely at Random (MCAR), Missing at Random (MAR), and Not Missing at Random (NMAR) assumptions. Results: The |bias| and RMSE of MI were much smaller than those of complete case analysis (CCA) under all missing mechanisms, especially with a high percentage of missing data. In addition, the |bias| and RMSE of MI were consistent regardless of increasing the number of imputations from M=10 to M=50. Moreover, when comparing imputation methods, the MCMC method had universally smaller |bias| and RMSE than the Regression method and the Predictive Mean Matching method under all missing mechanisms. Conclusion: As missing percentages become higher, using MI is recommended, because MI produced less biased estimates under all missing mechanisms. However, when large proportions of data are missing, other things need to be considered for proper imputation, such as the number of imputations, the imputation method, and the missing data mechanism.
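
The abstract does not give its formulas for |bias| and RMSE, so the sketch below uses the definitions that are conventional in simulation studies: the absolute difference between the mean estimate and the true value, and the square root of the mean squared error. The function names and the numbers are illustrative assumptions.

```python
import math


def absolute_bias(estimates, truth):
    """Absolute difference between the mean of the estimates and the truth."""
    return abs(sum(estimates) / len(estimates) - truth)


def rmse(estimates, truth):
    """Root mean square error of the estimates around the truth."""
    squared_errors = [(estimate - truth) ** 2 for estimate in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up estimates from repeated analyses of a parameter whose true value is 2.0.
estimates = [1.0, 2.0, 3.0]
print(absolute_bias(estimates, 2.0))
print(rmse(estimates, 2.0))
```

The two measures answer different questions: estimates that are scattered symmetrically around the truth can have no bias and still have a large RMSE.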

Multiple Imputation with Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v2 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Causal Effect ◽

Missing At Random ◽

Directed Acyclic Graphs ◽

Missing Not At Random ◽

Routinely Collected Health Data ◽

Effect Estimation ◽

Minimal Bias

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. It does not introduce bias in missing (completely) at random scenarios, while reducing bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself. The incorporation of a missing indicator can reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.

PMD4 WHEN CAN MISSING DATA BE CONSIDERED MISSING AT RANDOM (MAR) IN SUBSTANCE ABUSE TREATMENT OUTCOMES RESEARCH?

Value in Health ◽

10.1016/s1098-3015(10)61402-7 ◽

2002 ◽

Vol 5 (6) ◽

pp. 530

Author(s):

JR Ciesla ◽

SF Spear

Keyword(s):

Substance Abuse ◽

Missing Data ◽

Substance Abuse Treatment ◽

Treatment Outcomes ◽

Outcomes Research ◽

Missing At Random ◽

Abuse Treatment

Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.54 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Tra My Pham ◽

Irene Petersen ◽

James Carpenter ◽

Tim Morris

Keyword(s):

Primary Care ◽

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Case Analysis ◽

Missing At Random ◽

Complete Case ◽

Missing Not At Random ◽

Health Records ◽

Ethnicity Data

Abstract Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
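
The abstract does not state how the weights in weighted MI are calculated, so the sketch below does not attempt to reproduce them. It only illustrates the accounting that makes an external marginal useful: if the overall category distribution is known (for example from a census) and the distribution among observed records is known, the law of total probability fixes what the distribution among the missing records must be. The function name and all numbers are assumptions.

```python
def implied_missing_distribution(observed_dist, target_dist, observed_fraction):
    """Category distribution among missing records implied by a known marginal.

    Uses: target = observed_fraction * observed + (1 - observed_fraction) * missing.
    A negative result for any category signals that the observed data and the
    external marginal are incompatible.
    """
    if not 0.0 < observed_fraction < 1.0:
        raise ValueError("observed_fraction must be strictly between 0 and 1")
    implied = {}
    for category, target in target_dist.items():
        observed = observed_dist.get(category, 0.0)
        implied[category] = (target - observed_fraction * observed) / (
            1.0 - observed_fraction
        )
    return implied


# Made-up numbers: category "B" is under-represented among recorded values.
census = {"A": 0.8, "B": 0.2}
recorded = {"A": 0.9, "B": 0.1}
implied = implied_missing_distribution(recorded, census, observed_fraction=0.5)
print(implied)
```

Here the missing records would need a higher share of category "B" than the recorded ones for the overall breakdown to match the external marginal, which is the kind of discrepancy an imputation model assuming missing at random would not reproduce.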

Problems with Tests of the Missingness Mechanism in Quantitative Policy Studies

Statistics Politics and Policy ◽

10.1515/2151-7509.1012 ◽

2012 ◽

Vol 3 (1) ◽

Cited By ~ 4

Author(s):

Christopher H. Rhoads

Keyword(s):

Missing Data ◽

Quantitative Research ◽

Missing At Random ◽

Nonparametric Test ◽

Current Paper ◽

Policy Studies ◽

Missingness Mechanism ◽

Policy Analysts

Policy analysts involved in quantitative research have many options for handling missing data. The method chosen will often greatly influence the substantive policy conclusions that will be drawn from the data. The most frequent methods for handling missing data assume that the data are missing at random (MAR). The current paper notes that an omnibus, nonparametric test of the MAR assumption is impossible using the observed data alone. Nonetheless, various purported tests of the missingness mechanism (including tests of MAR) appear in the literature. The current paper clarifies that all of these tests rely on some assumption that cannot be tested from the data. The paper notes that tests of the missingness mechanism are frequently misinterpreted and it clarifies the appropriate interpretation of such tests. Policy analysts are encouraged not to develop the false impression that modern procedures for handling missing data, in conjunction with tests of the missingness mechanism, provide protection against the ill effects of missing data. Any justification for a particular approach to handling missing data must come from substantive knowledge of the missingness process, not from the data.

Small area estimation under informative sampling and not missing at random non‐response

Journal of the Royal Statistical Society Series A (Statistics in Society) ◽

10.1111/rssa.12362 ◽

2018 ◽

Vol 181 (4) ◽

pp. 981-1008 ◽

Cited By ~ 2

Author(s):

Michael Sverchkov ◽

Danny Pfeffermann

Keyword(s):

Small Area ◽

Small Area Estimation ◽

Missing At Random ◽

Area Estimation ◽

Informative Sampling ◽

Not Missing At Random

P427 A hybrid approach of handling missing data in inflammatory bowel disease (IBD) trials: results from VISIBLE 1 and VARSITY

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjz203.556 ◽

2020 ◽

Vol 14 (Supplement_1) ◽

pp. S388-S389

Author(s):

J Chen ◽

S Hunter ◽

K Kisfalvi ◽

R A Lirio

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Statistical Power ◽

Hybrid Approach ◽

Missing At Random ◽

P Value ◽

Two Phase ◽

Treatment Difference ◽

Mayo Score ◽

The Impact

Abstract Background: Missing data are common in IBD trials. Depending on the volume and nature of the missing data, they can reduce statistical power for detecting a treatment difference, introduce potential bias and invalidate conclusions. Non-responder imputation (NRI), where patients with missing data are considered treatment failures, is widely used to handle missing data for dichotomous efficacy endpoints in IBD trials. However, it does not consider the mechanisms leading to missing data and can potentially underestimate the treatment effect. We proposed a hybrid (HI) approach combining NRI and multiple imputation (MI) as an alternative to NRI in the analyses of two phase 3 trials of vedolizumab (VDZ) in patients with moderate-to-severe ulcerative colitis (UC): VISIBLE 1 [1] and VARSITY [2]. Methods: VISIBLE 1 and VARSITY assessed efficacy using dichotomous endpoints based on the complete Mayo score. Full methodologies have been reported previously [1,2]. Our proposed HI approach is aimed at imputing missing Mayo scores, instead of imputing the missing dichotomous efficacy endpoint. To assess the impact of dropouts for different missing data mechanisms (categorised as ‘missing not at random [MNAR]’ and ‘missing at random [MAR]’), HI was implemented as a potential sensitivity analysis, where dropouts owing to safety or lack of efficacy were imputed using NRI (assuming MNAR) and other missing data were imputed using MI (assuming MAR). For MI, each component of the Mayo score was imputed via a multivariate stepwise approach using a fully conditional specification ordinal logistic method. Missing baseline scores were imputed using baseline characteristics data. Missing scores from each subsequent visit were imputed using all previous visits in a stepwise fashion. Fifty imputation datasets were computed for each component of the Mayo score. The complete Mayo score and relevant efficacy endpoints were derived subsequently. The analysis was performed within each imputed dataset to determine the treatment difference, 95% CI and p-value, which were then combined via Rubin’s rules [3]. Results: Tables 1 and 2 show a comparison of efficacy in the two studies using the primary NRI analysis vs. the alternative HI approach for handling missing data. Conclusion: HI and NRI approaches can provide consistent efficacy analyses in IBD trials. The HI approach can serve as a useful sensitivity analysis to assess the impact of dropouts under different missing data mechanisms and evaluate the robustness of efficacy conclusions.
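
The combining step mentioned in this abstract, Rubin's rules, can be sketched as follows: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor of (1 + 1/m). The function name and the input numbers are illustrative assumptions; the trial's actual estimates are not reproduced here, and the degrees-of-freedom calculation needed for confidence intervals and p-values is omitted.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: point estimates, one per imputed dataset (at least two).
    variances: squared standard errors of those estimates, in the same order.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up treatment differences from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what lets the pooled variance reflect uncertainty about the imputed values, which a single imputed dataset would hide.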
