Doubly robust matching estimators for high dimensional confounding adjustment

Biometrics ◽  
2018 ◽  
Vol 74 (4) ◽  
pp. 1171-1179 ◽  
Author(s):  
Joseph Antonelli ◽  
Matthew Cefalu ◽  
Nathan Palmer ◽  
Denis Agniel
Author(s):  
Xiaochun Li ◽  
Changyu Shen

Propensity score–based methods or multiple regressions of the outcome are often used for confounding adjustment in the analysis of observational studies. Either approach requires a model: for propensity score–based methods, a model relating treatment assignment to the covariates; for multiple regression, a model relating the outcome to the covariates. Both models are usually unknown to the investigators and must be estimated, so correct model specification is essential for the validity of the final causal estimate. This article describes a doubly robust estimator that combines both models, giving analysts 2 chances of obtaining a valid causal estimate because the estimate remains valid if either model is correctly specified, and demonstrates its use with a data set from the Lindner Center Study.
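As an illustration of the "2 chances" property, here is a minimal sketch of one common doubly robust estimator, the augmented inverse probability weighting (AIPW) form. It is not taken from the article; the function name `aipw_ate`, the toy data, and the deliberately wrong nuisance values are all assumptions made for this example.

```python
def aipw_ate(y, a, e, m1, m0):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    y: outcomes; a: 0/1 treatment indicators; e: fitted propensity scores
    P(A=1 | X); m1, m0: fitted outcome means under treatment and control.
    """
    scores = [
        mu1 - mu0 + ai * (yi - mu1) / ei - (1 - ai) * (yi - mu0) / (1 - ei)
        for yi, ai, ei, mu1, mu0 in zip(y, a, e, m1, m0)
    ]
    return sum(scores) / len(scores)

# Hypothetical toy data: binary confounder x, outcome y = a + 2x,
# so the true treatment effect is 1 while the naive contrast is 2.
x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 0, 0, 0, 1, 1, 1, 0]
y = [ai + 2 * xi for ai, xi in zip(a, x)]

e_right = [0.75 if xi else 0.25 for xi in x]   # treated share within each x
e_wrong = [0.5] * len(x)                       # ignores the confounder
m1_right = [1 + 2 * xi for xi in x]            # true mean of y when treated
m0_right = [2 * xi for xi in x]                # true mean of y when untreated
m_wrong = [0.0] * len(x)                       # ignores everything

outcome_only = aipw_ate(y, a, e_wrong, m1_right, m0_right)
propensity_only = aipw_ate(y, a, e_right, m_wrong, m_wrong)
both_wrong = aipw_ate(y, a, e_wrong, m_wrong, m_wrong)
print(outcome_only, propensity_only, both_wrong)  # 1.0 1.0 2.0
```

With either nuisance model correct the sketch recovers the true effect of 1; with both wrong it returns the confounded value of 2.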


Epidemiology ◽  
2017 ◽  
Vol 28 (2) ◽  
pp. 237-248 ◽  
Author(s):  
Sebastian Schneeweiss ◽  
Wesley Eddings ◽  
Robert J. Glynn ◽  
Elisabetta Patorno ◽  
Jeremy Rassen ◽  
...  

2019 ◽  
Vol 14 (3) ◽  
pp. 805-828 ◽  
Author(s):  
Joseph Antonelli ◽  
Giovanni Parmigiani ◽  
Francesca Dominici

Biometrics ◽  
2020 ◽  
Vol 76 (4) ◽  
pp. 1190-1200 ◽  
Author(s):  
Oliver Dukes ◽  
Vahe Avagyan ◽  
Stijn Vansteelandt

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Jonathan Huang ◽  
Xiang Meng

Abstract

Background: Flexible, data-adaptive algorithms (machine learning; ML) for nuisance parameter estimation in epidemiologic causal inference have promising asymptotic properties for complex, high-dimensional data. However, recently proposed applications (e.g. targeted maximum likelihood estimation; TMLE) may produce biased parameter and standard error estimates in common real-world cohort settings. The relative performance of these novel estimators over simpler approaches in such settings is unclear.

Methods: We apply double-crossfit TMLE, augmented inverse probability weighting (AIPW), and standard IPW to simple simulations (5 covariates) and to “real-world” data using covariate-structure-preserving (“plasmode”) simulations of 1,178 subjects and 331 covariates from a longitudinal birth cohort. We evaluate various data-generating and estimation scenarios, including under- and over-identification (e.g. excess orthogonal covariates), poor data support, near-instruments, and mis-specified biological interactions. We also track representative computation times.

Results: We replicate the optimal performance of cross-fit, doubly robust estimators in simple data-generating processes. However, in nearly every real-world-based scenario, estimators fit with parametric learners outperform those that include non-parametric learners in terms of mean bias and confidence interval coverage. Even when correctly specified, estimators fit with non-parametric algorithms (xgboost, random forest) performed poorly (e.g. 24% bias and 57% coverage vs. 10% bias and 79% coverage for the parametric fit), at times underperforming simple IPW.

Conclusions: In typical epidemiologic data sets, double-crossfit estimators fit with simple, smooth, parametric learners may be the optimal solution, taking 2-5 times less computation time than flexible non-parametric models while performing equally well or better. No single approach is always optimal, and estimators should be compared on simulations close to the source data.

Key messages: In epidemiologic studies, use of flexible non-parametric algorithms for effect estimation should be strongly justified (i.e. high-dimensional covariates) and performed with care. Parametric learners may be a safer option with few drawbacks.
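To make the idea of cross-fitting concrete, the following is a rough sketch of plain K-fold cross-fit AIPW, in which nuisance models are fit on one part of the sample and evaluated on the rest. It is not the double-crossfit procedure evaluated in the abstract, and it does not use the authors' data or learners: the simulated data, the stratum-mean "learners", and every function name here are assumptions made for illustration.

```python
import random
from statistics import mean

def fit_stratum_means(rows):
    """Fit toy nuisance models on rows of (x, a, y) with binary x and a.

    Returns e[x] = share treated given X=x and m[(a, x)] = mean of Y
    given A=a and X=x. These stand in for real parametric learners.
    """
    e, m = {}, {}
    for x in (0, 1):
        stratum = [r for r in rows if r[0] == x]
        e[x] = mean(r[1] for r in stratum)
        for a in (0, 1):
            m[(a, x)] = mean(r[2] for r in stratum if r[1] == a)
    return e, m

def crossfit_aipw(rows, n_folds=2, seed=0):
    """K-fold cross-fit AIPW estimate of the average treatment effect."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    scores = []
    for k, held_out in enumerate(folds):
        train = [rows[i] for j, fold in enumerate(folds) if j != k for i in fold]
        e, m = fit_stratum_means(train)  # nuisances never see the held-out fold
        for i in held_out:
            x, a, y = rows[i]
            scores.append(
                m[(1, x)] - m[(0, x)]
                + a * (y - m[(1, x)]) / e[x]
                - (1 - a) * (y - m[(0, x)]) / (1 - e[x])
            )
    return mean(scores)

def simulate(n, seed=1):
    """Hypothetical confounded data with a true treatment effect of 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        a = 1 if rng.random() < (0.75 if x else 0.25) else 0
        y = 1.0 * a + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows

rows = simulate(2000)
ate_hat = crossfit_aipw(rows)
naive = mean(y for _, a, y in rows if a == 1) - mean(y for _, a, y in rows if a == 0)
print(f"cross-fit AIPW: {ate_hat:.2f}, naive difference: {naive:.2f}")
```

In this simulated setting the cross-fit estimate should land near the true effect of 1, while the unadjusted difference in means sits near 2 because the confounder drives both treatment and outcome.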

