Doubly robust matching estimators for high dimensional confounding adjustment

Propensity score–based methods or multiple regressions of the outcome are often used for confounding adjustment in the analysis of observational studies. Either approach requires a model: a model describing the relationship between treatment assignment and covariates in the propensity score–based method, or a model for the outcome and covariates in the multiple regressions. Both models are usually unknown to the investigators and must be estimated, so correct model specification is essential for the validity of the final causal estimate. We describe in this article a doubly robust estimator that combines both models to offer analysts 2 chances of obtaining a valid causal estimate, and we demonstrate its use with a data set from the Lindner Center Study.
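The "2 chances" property can be illustrated with the augmented inverse probability weighting (AIPW) form of a doubly robust estimator. The sketch below is only an illustration under stated assumptions, not the article's implementation: the simulated data (one binary confounder, true treatment effect of 2), the stratified nuisance models, and every function name are hypothetical choices made for this example.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Hypothetical data: binary confounder x drives both treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)  # assumed true effect = 2
        rows.append((x, t, y))
    return rows

def fit_propensity(rows, use_x):
    """e(x) = P(T=1 | X=x); ignoring x gives a deliberately misspecified model."""
    if not use_x:
        p = mean(t for _, t, _ in rows)
        return lambda x: p
    by_x = {v: mean(t for x, t, _ in rows if x == v) for v in (0, 1)}
    return lambda x: by_x[x]

def fit_outcome(rows, use_x):
    """m(t, x) = E[Y | T=t, X=x]; ignoring x gives a deliberately misspecified model."""
    if not use_x:
        by_t = {a: mean(y for _, t, y in rows if t == a) for a in (0, 1)}
        return lambda t, x: by_t[t]
    by_tx = {(a, v): mean(y for x, t, y in rows if t == a and x == v)
             for a in (0, 1) for v in (0, 1)}
    return lambda t, x: by_tx[(t, x)]

def aipw(rows, e, m):
    """AIPW estimate of the average treatment effect."""
    terms = []
    for x, t, y in rows:
        m1, m0, ex = m(1, x), m(0, x), e(x)
        terms.append(m1 - m0 + t * (y - m1) / ex - (1 - t) * (y - m0) / (1 - ex))
    return mean(terms)

rows = simulate()
estimates = {
    "both_correct": aipw(rows, fit_propensity(rows, True), fit_outcome(rows, True)),
    "propensity_only": aipw(rows, fit_propensity(rows, True), fit_outcome(rows, False)),
    "outcome_only": aipw(rows, fit_propensity(rows, False), fit_outcome(rows, True)),
    "neither": aipw(rows, fit_propensity(rows, False), fit_outcome(rows, False)),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this simulation the estimate stays close to the assumed effect of 2 whenever at least one of the two models adjusts for the confounder, and it is clearly biased upward only when both models ignore it.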

Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records

Pharmacoepidemiology and Drug Safety ◽

10.1002/pds.2152 ◽

2011 ◽

Vol 20 (8) ◽

pp. 849-857 ◽

Cited By ~ 62

Author(s):

Sengwee Toh ◽

Luis A. García Rodríguez ◽

Miguel A. Hernán

Keyword(s):

Propensity Score ◽

Electronic Medical Records ◽

Medical Records ◽

High Dimensional ◽

Confounding Adjustment

Doubly robust inference when combining probability and non-probability samples with high dimensional data

Journal of the Royal Statistical Society Series B (Statistical Methodology) ◽

10.1111/rssb.12354 ◽

2020 ◽

Vol 82 (2) ◽

pp. 445-465 ◽

Cited By ~ 2

Author(s):

Shu Yang ◽

Jae Kwang Kim ◽

Rui Song

Keyword(s):

High Dimensional Data ◽

Robust Inference ◽

High Dimensional ◽

Doubly Robust

Variable Selection for Confounding Adjustment in High-dimensional Covariate Spaces When Analyzing Healthcare Databases

Epidemiology ◽

10.1097/ede.0000000000000581 ◽

2017 ◽

Vol 28 (2) ◽

pp. 237-248 ◽

Cited By ~ 28

Author(s):

Sebastian Schneeweiss ◽

Wesley Eddings ◽

Robert J. Glynn ◽

Elisabetta Patorno ◽

Jeremy Rassen ◽

...

Keyword(s):

Variable Selection ◽

High Dimensional ◽

Selection For ◽

Confounding Adjustment

High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors

Bayesian Analysis ◽

10.1214/18-ba1131 ◽

2019 ◽

Vol 14 (3) ◽

pp. 805-828 ◽

Cited By ~ 3

Author(s):

Joseph Antonelli ◽

Giovanni Parmigiani ◽

Francesca Dominici

Keyword(s):

High Dimensional ◽

Confounding Adjustment

Faculty Opinions recommendation of Regularized Regression Versus the High-Dimensional Propensity Score for Confounding Adjustment in Secondary Database Analyses.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.725693737.793511931 ◽

2015 ◽

Author(s):

Robert Platt

Keyword(s):

Propensity Score ◽

High Dimensional ◽

Regularized Regression ◽

Secondary Database ◽

Confounding Adjustment

Doubly robust tests of exposure effects under high‐dimensional confounding

Biometrics ◽

10.1111/biom.13231 ◽

2020 ◽

Vol 76 (4) ◽

pp. 1190-1200 ◽

Cited By ~ 2

Author(s):

Oliver Dukes ◽

Vahe Avagyan ◽

Stijn Vansteelandt

Keyword(s):

High Dimensional ◽

Robust Tests ◽

Doubly Robust

Regularized Regression Versus the High-Dimensional Propensity Score for Confounding Adjustment in Secondary Database Analyses

American Journal of Epidemiology ◽

10.1093/aje/kwv108 ◽

2015 ◽

Vol 182 (7) ◽

pp. 651-659 ◽

Cited By ~ 24

Author(s):

Jessica M. Franklin ◽

Wesley Eddings ◽

Robert J. Glynn ◽

Sebastian Schneeweiss

Keyword(s):

Propensity Score ◽

High Dimensional ◽

Regularized Regression ◽

Secondary Database ◽

Confounding Adjustment

521 Performance of doubly-robust, machine learning effect estimators in realistic epidemiologic data settings and practical recommendations

International Journal of Epidemiology ◽

10.1093/ije/dyab168.293 ◽

2021 ◽

Vol 50 (Supplement_1) ◽

Author(s):

Jonathan Huang ◽

Xiang Meng

Keyword(s):

Machine Learning ◽

Real World ◽

Nuisance Parameter ◽

Parametric Models ◽

High Dimensional ◽

Real World Data ◽

Epidemiologic Data ◽

Doubly Robust ◽

Parametric Algorithms ◽

Non Parametric

Abstract

Background: Flexible, data-adaptive algorithms (machine learning; ML) for nuisance parameter estimation in epidemiologic causal inference have promising asymptotic properties for complex, high-dimensional data. However, recently proposed applications (e.g. targeted maximum likelihood estimation; TMLE) may produce biased parameter and standard error estimates in common real-world cohort settings. The relative performance of these novel estimators over simpler approaches in such settings is unclear.

Methods: We apply double-crossfit TMLE, augmented inverse probability weighting (AIPW), and standard IPW to simple simulations (5 covariates) and "real-world" data using covariate-structure-preserving ("plasmode") simulations of 1,178 subjects and 331 covariates from a longitudinal birth cohort. We evaluate various data-generating and estimation scenarios, including under- and over-identification (e.g. excess orthogonal covariates), poor data support, near-instruments, and mis-specified biological interactions. We also track representative computation times.

Results: We replicate optimal performance of cross-fit, doubly robust estimators in simple data-generating processes. However, in nearly every real-world-based scenario, estimators fit with parametric learners outperform those that include non-parametric learners in terms of mean bias and confidence interval coverage. Even when correctly specified, estimators fit with non-parametric algorithms (xgboost, random forest) performed poorly (e.g. 24% bias, 57% coverage vs. 10% bias, 79% coverage for parametric fit), at times underperforming simple IPW.

Conclusions: In typical epidemiologic data sets, double-crossfit estimators fit with simple, smooth, parametric learners may be the optimal solution, taking 2-5 times less computation time than flexible non-parametric models while having equal or better performance. No approaches are optimal, and estimators should be compared on simulations close to the source data.

Key messages: In epidemiologic studies, use of flexible non-parametric algorithms for effect estimation should be strongly justified (i.e. high-dimensional covariates) and performed with care. Parametric learners may be a safer option with few drawbacks.
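To make the comparison between IPW and cross-fit AIPW concrete, here is a minimal sketch of sample-split estimation. It is a plain two-fold cross-fit, not the double-crossfit procedure or the plasmode simulations evaluated in the abstract, and the data-generating process (one binary confounder, true effect of 2), the stratified "learners", and all names are assumptions made for illustration.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Hypothetical data: binary confounder x affects treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)  # assumed true effect = 2
        rows.append((x, t, y))
    return rows

def fit_nuisance(train):
    """Estimate e(x) and m(t, x) by stratified means, using the training fold only."""
    e = {v: mean(t for x, t, _ in train if x == v) for v in (0, 1)}
    m = {(a, v): mean(y for x, t, y in train if t == a and x == v)
         for a in (0, 1) for v in (0, 1)}
    return e, m

def crossfit_estimates(rows, seed=1):
    """Two-fold cross-fit: nuisance models come from one fold, scores from the other."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    folds = [shuffled[:half], shuffled[half:]]
    ipw_terms, aipw_terms = [], []
    for k in (0, 1):
        e, m = fit_nuisance(folds[1 - k])  # fit on the held-out fold
        for x, t, y in folds[k]:           # evaluate on the other fold
            w1 = t / e[x]
            w0 = (1 - t) / (1 - e[x])
            ipw_terms.append(w1 * y - w0 * y)
            aipw_terms.append(
                m[1, x] - m[0, x] + w1 * (y - m[1, x]) - w0 * (y - m[0, x])
            )
    return mean(ipw_terms), mean(aipw_terms)

ipw_est, aipw_est = crossfit_estimates(simulate())
print(f"cross-fit IPW:  {ipw_est:.3f}")
print(f"cross-fit AIPW: {aipw_est:.3f}")
```

Both estimators target the same effect here because the toy nuisance models are correctly specified; the sketch shows only the mechanics of fitting on one fold and scoring on the other, and says nothing about the relative performance reported in the abstract.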

Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates

Journal of the Royal Statistical Society Series B (Statistical Methodology) ◽

10.1111/j.1467-9868.2012.01040.x ◽

2012 ◽

Vol 75 (2) ◽

pp. 305-322 ◽

Cited By ~ 21

Author(s):

Yanyuan Ma ◽

Liping Zhu

Keyword(s):

High Dimensional ◽

Single Index ◽

Partially Linear ◽

Single Index Models ◽

Doubly Robust
