Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests

Bryan Keller

doi:10.3102/1076998619872001

Journal of Educational and Behavioral Statistics

2019

Vol 45 (2)

pp. 119-142

Keyword(s):

Variable Selection

Causal Effect

Practical Reasons

Simulation Studies

Permutation Testing

Widespread Availability

Selection For

Independence Testing

Effect Estimation

Widespread availability of rich educational databases facilitates the use of conditioning strategies to estimate causal effects with nonexperimental data. With dozens, hundreds, or more potential predictors, variable selection can be useful for practical reasons related to communicating results and for statistical reasons related to improving the efficiency of estimators. Background knowledge should take precedence in deciding which variables to retain. However, with many potential predictors, theory may be weak, such that functional form relationships are likely to be unknown. In this article, I propose a nonparametric method for data-driven variable selection based on permutation testing with conditional random forest variable importance. The algorithm automatically handles nonlinear relationships and interactions in its naive implementation. Through a series of Monte Carlo simulation studies and a case study with Early Childhood Longitudinal Study–K data, I find that the method performs well across a variety of scenarios where other methods fail.
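The permutation-testing idea behind a conditional independence test can be illustrated with a minimal Python sketch. This is not the article's algorithm. As a simplifying assumption, the conditional random forest importance statistic is replaced by a stand-in: the absolute Pearson correlation between a candidate covariate and the outcome. The conditioning set is also assumed to be a single discrete variable, so the covariate can be permuted within its strata. Permuting within strata keeps the covariate's relationship with the conditioning variable intact while breaking any remaining link to the outcome, so the permuted statistics approximate the null distribution of conditional independence. All function names are hypothetical.

```python
import random
from collections import defaultdict


def abs_corr(x, y):
    """Absolute Pearson correlation (hypothetical stand-in for a forest importance score)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permute_within_strata(x, z, rng):
    """Shuffle x only among units that share the same value of z."""
    groups = defaultdict(list)
    for i, stratum in enumerate(z):
        groups[stratum].append(i)
    out = list(x)
    for idx in groups.values():
        vals = [x[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out


def conditional_perm_test(x, y, z, n_perm=999, seed=0):
    """Permutation p-value for H0: x is independent of y given discrete z."""
    rng = random.Random(seed)
    observed = abs_corr(x, y)
    exceed = sum(
        abs_corr(permute_within_strata(x, z, rng), y) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)


# Simulated example: x depends on the stratum z, and y depends directly on x.
rng = random.Random(1)
z = [rng.randint(0, 1) for _ in range(200)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * xi + rng.gauss(0, 0.1) for xi in x]
stat, p = conditional_perm_test(x, y, z)
print(round(stat, 3), p)
```

A real application would replace `abs_corr` with a conditional variable importance measure from a random forest, which is what lets the article's method handle nonlinear relationships and interactions. It would also need to handle continuous, multivariate conditioning sets, and this sketch does not attempt either.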

Variable selection for skew-normal mixture of joint location and scale models

Applied Mathematics-A Journal of Chinese Universities

10.1007/s11766-021-3774-x

2021

Vol 36 (4)

pp. 475-491

Author(s):

Liu-cang Wu

Song-qin Yang

Ye Tao

Keyword(s):

Variable Selection

Model Parameters

Simulation Studies

Finite Sample

Normal Mixture

Explanatory Variables

Scale Models

Joint Location

Selection For

Skew Normal

Although there are many papers on variable selection methods based on the mean model in finite mixtures of regression models, little work has been done on how to select significant explanatory variables when modeling the variance parameter. In this paper, we propose and study a novel class of models, a skew-normal mixture of joint location and scale models, to analyze heteroscedastic skew-normal data arising from a heterogeneous population. We consider the problem of variable selection for the proposed models. In particular, we develop a modified Expectation-Maximization (EM) algorithm for estimating the model parameters, and we establish the consistency and the oracle property of the penalized estimators. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies, and an example illustrates their use.

Stochastic variable selection strategies for zero-inflated models

Statistical Modelling

10.1177/1471082x17711068

2017

Vol 18 (1)

pp. 3-23

Cited By ~ 2

Author(s):

Eva Cantoni

Marie Auda

Keyword(s):

Variable Selection

Count Data

Negative Binomial

Stochastic Search

Simulation Studies

Stochastic Variable

Selection Strategies

Selection For

A Chain

Zero Counts

When count data exhibit excess zeros, that is, more zero counts than a simpler parametric distribution can model, the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) models are often used. Variable selection for these models is even more challenging than for other regression situations because the availability of p covariates implies 4^p possible models. We adapt to zero-inflated models an approach for variable selection that avoids screening all possible models. This approach is based on a stochastic search through the space of all possible models, which generates a chain of interesting models. As an additional novelty, we propose three ways of extracting information from this rich chain, and we compare them in two simulation studies, where we also contrast our approach with regularization (penalized) techniques available in the literature. The analysis of a typical dataset that motivated our research is also presented, before we conclude with some recommendations.
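The 4^p figure follows from the structure of the model. A zero-inflated model has two linear predictors, one for the count component and one for the zero-inflation component, and each covariate can enter either one, both, or neither. The short Python enumeration below makes that count concrete by pairing every covariate subset for the count part with every subset for the zero part. The function name is hypothetical and is not taken from the article.

```python
from itertools import product


def zero_inflated_model_space(covariates):
    """List every (count-part, zero-part) pair of covariate subsets."""
    subsets = [
        frozenset(c for c, keep in zip(covariates, mask) if keep)
        for mask in product([False, True], repeat=len(covariates))
    ]
    # 2**p subsets for each part, so (2**p) * (2**p) = 4**p models in total.
    return [(count_part, zero_part) for count_part in subsets for zero_part in subsets]


for p in range(1, 5):
    names = [f"x{j}" for j in range(1, p + 1)]
    print(p, len(zero_inflated_model_space(names)))  # counts: 4, 16, 64, 256
```

At p = 20 the space already exceeds 10^12 models, which is why a stochastic search that avoids screening every model is attractive.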

Ultrahigh Dimensional Variable Selection for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study

PLoS ONE

10.1371/journal.pone.0162489

2016

Vol 11 (9)

pp. e0162489

Cited By ~ 7

Author(s):

Benjamin R. Fitzpatrick

David W. Lamb

Kerrie Mengersen

Keyword(s):

Variable Selection

Spatial Data

Soil Mapping

Digital Soil Mapping

Selection For

Dimensional Variable

Variable selection for hedonic model using machine learning approaches: A case study in Onondaga County, NY

Landscape and Urban Planning

10.1016/j.landurbplan.2012.06.009

2012

Vol 107 (3)

pp. 293-306

Cited By ~ 36

Author(s):

Sanglim Yoo

Jungho Im

John E. Wagner

Keyword(s):

Machine Learning

Variable Selection

Hedonic Model

Learning Approaches

Selection For

Onondaga County

simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data

Journal of Statistical Software

10.18637/jss.v081.i02

2017

Vol 81 (2)

Cited By ~ 5

Author(s):

Oleg Sofrygin

Mark J. van der Laan

Romain Neugebauer

Keyword(s):

Longitudinal Data

Causal Effect

R Package

Simulation Studies

Effect Estimation

Variable Selection for Referenceless Multivariate Calibration: A Case Study on Nicotine Determination in Flue-Cured Tobacco Powder by Near-Infrared (NIR) Spectroscopy

Analytical Letters

10.1080/00032719.2021.1974028

2021

pp. 1-16

Author(s):

Yiming Bi

Xianwei Hao

Lu Dai

Yuhan Peng

Jinxin Tie

...

Keyword(s):

Variable Selection

Near Infrared

Multivariate Calibration

NIR Spectroscopy

Selection For

Strong Consistency of Variable Selection for Stationary Linear Stochastic Systems

2019 Chinese Control Conference (CCC)

10.23919/chicc.2019.8866312

2019

Author(s):

Wenxiao Zhao

G. George Yin

Er-Wei Bai

Keyword(s):

Variable Selection

Strong Consistency

Stochastic Systems

Linear Stochastic Systems

Selection For

A Case Study of Supplier Selection for a Steelmaking Company in Libya by Using the Combinative Distance-Based Assessment (CODAS) Model

Decision Making Applications in Management and Engineering

10.31181/dmame180101b

2018

Vol 1 (1)

pp. 1-12

Cited By ~ 15

Author(s):

Ibrahim Ahmed Badi

Ali M Abdulshahed

Ali G. Shetwan

...

Keyword(s):

Supplier Selection

Selection For

Variable selection for semiparametric random-effects conditional density models with longitudinal data

Communications in Statistics - Theory and Methods

10.1080/03610926.2018.1554130

2018

Vol 49 (4)

pp. 977-996

Author(s):

Xiaohui Yuan

Yue Wang

Tianqing Liu

Keyword(s):

Variable Selection

Longitudinal Data

Random Effects

Conditional Density

Selection For

A Case Study on Experiment Site Selection for PV Energy Generation Forecast

2020 International Computer Symposium (ICS)

10.1109/ics51289.2020.00098

2020

Author(s):

Huang Yu Hsiang

Tseng Sheng Yuan

Ping Wang

Lin Wen Hui

Lin Hsiao Chung

Keyword(s):

Site Selection

Energy Generation

Selection For