A Survey on Causal Inference

2021 ◽  
Vol 15 (5) ◽  
pp. 1-46
Author(s):  
Liuyi Yao ◽  
Zhixuan Chu ◽  
Sheng Li ◽  
Yaliang Li ◽  
Jing Gao ◽  
...  

Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low cost compared with randomized controlled trials. Propelled by the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the best-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both traditional statistical methods and recent machine-learning-enhanced methods are discussed and compared. Potential applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, commonly used benchmark datasets and open-source codes are summarized to help researchers and practitioners explore, evaluate, and apply the causal inference methods.

2018 ◽  
Vol 14 (2) ◽  
pp. 37-56
Author(s):  
David Michael Vock ◽  
Laura Frances Boehm Vock

Abstract Offensive performance in baseball depends on a number of correlated factors: the pitches the batter faces, the batter’s choice to swing, and the batter’s hitting ability. Recently, a renewed focus on the effect of plate discipline on batter performance has emerged. Plate discipline has traditionally been summarized as the proportion of pitches inside and outside of the strike zone a player swings at; however, few metrics have been proposed to assess the effect of plate discipline directly on batters’ outcomes. In this paper, we focus on estimating a batter’s performance if he were able to adopt a different plate discipline. Because we wish to assess the effect of a counterfactual plate discipline, we use a potential outcome framework and show how the G-computation algorithm can be used to isolate the effect of plate discipline separately from a batter’s hitting ability or the types of pitches the batter faces. As an example, we implement our approach using data collected with the PITCHf/x system over the 2012–2014 seasons to identify the improvement Starlin Castro would expect to see in offensive performance were he able to adopt Andrew McCutchen’s plate discipline. We estimate that had Castro adopted McCutchen’s discipline, his batting average, on-base percentage, and slugging percentage would have increased by 0.017 (se = 0.004), 0.040 (se = 0.006), and 0.028 (se = 0.008), respectively.
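The G-computation step this abstract describes can be sketched in a toy form. Everything below is invented for illustration (a single binary pitch-location variable and made-up swing probabilities), not the authors' PITCHf/x implementation: fit an outcome model, then average it over the pitch distribution under a counterfactual swing policy.

```python
# Minimal G-computation sketch with hypothetical data and variable names.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Z: 1 if the pitch is in the strike zone, 0 otherwise (pitches faced)
z = rng.binomial(1, 0.45, n)
# A: swing decision under the batter's own discipline
a = rng.binomial(1, np.where(z == 1, 0.65, 0.35))
# Y: positive outcome, better when swinging at strikes than at balls
p_y = 0.05 + 0.25 * a * z - 0.04 * a * (1 - z)
y = rng.binomial(1, p_y)

# Step 1: model E[Y | A, Z] (here by simple stratified means)
mu = {(ai, zi): y[(a == ai) & (z == zi)].mean()
      for ai in (0, 1) for zi in (0, 1)}

# Step 2: average the fitted outcome over the pitch distribution under a
# counterfactual discipline, e.g. swing at 75% of strikes and 20% of balls
def g_formula(p_swing_strike, p_swing_ball):
    p_a = np.where(z == 1, p_swing_strike, p_swing_ball)
    mu1 = np.where(z == 1, mu[(1, 1)], mu[(1, 0)])
    mu0 = np.where(z == 1, mu[(0, 1)], mu[(0, 0)])
    return np.mean(p_a * mu1 + (1 - p_a) * mu0)

print(g_formula(0.75, 0.20))  # expected performance under the new discipline
```

The real analysis conditions on far richer pitch and count information, but the two steps (outcome model, then averaging under a counterfactual policy) are the same.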


2019 ◽  
Vol 188 (9) ◽  
pp. 1682-1685 ◽  
Author(s):  
Hailey R Banack

Abstract Authors aiming to estimate causal effects from observational data frequently discuss 3 fundamental identifiability assumptions for causal inference: exchangeability, consistency, and positivity. However, too often, studies fail to acknowledge the importance of measurement bias in causal inference. In the presence of measurement bias, the aforementioned identifiability conditions are not sufficient to estimate a causal effect. The most fundamental requirement for estimating a causal effect is knowing who is truly exposed and unexposed. In this issue of the Journal, Caniglia et al. (Am J Epidemiol. 2019;000(00):000–000) present a thorough discussion of methodological challenges when estimating causal effects in the context of research on distance to obstetrical care. Their article highlights empirical strategies for examining nonexchangeability due to unmeasured confounding and selection bias and potential violations of the consistency assumption. In addition to the important considerations outlined by Caniglia et al., authors interested in estimating causal effects from observational data should also consider implementing quantitative strategies to examine the impact of misclassification. The objective of this commentary is to emphasize that you can’t drive a car with only three wheels, and you also cannot estimate a causal effect in the presence of exposure misclassification bias.


2019 ◽  
Vol 24 (3) ◽  
pp. 109-112 ◽  
Author(s):  
Steven D Stovitz ◽  
Ian Shrier

Evidence-based medicine (EBM) calls on clinicians to incorporate the ‘best available evidence’ into clinical decision-making. For decisions regarding treatment, the best evidence is that which determines the causal effect of treatments on the clinical outcomes of interest. Unfortunately, research often provides evidence where associations are not due to cause-and-effect, but rather due to non-causal reasons. These non-causal associations may provide valid evidence for diagnosis or prognosis, but biased evidence for treatment effects. Causal inference aims to determine when we can infer that associations are or are not due to causal effects. Since recommending treatments that do not have beneficial causal effects will not improve health, causal inference can advance the practice of EBM. The purpose of this article is to familiarise clinicians with some of the concepts and terminology that are being used in the field of causal inference, including graphical diagrams known as ‘causal directed acyclic graphs’. In order to demonstrate some of the links between causal inference methods and clinical treatment decision-making, we use a clinical vignette of assessing treatments to lower cardiovascular risk. As the field of causal inference advances, clinicians familiar with the methods and terminology will be able to improve their adherence to the principles of EBM by distinguishing causal effects of treatment from results due to non-causal associations that may be a source of bias.


Author(s):  
Kevin Esterling

This chapter describes the methodological considerations necessary for making a causal inference regarding the effect of institutions and group contexts on deliberation. It focuses on the elements of a study's research design and the assumptions necessary to state a causal inference given a particular design; these considerations apply to randomized experimental designs, both in the lab and in the field, as well as to quasi-experimental or natural experimental designs using observational data. The chapter shows how to assess a study's internal validity for identifying a causal effect and briefly discusses external and epistemic validity considerations that are of particular urgency for empirical deliberation.


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Lijing Lin ◽  
Matthew Sperrin ◽  
David A. Jenkins ◽  
Glen P. Martin ◽  
Niels Peek

Abstract Background The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. For many applications, this is perfectly acceptable. However, when prediction models are used to support decision making, there is often a need for predicting outcomes under hypothetical interventions. Aims We aimed to identify published methods for developing and validating prediction models that enable risk estimation of outcomes under hypothetical interventions, utilizing causal inference. We aimed to identify the main methodological approaches, their underlying assumptions, targeted estimands, and the potential pitfalls and challenges of using each method. Finally, we aimed to highlight unresolved methodological challenges. Methods We systematically reviewed literature published by December 2019, considering papers in the health domain that used causal considerations to enable prediction models to be used for predictions under hypothetical interventions. We included both methodologies proposed in the statistical/machine learning literature and methodologies used in applied studies. Results We identified 4919 papers through database searches and a further 115 papers through manual searches. Of these, 87 papers were retained for full-text screening, of which 13 were selected for inclusion. We found papers from both the statistical and the machine learning literature. Most of the identified methods for causal inference from observational data were based on marginal structural models and g-estimation. Conclusions There exist two broad methodological approaches for incorporating prediction under hypothetical interventions into clinical prediction models: (1) enriching prediction models derived from observational studies with estimated causal effects from clinical trials and meta-analyses and (2) estimating prediction models and causal effects directly from observational data.
These methods require extension to dynamic treatment regimes and consideration of multiple interventions in order to operationalise a clinical decision support system. Techniques for validating ‘causal prediction models’ are still in their infancy.
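The basic mechanics of predicting under a hypothetical intervention can be illustrated in their simplest form. The sketch below uses synthetic data and invented variable names, and sidesteps confounding by simulating a randomised treatment; it shows only the step of predicting a patient's risk with treatment set to a chosen value rather than the observed one.

```python
# Counterfactual prediction sketch: fit E[Y | X, A], then predict with the
# treatment A fixed by the hypothetical intervention. Synthetic data only.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)                               # baseline covariate
trt = rng.binomial(1, 0.5, n)                        # randomised here, so the
y = 1.0 + 0.8 * x - 0.5 * trt + rng.normal(size=n)   # regression is causal

# Fit E[Y | X, A] by ordinary least squares
X = np.column_stack([np.ones(n), x, trt])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_under(a_fixed, x_new):
    """Predicted outcome for a patient with covariate x_new if treatment
    were set to a_fixed (the hypothetical intervention)."""
    return beta[0] + beta[1] * x_new + beta[2] * a_fixed

risk_untreated = predict_under(0, 0.0)
risk_treated = predict_under(1, 0.0)
print(round(risk_untreated - risk_treated, 2))   # benefit of treatment
```

With observational rather than randomised data, the same prediction step must be combined with the causal machinery the review discusses (marginal structural models, g-estimation) before the fixed-treatment predictions can be trusted.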


Author(s):  
Jasjeet Sekhon

This article presents a detailed discussion of the Neyman-Rubin model of causal inference. It describes the conditions under which ‘matching’ approaches can lead to valid inferences, and the kinds of compromises that sometimes have to be made with respect to generalizability to ensure valid causal inferences. Moreover, the article summarizes Mill's first three canons and shows the importance of taking chance into account and comparing conditional probabilities when chance variations cannot be ignored. The significance of searching for causal mechanisms is often overestimated by political scientists, and this sometimes leads to an underestimate of the importance of comparing conditional probabilities. The search for causal mechanisms is probably especially useful when working with observational data. Machine learning algorithms can also be applied to the matching problem.


2017 ◽  
Author(s):  
Gibran Hemani ◽  
Jack Bowden ◽  
Philip Haycock ◽  
Jie Zheng ◽  
Oliver Davis ◽  
...  

Abstract A major application for genome-wide association studies (GWAS) has been the emerging field of causal inference using Mendelian randomization (MR), where the causal effect between a pair of traits can be estimated using only summary level data. MR depends on SNPs exhibiting vertical pleiotropy, where the SNP influences an outcome phenotype only through an exposure phenotype. Issues arise when this assumption is violated due to SNPs exhibiting horizontal pleiotropy. We demonstrate that across a range of pleiotropy models, instrument selection will be increasingly liable to selecting invalid instruments as GWAS sample sizes continue to grow. Methods have been developed in an attempt to protect MR from different patterns of horizontal pleiotropy, and here we have designed a mixture-of-experts machine learning framework (MR-MoE 1.0) that predicts the most appropriate model to use for any specific causal analysis, improving on both power and false discovery rates. Using the approach, we systematically estimated the causal effects amongst 2407 phenotypes. Almost 90% of causal estimates indicated some level of horizontal pleiotropy. The causal estimates are organised into a publicly available graph database (http://eve.mrbase.org), and we use it here to highlight the numerous challenges that remain in automated causal inference.
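For orientation, the core summary-data MR estimator that frameworks such as MR-MoE choose among can be written in a few lines. The numbers below are made up; the estimator shown is the standard inverse-variance-weighted (IVW) combination of per-SNP Wald ratios, not MR-MoE itself.

```python
# IVW Mendelian randomization on invented summary statistics.
import numpy as np

# Per-SNP summary data: effect on exposure (bxg), effect on outcome (byg),
# and the standard error of the outcome association (se_byg)
bxg = np.array([0.10, 0.08, 0.12, 0.09])
byg = np.array([0.050, 0.038, 0.062, 0.047])
se_byg = np.array([0.010, 0.012, 0.011, 0.010])

# Wald ratio per SNP, then inverse-variance-weighted meta-analysis
ratio = byg / bxg
w = (bxg / se_byg) ** 2          # weights from first-order SE of the ratio
ivw = np.sum(w * ratio) / np.sum(w)
print(ivw)   # pooled estimate of the causal effect of exposure on outcome
```

The IVW estimate is valid only when every SNP exhibits purely vertical pleiotropy; the abstract's point is that, as sample sizes grow, that assumption fails often enough that an automated choice among pleiotropy-robust alternatives becomes necessary.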


Author(s):  
Tony Blakely ◽  
John Lynch ◽  
Koen Simons ◽  
Rebecca Bentley ◽  
Sherri Rose

Abstract Causal inference requires theory and prior knowledge to structure analyses, and is not usually thought of as an arena for the application of prediction modelling. However, contemporary causal inference methods, premised on counterfactual or potential outcomes approaches, often include processing steps before the final estimation step. The purposes of this paper are: (i) to overview the recent emergence of prediction steps underpinning contemporary causal inference methods, and (ii) to explore the role of machine learning (as one approach to ‘best prediction’) in causal inference. Causal inference methods covered include propensity scores, inverse probability of treatment weights (IPTWs), G-computation, and targeted maximum likelihood estimation (TMLE). Machine learning has been used more for propensity scores and TMLE, and there is potential for increased use in G-computation and estimation of IPTWs.
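The prediction-then-estimation pattern this abstract describes can be sketched for IPTW. The data are synthetic, and the hand-rolled logistic fit merely stands in for whatever 'best prediction' learner one might plug in:

```python
# IPTW sketch: predict each unit's propensity score, then weight by the
# inverse probability of the treatment actually received to estimate the ATE.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)                                # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-x)))             # treatment depends on x
y = 2.0 * a + 1.5 * x + rng.normal(size=n)            # true effect of a is 2.0

# Prediction step: fit e(x) = P(A=1 | X) by gradient ascent on the logistic
# log-likelihood (any well-calibrated learner could be substituted here)
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ b))
    b += 2.0 * X.T @ (a - p) / n
ps = 1 / (1 + np.exp(-X @ b))

# Estimation step: inverse-probability-of-treatment weights for the ATE
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))
ate = (np.average(y[a == 1], weights=w[a == 1])
       - np.average(y[a == 0], weights=w[a == 0]))
print(round(ate, 2))   # recovers a value close to the true effect, 2.0
```

The weighting creates a pseudo-population in which treatment is independent of the confounder, which is why the weighted difference in means recovers the causal effect; better propensity prediction directly improves that balance.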


2021 ◽  
pp. 183-192
Author(s):  
Katherine J. Hoggatt ◽  
Tyler J. VanderWeele ◽  
Sander Greenland

This chapter provides an introduction to causal inference theory for public health research. Causal inference can be viewed as a prediction problem, addressing the question of what the likely outcome will be under one action vs. an alternative action. To answer this question usefully requires clarity and precision in both the statement of the causal hypothesis and the techniques used to attempt an answer. This chapter reviews considerations that have been invoked in discussions of causality based on epidemiologic evidence. It then describes the potential-outcome (counterfactual) framework for cause and effect, which shows how measures of effect and association can be distinguished. The potential-outcome framework illustrates problems inherent in attempts to quantify the changes in health expected under different actions or interventions. The chapter concludes with a discussion of how research findings may be translated into policy.
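The distinction between association and effect that the chapter draws can be shown with a toy simulation (all numbers invented): a confounder makes the crude exposure-outcome association diverge from the causal risk difference recovered by standardisation over the confounder.

```python
# Association vs. effect under confounding, in the potential-outcome spirit.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
c = rng.binomial(1, 0.5, n)                         # confounder
a = rng.binomial(1, np.where(c == 1, 0.8, 0.2))     # exposure depends on C
p_y = 0.10 + 0.05 * a + 0.30 * c                    # true risk difference 0.05
y = rng.binomial(1, p_y)

# Crude association: difference in observed risks
crude = y[a == 1].mean() - y[a == 0].mean()

# Standardisation: average the C-specific risk differences over P(C)
adj = sum((y[(a == 1) & (c == v)].mean() - y[(a == 0) & (c == v)].mean())
          * (c == v).mean() for v in (0, 1))
print(round(crude, 3), round(adj, 3))  # crude is inflated; adjusted near 0.05
```

The crude contrast answers "how do the exposed differ from the unexposed?", while the standardised contrast answers the chapter's prediction question: what would the outcome be under one action versus the alternative.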

