Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics

2020 ◽  
Vol 110 (11) ◽  
pp. 3634-3660 ◽  
Author(s):  
Abel Brodeur ◽  
Nikolai Cook ◽  
Anthony Heyes

The credibility revolution in economics has promoted causal identification using randomized control trials (RCT), difference-in-differences (DID), instrumental variables (IV) and regression discontinuity design (RDD). Applying multiple approaches to over 21,000 hypothesis tests published in 25 leading economics journals, we find that the extent of p-hacking and publication bias varies greatly by method. IV (and to a lesser extent DID) are particularly problematic. We find no evidence that (i) papers published in the Top 5 journals are different to others; (ii) the journal “revise and resubmit” process mitigates the problem; (iii) things are improving through time. (JEL A14, C12, C52)
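As a stylized illustration of the kind of diagnostic that motivates this literature, the sketch below checks for "bunching" of reported test statistics just above the 5% critical value (|z| = 1.96). The simulated data, window width, and nudging mechanism are all hypothetical, not the paper's actual procedure or data:

```python
# Toy check for bunching of test statistics just above the 5% critical
# value (|z| = 1.96), in the spirit of p-hacking diagnostics. The data
# and window width here are hypothetical, not taken from the paper.
import random

random.seed(0)

# Simulate reported |z| statistics: mostly honest draws, plus a small
# mass "nudged" to land just above 1.96.
honest = [abs(random.gauss(0, 1.5)) for _ in range(2000)]
nudged = [1.96 + random.uniform(0.0, 0.2) for _ in range(200)]
z_stats = honest + nudged

def window_share(zs, lo, hi):
    """Share of statistics falling in the half-open window [lo, hi)."""
    return sum(lo <= z < hi for z in zs) / len(zs)

# Compare the mass just above the threshold to the mass just below it;
# a large ratio is a red flag for selective reporting.
below = window_share(z_stats, 1.76, 1.96)
above = window_share(z_stats, 1.96, 2.16)
print(f"share just below 1.96: {below:.3f}, share just above: {above:.3f}")
```

Because the simulated distribution should otherwise decline smoothly, an excess of mass just above the threshold relative to just below it signals selective reporting.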

2020 ◽  
Vol 20 (3) ◽  
pp. 356-389
Author(s):  
Patricia A. Kirkland ◽  
Justin H. Phillips

The regression discontinuity design (RDD) is a valuable tool for identifying causal effects with observational data. However, applying the traditional electoral RDD to the study of divided government is challenging. Because assignment to treatment in this case is the result of elections to multiple institutions, there is no obvious single forcing variable. Here, we use simulations in which we apply shocks to real-world election results in order to generate two measures of the likelihood of divided government, both of which can be used for causal analysis. The first captures the electoral distance to divided government and can easily be utilized in conjunction with the standard sharp RDD toolkit. The second is a simulated probability of divided government. This measure does not easily fit into a sharp RDD framework, so we develop a probability restricted design (PRD) which relies upon the underlying logic of an RDD. This design incorporates common regression techniques but limits the sample to those observations for which assignment to treatment approaches “as-if random.” To illustrate both of our approaches, we reevaluate the link between divided government and the size of budget deficits.


2020 ◽  
Author(s):  
Hiroaki Matsuura ◽  
Masao Fukui ◽  
Kohei Kawaguchi

Abstract In the middle of the global COVID-19 pandemic, the BCG hypothesis, which holds that the prevalence and severity of the COVID-19 outbreak are negatively correlated with whether a country has universal coverage of pediatric Bacillus Calmette–Guérin (BCG) vaccination, has emerged and attracted the attention of the scientific community and media outlets. However, all existing claims are based on cross-country correlations that do not exclude the possibility of spurious correlation. By merging country-age-level confirmed case statistics of COVID-19 from 17 countries with the start/termination years of pediatric universal BCG vaccination policy and age-specific BCG vaccination coverage, this paper examines the role of BCG vaccination in COVID-19 infection. Despite the cross-country evidence from the previous literature, the results of both regression discontinuity design and difference-in-differences approaches do not support the BCG hypothesis. The results of these previous studies are likely to suffer from spurious correlations.
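The difference-in-differences logic used in policy-change settings like this one can be sketched with a minimal two-group, two-period example. The numbers below are made up for illustration, not the paper's data:

```python
# Minimal 2x2 difference-in-differences: (treated post - treated pre)
# minus (control post - control pre). Under the common-trend assumption,
# this isolates the treatment effect. Numbers are illustrative only.
from statistics import mean

# Outcome observations for each (group, period) cell.
data = {
    ("treated", "pre"):  [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 16.0, 14.0],
    ("control", "pre"):  [9.0, 10.0, 11.0],
    ("control", "post"): [12.0, 13.0, 11.0],
}

def did_estimate(d):
    """Classic 2x2 DID estimate under the common-trend assumption."""
    treated_change = mean(d[("treated", "post")]) - mean(d[("treated", "pre")])
    control_change = mean(d[("control", "post")]) - mean(d[("control", "pre")])
    return treated_change - control_change

print(did_estimate(data))  # → 2.0 (treated change 4.0 minus control change 2.0)
```

The control group's change over time serves as the counterfactual trend for the treated group, which is exactly the assumption a spurious cross-country correlation fails to impose.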


2021 ◽  
pp. 004839312110497
Author(s):  
Tung-Ying Wu

The interventionist theory of causation has been advertised as an empirically informed and more nuanced approach to causality than the competing theories. However, previous literature has not yet analyzed the regression discontinuity (hereafter, RD) and the difference-in-differences (hereafter, DD) within an interventionist framework. In this paper, I point out several drawbacks of using the interventionist methodology for justifying the DD and RD designs. Nevertheless, I argue that the first step toward enhancing our understanding of the DD and RD designs from an interventionist perspective is to take advantage of the assumptions of common trend and continuity.


2018 ◽  
Vol 42 (2) ◽  
pp. 214-247 ◽  
Author(s):  
Peter M. Steiner ◽  
Vivian C. Wong

In within-study comparison (WSC) designs, treatment effects from a nonexperimental design, such as an observational study or a regression-discontinuity design, are compared to results obtained from a well-designed randomized control trial with the same target population. The goal of the WSC is to assess whether nonexperimental and experimental designs yield the same results in field settings. A common analytic challenge with WSCs, however, is the choice of appropriate criteria for determining whether nonexperimental and experimental results replicate. This article examines distance-based measures for assessing correspondence between experimental and nonexperimental estimates. Distance-based measures investigate whether the difference in estimates is small enough to claim equivalence of methods. We use a simulation study to examine the statistical properties of common correspondence measures and recommend a new and straightforward approach that combines traditional significance testing and equivalence testing in the same framework. The article concludes with practical advice on assessing and interpreting results in WSC contexts.
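One way to combine significance testing and equivalence testing, as the abstract describes, is to pair a conventional difference test with a TOST-style (two one-sided tests) equivalence test. The sketch below is a generic illustration of that combination, not the authors' exact procedure; the estimates, standard errors, and equivalence margin are hypothetical:

```python
# Sketch of judging correspondence between an experimental (RCT) and a
# nonexperimental estimate with (i) a two-sided difference test and
# (ii) a TOST-style equivalence test. All inputs below are hypothetical.
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def correspondence(est_rct, se_rct, est_ne, se_ne, margin):
    """Return (difference p-value, equivalence p-value)."""
    diff = est_ne - est_rct
    se = sqrt(se_rct**2 + se_ne**2)
    # Two-sided test of H0: estimates do not differ.
    p_diff = 2.0 * (1.0 - norm_cdf(abs(diff) / se))
    # TOST: reject H0 "the difference is at least as large as the
    # margin" only if both one-sided tests reject; report the larger p.
    p_lower = 1.0 - norm_cdf((diff + margin) / se)
    p_upper = norm_cdf((diff - margin) / se)
    p_equiv = max(p_lower, p_upper)
    return p_diff, p_equiv

p_diff, p_equiv = correspondence(0.30, 0.05, 0.33, 0.06, margin=0.20)
print(f"difference p = {p_diff:.3f}, equivalence p = {p_equiv:.3f}")
```

A non-significant difference test alone cannot establish replication; the equivalence test adds the affirmative claim that the difference is smaller than a pre-specified margin.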


2020 ◽  
Vol 114 (3) ◽  
pp. 677-690
Author(s):  
JULIA A. PAYSON

Why do local governments sometimes hire lobbyists to represent them in other levels of government? I argue that such mobilization efforts depend in part on the policy congruence between localities and their elected delegates in the legislature. I provide evidence consistent with this theory by examining how municipal governments in the United States respond to partisan and ideological mismatches with their state legislators—a common representational challenge. Using almost a decade of original panel data on municipal lobbying in all 50 states, I employ difference-in-differences and a regression discontinuity design to demonstrate that cities are significantly more likely to hire lobbyists when their districts elect non-co-partisan state representatives. The results are broadly consistent with a model of intergovernmental mobilization in which local officials purchase advocacy to compensate for the preference gaps that sometimes emerge in multilevel government.


2017 ◽  
Vol 9 (2) ◽  
pp. 124-154 ◽  
Author(s):  
Laura Dague ◽  
Thomas DeLeire ◽  
Lindsey Leininger

This study provides plausibly causal estimates of the effect of public insurance coverage on the employment of non-elderly, nondisabled adults without dependent children (“childless adults”). We take advantage of the sudden imposition of an enrollment cap in Wisconsin, comparing the labor supply of enrollees to eligible applicants placed on a waitlist using a regression discontinuity design and difference-in-differences methods. We find enrollment into public insurance leads to sizable and statistically meaningful reductions in employment, with an estimated effect size of just over 5 percentage points, a 12 percent decline. Confidence intervals rule out positive and large negative effects. (JEL G22, H75, I13, I18, I38, J22)
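The regression discontinuity logic in a cap-and-waitlist setting like this can be sketched naively as a comparison of mean outcomes within a narrow bandwidth on each side of the cutoff. The data below are synthetic and the estimator deliberately simple; real applications (including this paper's) use richer specifications such as local linear regression with data-driven bandwidths:

```python
# Naive sharp RDD sketch: compare mean outcomes within a narrow
# bandwidth on each side of a cutoff in the running variable (e.g.,
# application timing relative to an enrollment cap). Synthetic data.
import random

random.seed(1)

cutoff, bandwidth = 0.0, 0.5
# (running variable, outcome): treatment assigned when x < cutoff,
# with a built-in jump of -0.05 in the employment rate at the cutoff.
sample = []
for _ in range(5000):
    x = random.uniform(-2, 2)
    treated = x < cutoff
    y = 0.70 + 0.02 * x - (0.05 if treated else 0.0) + random.gauss(0, 0.1)
    sample.append((x, y))

def rdd_jump(points, c, h):
    """Difference in local means across the cutoff (treated minus control)."""
    left = [y for x, y in points if c - h <= x < c]
    right = [y for x, y in points if c <= x < c + h]
    return sum(left) / len(left) - sum(right) / len(right)

print(f"estimated enrollment effect: {rdd_jump(sample, cutoff, bandwidth):+.3f}")
```

Because assignment flips discontinuously at the cutoff while everything else varies smoothly, the local jump recovers the (negative) employment effect built into the simulation.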


2011 ◽  
Vol 49 (3) ◽  
pp. 722-724

Rema Hanna of John F. Kennedy School of Government, Harvard University reviews “Impact Evaluation in Practice” by Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. The EconLit Abstract of the reviewed work begins “Presents an introduction to the topic of impact evaluation and its practice in development. Discusses why impact evaluation is important; determining evaluation questions; causal inference and counterfactuals; randomized selection methods; regression discontinuity design; difference-in-differences; matching; combining methods; evaluating multifaceted programs; operationalizing the impact evaluation design; choosing the sample; collecting data; and producing and disseminating findings. Glossary; index.”


2010 ◽  
Vol 24 (2) ◽  
pp. 59-68 ◽  
Author(s):  
Christopher A Sims

The fact is, economics is not an experimental science and cannot be. “Natural” experiments and “quasi” experiments are not in fact experiments. They are rhetorical devices that are often invoked to avoid having to confront real econometric difficulties. Natural, quasi-, and computational experiments, as well as regression discontinuity design, can all, when well applied, be useful, but none are panaceas. The essay by Angrist and Pischke, in its enthusiasm for some real accomplishments in certain subfields of economics, makes overbroad claims for its favored methodologies. What the essay says about macroeconomics is mainly nonsense. Consequently, I devote the central part of my comment to describing the main developments that have helped take some of the con out of macroeconomics. Recent enthusiasm for single-equation, linear, instrumental variables approaches in applied microeconomics has led many in these fields to avoid undertaking research that would require them to think formally and carefully about the central issues of nonexperimental inference—what I see and many see as the core of econometrics. Providing empirically grounded policy advice necessarily involves confronting these difficult central issues.


2018 ◽  
Vol 42 (1) ◽  
pp. 71-110 ◽  
Author(s):  
Yang Tang ◽  
Thomas D. Cook

The basic regression discontinuity design (RDD) has less statistical power than a randomized control trial (RCT) with the same sample size. Adding a no-treatment comparison function to the basic RDD creates a comparative RDD (CRD); and when this function comes from the pretest value of the study outcome, a CRD-Pre design results. We use a within-study comparison (WSC) to examine the power of CRD-Pre relative to both basic RDD and RCT. We first build the theoretical foundation for power in CRD-Pre, then derive the relevant variance formulae, and finally compare them to the theoretical RCT variance. From the theoretical part of the article, we conclude that (1) CRD-Pre’s power gain depends on the partial correlation between the pretest and posttest measures after conditioning on the assignment variable, (2) CRD-Pre is less responsive than basic RDD to how the assignment variable is distributed and where the cutoff is located, and (3) under a variety of conditions, the efficiency of CRD-Pre is very close to that of the RCT. Data from the National Head Start Impact Study are then used to construct RCT, RDD, and CRD-Pre designs and to compare their power. The empirical results indicate (1) a high level of correspondence between the predicted and obtained power results for RDD and CRD-Pre relative to the RCT, and (2) power levels in CRD-Pre and RCT that are very close. The study is unique among WSCs for its focus on the correspondence between RCT and observational study standard errors rather than means.
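The intuition behind the pretest-driven power gain can be illustrated in a stylized way: conditioning on a pretest that correlates with the posttest shrinks residual outcome variance roughly in proportion to the squared correlation. The simulation below is an illustrative sketch of that mechanism, not the paper's derivation or data:

```python
# Stylized illustration of why a pretest covariate boosts power: the
# residual posttest variance after regressing out the pretest is close
# to (1 - rho**2) times the raw variance. Illustrative numbers only.
import random
from statistics import mean, variance

random.seed(2)

rho = 0.8  # assumed pretest-posttest correlation
pre = [random.gauss(0, 1) for _ in range(20000)]
post = [rho * p + random.gauss(0, (1 - rho**2) ** 0.5) for p in pre]

# Estimate the correlation and the residual variance after a simple
# regression of posttest on pretest.
mx, my = mean(pre), mean(post)
cov = sum((x - mx) * (y - my) for x, y in zip(pre, post)) / (len(pre) - 1)
r = cov / (variance(pre) * variance(post)) ** 0.5
resid = [y - r * x for x, y in zip(pre, post)]
print(f"corr = {r:.2f}, residual variance = {variance(resid):.2f}")
# Residual variance lands near 1 - rho**2 = 0.36 of the raw variance.
```

Smaller residual variance means smaller standard errors for a given sample, which is the mechanism through which a strong pretest correlation pushes a design's efficiency toward that of the RCT.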

