Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics

2020 ◽  
Vol 110 (11) ◽  
pp. 3634-3660 ◽  
Author(s):  
Abel Brodeur ◽  
Nikolai Cook ◽  
Anthony Heyes

The credibility revolution in economics has promoted causal identification using randomized control trials (RCT), difference-in-differences (DID), instrumental variables (IV) and regression discontinuity design (RDD). Applying multiple approaches to over 21,000 hypothesis tests published in 25 leading economics journals, we find that the extent of p-hacking and publication bias varies greatly by method. IV (and to a lesser extent DID) are particularly problematic. We find no evidence that (i) papers published in the Top 5 journals are different to others; (ii) the journal “revise and resubmit” process mitigates the problem; (iii) things are improving through time. (JEL A14, C12, C52)
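As a stylized illustration of the kind of diagnostic that motivates this literature, the sketch below checks for "bunching" of reported test statistics just above the 5% critical value (|z| = 1.96). The simulated data, window width, and nudging mechanism are all hypothetical, not the paper's actual procedure or data:

```python
# Toy check for bunching of test statistics just above the 5% critical
# value (|z| = 1.96), in the spirit of p-hacking diagnostics. The data
# and window width here are hypothetical, not taken from the paper.
import random

random.seed(0)

# Simulate reported |z| statistics: mostly honest draws, plus a small
# mass "nudged" to land just above 1.96.
honest = [abs(random.gauss(0, 1.5)) for _ in range(2000)]
nudged = [1.96 + random.uniform(0.0, 0.2) for _ in range(200)]
z_stats = honest + nudged

def window_share(zs, lo, hi):
    """Share of statistics falling in the half-open window [lo, hi)."""
    return sum(lo <= z < hi for z in zs) / len(zs)

# Compare the mass just above the threshold to the mass just below it;
# a large ratio is a red flag for selective reporting.
below = window_share(z_stats, 1.76, 1.96)
above = window_share(z_stats, 1.96, 2.16)
print(f"share just below 1.96: {below:.3f}, share just above: {above:.3f}")
```

Because the simulated distribution should otherwise decline smoothly, an excess of mass just above the threshold relative to just below it signals selective reporting.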

2020 ◽  
Vol 20 (3) ◽  
pp. 356-389
Author(s):  
Patricia A. Kirkland ◽  
Justin H. Phillips

The regression discontinuity design (RDD) is a valuable tool for identifying causal effects with observational data. However, applying the traditional electoral RDD to the study of divided government is challenging. Because assignment to treatment in this case is the result of elections to multiple institutions, there is no obvious single forcing variable. Here, we use simulations in which we apply shocks to real-world election results in order to generate two measures of the likelihood of divided government, both of which can be used for causal analysis. The first captures the electoral distance to divided government and can easily be utilized in conjunction with the standard sharp RDD toolkit. The second is a simulated probability of divided government. This measure does not easily fit into a sharp RDD framework, so we develop a probability restricted design (PRD) which relies upon the underlying logic of an RDD. This design incorporates common regression techniques but limits the sample to those observations for which assignment to treatment approaches “as-if random.” To illustrate both of our approaches, we reevaluate the link between divided government and the size of budget deficits.


2020 ◽  
Author(s):  
Hiroaki Matsuura ◽  
Masao Fukui ◽  
Kohei Kawaguchi

Abstract In the middle of the global COVID-19 pandemic, the BCG hypothesis, which holds that the prevalence and severity of the COVID-19 outbreak are negatively correlated with whether a country has universal coverage of pediatric Bacillus Calmette–Guérin (BCG) vaccination, has emerged and attracted the attention of the scientific community and media outlets. However, all existing claims are based on cross-country correlations that do not exclude the possibility of spurious correlation. By merging country-age-level confirmed case statistics of COVID-19 from 17 countries with the start/termination years of pediatric universal BCG vaccination policy and age-specific BCG vaccination coverage, this paper examines the role of BCG vaccination in COVID-19 infection. Despite the cross-country evidence from the previous literature, the results of both regression discontinuity design and difference-in-differences approaches do not support the BCG hypothesis. The results of these previous studies are likely to suffer from spurious correlations.
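The difference-in-differences logic used in policy-change settings like this one can be sketched with a minimal two-group, two-period example. The numbers below are made up for illustration, not the paper's data:

```python
# Minimal 2x2 difference-in-differences: (treated post - treated pre)
# minus (control post - control pre). Under the common-trend assumption,
# this isolates the treatment effect. Numbers are illustrative only.
from statistics import mean

# Outcome observations for each (group, period) cell.
data = {
    ("treated", "pre"):  [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 16.0, 14.0],
    ("control", "pre"):  [9.0, 10.0, 11.0],
    ("control", "post"): [12.0, 13.0, 11.0],
}

def did_estimate(d):
    """Classic 2x2 DID estimate under the common-trend assumption."""
    treated_change = mean(d[("treated", "post")]) - mean(d[("treated", "pre")])
    control_change = mean(d[("control", "post")]) - mean(d[("control", "pre")])
    return treated_change - control_change

print(did_estimate(data))  # → 2.0 (treated change 4.0 minus control change 2.0)
```

The control group's change over time serves as the counterfactual trend for the treated group, which is exactly the assumption a spurious cross-country correlation fails to impose.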


2021 ◽  
pp. 004839312110497
Author(s):  
Tung-Ying Wu

The interventionist theory of causation has been advertised as an empirically informed and more nuanced approach to causality than the competing theories. However, previous literature has not yet analyzed the regression discontinuity (hereafter, RD) and the difference-in-differences (hereafter, DD) within an interventionist framework. In this paper, I point out several drawbacks of using the interventionist methodology for justifying the DD and RD designs. Nevertheless, I argue that the first step toward enhancing our understanding of the DD and RD designs from an interventionist perspective is to take advantage of the assumptions of common trend and continuity.


2018 ◽  
Vol 42 (2) ◽  
pp. 214-247 ◽  
Author(s):  
Peter M. Steiner ◽  
Vivian C. Wong

In within-study comparison (WSC) designs, treatment effects from a nonexperimental design, such as an observational study or a regression-discontinuity design, are compared to results obtained from a well-designed randomized control trial with the same target population. The goal of the WSC is to assess whether nonexperimental and experimental designs yield the same results in field settings. A common analytic challenge with WSCs, however, is the choice of appropriate criteria for determining whether nonexperimental and experimental results replicate. This article examines distance-based measures for assessing correspondence between experimental and nonexperimental estimates. Distance-based measures investigate whether the difference in estimates is small enough to claim equivalence of methods. We use a simulation study to examine the statistical properties of common correspondence measures and recommend a new and straightforward approach that combines traditional significance testing and equivalence testing in the same framework. The article concludes with practical advice on assessing and interpreting results in WSC contexts.
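One way to combine significance testing and equivalence testing, as the abstract describes, is to pair a conventional difference test with a TOST-style (two one-sided tests) equivalence test. The sketch below is a generic illustration of that combination, not the authors' exact procedure; the estimates, standard errors, and equivalence margin are hypothetical:

```python
# Sketch of judging correspondence between an experimental (RCT) and a
# nonexperimental estimate with (i) a two-sided difference test and
# (ii) a TOST-style equivalence test. All inputs below are hypothetical.
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def correspondence(est_rct, se_rct, est_ne, se_ne, margin):
    """Return (difference p-value, equivalence p-value)."""
    diff = est_ne - est_rct
    se = sqrt(se_rct**2 + se_ne**2)
    # Two-sided test of H0: estimates do not differ.
    p_diff = 2.0 * (1.0 - norm_cdf(abs(diff) / se))
    # TOST: reject H0 "the difference is at least as large as the
    # margin" only if both one-sided tests reject; report the larger p.
    p_lower = 1.0 - norm_cdf((diff + margin) / se)
    p_upper = norm_cdf((diff - margin) / se)
    p_equiv = max(p_lower, p_upper)
    return p_diff, p_equiv

p_diff, p_equiv = correspondence(0.30, 0.05, 0.33, 0.06, margin=0.20)
print(f"difference p = {p_diff:.3f}, equivalence p = {p_equiv:.3f}")
```

A non-significant difference test alone cannot establish replication; the equivalence test adds the affirmative claim that the difference is smaller than a pre-specified margin.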


2020 ◽  
Vol 114 (3) ◽  
pp. 677-690
Author(s):  
JULIA A. PAYSON

Why do local governments sometimes hire lobbyists to represent them in other levels of government? I argue that such mobilization efforts depend in part on the policy congruence between localities and their elected delegates in the legislature. I provide evidence consistent with this theory by examining how municipal governments in the United States respond to partisan and ideological mismatches with their state legislators—a common representational challenge. Using almost a decade of original panel data on municipal lobbying in all 50 states, I employ difference-in-differences and a regression discontinuity design to demonstrate that cities are significantly more likely to hire lobbyists when their districts elect non-co-partisan state representatives. The results are broadly consistent with a model of intergovernmental mobilization in which local officials purchase advocacy to compensate for the preference gaps that sometimes emerge in multilevel government.


2017 ◽  
Vol 9 (2) ◽  
pp. 124-154 ◽  
Author(s):  
Laura Dague ◽  
Thomas DeLeire ◽  
Lindsey Leininger

This study provides plausibly causal estimates of the effect of public insurance coverage on the employment of non-elderly, nondisabled adults without dependent children (“childless adults”). We take advantage of the sudden imposition of an enrollment cap in Wisconsin, comparing the labor supply of enrollees to eligible applicants placed on a waitlist using a regression discontinuity design and difference-in-differences methods. We find enrollment into public insurance leads to sizable and statistically meaningful reductions in employment, with an estimated effect size of just over 5 percentage points, a 12 percent decline. Confidence intervals rule out positive and large negative effects. (JEL G22, H75, I13, I18, I38, J22)
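The regression discontinuity logic in a cap-and-waitlist setting like this can be sketched naively as a comparison of mean outcomes within a narrow bandwidth on each side of the cutoff. The data below are synthetic and the estimator deliberately simple; real applications (including this paper's) use richer specifications such as local linear regression with data-driven bandwidths:

```python
# Naive sharp RDD sketch: compare mean outcomes within a narrow
# bandwidth on each side of a cutoff in the running variable (e.g.,
# application timing relative to an enrollment cap). Synthetic data.
import random

random.seed(1)

cutoff, bandwidth = 0.0, 0.5
# (running variable, outcome): treatment assigned when x < cutoff,
# with a built-in jump of -0.05 in the employment rate at the cutoff.
sample = []
for _ in range(5000):
    x = random.uniform(-2, 2)
    treated = x < cutoff
    y = 0.70 + 0.02 * x - (0.05 if treated else 0.0) + random.gauss(0, 0.1)
    sample.append((x, y))

def rdd_jump(points, c, h):
    """Difference in local means across the cutoff (treated minus control)."""
    left = [y for x, y in points if c - h <= x < c]
    right = [y for x, y in points if c <= x < c + h]
    return sum(left) / len(left) - sum(right) / len(right)

print(f"estimated enrollment effect: {rdd_jump(sample, cutoff, bandwidth):+.3f}")
```

Because assignment flips discontinuously at the cutoff while everything else varies smoothly, the local jump recovers the (negative) employment effect built into the simulation.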


2011 ◽  
Vol 49 (3) ◽  
pp. 722-724

Rema Hanna of John F. Kennedy School of Government, Harvard University reviews “Impact Evaluation in Practice” by Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. The EconLit Abstract of the reviewed work begins “Presents an introduction to the topic of impact evaluation and its practice in development. Discusses why impact evaluation is important; determining evaluation questions; causal inference and counterfactuals; randomized selection methods; regression discontinuity design; difference-in-differences; matching; combining methods; evaluating multifaceted programs; operationalizing the impact evaluation design; choosing the sample; collecting data; and producing and disseminating findings. Glossary; index.”


2010 ◽  
Vol 24 (2) ◽  
pp. 59-68 ◽  
Author(s):  
Christopher A Sims

The fact is, economics is not an experimental science and cannot be. “Natural” experiments and “quasi” experiments are not in fact experiments. They are rhetorical devices that are often invoked to avoid having to confront real econometric difficulties. Natural, quasi-, and computational experiments, as well as regression discontinuity design, can all, when well applied, be useful, but none are panaceas. The essay by Angrist and Pischke, in its enthusiasm for some real accomplishments in certain subfields of economics, makes overbroad claims for its favored methodologies. What the essay says about macroeconomics is mainly nonsense. Consequently, I devote the central part of my comment to describing the main developments that have helped take some of the con out of macroeconomics. Recent enthusiasm for single-equation, linear, instrumental variables approaches in applied microeconomics has led many in these fields to avoid undertaking research that would require them to think formally and carefully about the central issues of nonexperimental inference—what I see and many see as the core of econometrics. Providing empirically grounded policy advice necessarily involves confronting these difficult central issues.


2018 ◽  
Vol 42 (1) ◽  
pp. 71-110 ◽  
Author(s):  
Yang Tang ◽  
Thomas D. Cook

The basic regression discontinuity design (RDD) has less statistical power than a randomized control trial (RCT) with the same sample size. Adding a no-treatment comparison function to the basic RDD creates a comparative RDD (CRD); and when this function comes from the pretest value of the study outcome, a CRD-Pre design results. We use a within-study comparison (WSC) to examine the power of CRD-Pre relative to both basic RDD and RCT. We first build the theoretical foundation for power in CRD-Pre, then derive the relevant variance formulae, and finally compare them to the theoretical RCT variance. From the theoretical part of the article, we conclude that (1) CRD-Pre’s power gain depends on the partial correlation between the pretest and posttest measures after conditioning on the assignment variable, (2) CRD-Pre is less responsive than basic RDD to how the assignment variable is distributed and where the cutoff is located, and (3) under a variety of conditions, the efficiency of CRD-Pre is very close to that of the RCT. Data from the National Head Start Impact Study are then used to construct RCT, RDD, and CRD-Pre designs and to compare their power. The empirical results indicate (1) a high level of correspondence between the predicted and obtained power results for RDD and CRD-Pre relative to the RCT, and (2) power levels in CRD-Pre and RCT that are very close. The study is unique among WSCs for its focus on the correspondence between RCT and observational study standard errors rather than means.
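The intuition behind the pretest-driven power gain can be illustrated in a stylized way: conditioning on a pretest that correlates with the posttest shrinks residual outcome variance roughly in proportion to the squared correlation. The simulation below is an illustrative sketch of that mechanism, not the paper's derivation or data:

```python
# Stylized illustration of why a pretest covariate boosts power: the
# residual posttest variance after regressing out the pretest is close
# to (1 - rho**2) times the raw variance. Illustrative numbers only.
import random
from statistics import mean, variance

random.seed(2)

rho = 0.8  # assumed pretest-posttest correlation
pre = [random.gauss(0, 1) for _ in range(20000)]
post = [rho * p + random.gauss(0, (1 - rho**2) ** 0.5) for p in pre]

# Estimate the correlation and the residual variance after a simple
# regression of posttest on pretest.
mx, my = mean(pre), mean(post)
cov = sum((x - mx) * (y - my) for x, y in zip(pre, post)) / (len(pre) - 1)
r = cov / (variance(pre) * variance(post)) ** 0.5
resid = [y - r * x for x, y in zip(pre, post)]
print(f"corr = {r:.2f}, residual variance = {variance(resid):.2f}")
# Residual variance lands near 1 - rho**2 = 0.36 of the raw variance.
```

Smaller residual variance means smaller standard errors for a given sample, which is the mechanism through which a strong pretest correlation pushes a design's efficiency toward that of the RCT.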

