Simple study designs in ecology produce inaccurate estimates of biodiversity responses
Abstract

Ecologists use a wide range of study designs to estimate the impact of interventions or threats, but there are no quantitative comparisons of their accuracy. For example, while it is accepted that simpler designs, such as After (sampling sites post-impact without a control), Before-After (BA) and Control-Impact (CI), are less robust than Randomised Controlled Trials (RCT) and Before-After Control-Impact (BACI) designs, it is not known how much less accurate they are.

We simulate a step-change response of a population to an environmental impact using empirically derived estimates of the major parameters. We use five ecological study designs to estimate the effect of this impact and evaluate each one by determining the percentage of simulations in which it accurately estimates the direction and magnitude of the environmental impact. We also simulate different numbers of replicates and assess several accuracy thresholds.

We demonstrate that BACI designs could be 1.1-1.5 times more accurate than RCTs, 2.9-4.1 times more accurate than BA, 3.8-5.6 times more accurate than CI, and 6.8-10.8 times more accurate than After designs, when estimating to within ±30% of the true effect (depending on the sample size). We also found that increasing sample size substantially increases the accuracy of BACI designs but only increases the precision of simpler designs around a biased estimate; only by using more robust designs can accuracy increase. Modestly increasing replication of both control and impact sites also increased the accuracy of BACI designs more than substantially increasing replicates in just one of these groups.

We argue that investment in more robust designs in ecology, where possible, is extremely worthwhile given the inaccuracy of simpler designs, even when using large sample sizes. Based on our results, we propose a weighting system that quantitatively ranks the accuracy of studies based on their study design and the number of replicates used.
We hope these ‘accuracy weights’ enable researchers to better account for study design in evidence synthesis when assessing the reliability of a range of studies using a variety of designs.
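
The simulation logic summarised above can be illustrated with a minimal sketch. It is not the authors' code, and it covers only three of the five designs (BA, CI and BACI). Every name and number in it is a hypothetical choice made for illustration: the additive step change `effect`, the background change over time `time_trend`, the pre-existing difference between impact and control sites `site_bias`, the sampling noise `noise_sd`, and the baseline abundance of 100 are assumptions, not the empirically derived parameters used in the study. Accuracy is scored as the percentage of simulations whose estimate falls within ±30% of the true effect, which also implies the correct direction.

```python
import random
from statistics import mean


def simulate_once(rng, n_sites, effect, time_trend, site_bias, noise_sd):
    """Simulate one study and return the effect estimated by each design.

    All parameters are illustrative assumptions, not values from the paper.
    Each group/period is sampled independently around its expected abundance.
    """
    def sample(level):
        return [rng.gauss(level, noise_sd) for _ in range(n_sites)]

    baseline = 100.0  # hypothetical mean abundance at control sites before impact
    control_before = sample(baseline)
    control_after = sample(baseline + time_trend)
    impact_before = sample(baseline + site_bias)
    impact_after = sample(baseline + site_bias + time_trend + effect)

    return {
        # Before-After: confounded by any background change over time.
        "BA": mean(impact_after) - mean(impact_before),
        # Control-Impact: confounded by pre-existing site differences.
        "CI": mean(impact_after) - mean(control_after),
        # BACI: difference of differences removes both confounders.
        "BACI": (mean(impact_after) - mean(impact_before))
        - (mean(control_after) - mean(control_before)),
    }


def accuracy(n_sims=2000, n_sites=20, effect=-20.0, time_trend=-10.0,
             site_bias=10.0, noise_sd=10.0, tolerance=0.30, seed=1):
    """Percentage of simulations in which each design's estimate lies
    within +/- tolerance (as a fraction) of the true effect."""
    rng = random.Random(seed)
    hits = {"BA": 0, "CI": 0, "BACI": 0}
    for _ in range(n_sims):
        estimates = simulate_once(rng, n_sites, effect, time_trend,
                                  site_bias, noise_sd)
        for design, value in estimates.items():
            if abs(value - effect) <= tolerance * abs(effect):
                hits[design] += 1
    return {design: 100.0 * count / n_sims for design, count in hits.items()}


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, accuracy(n_sites=n))
```

Under these assumed parameters, adding replicates tightens the BA and CI estimates around values offset by `time_trend` and `site_bias` respectively, so their accuracy does not improve, whereas the BACI estimate converges on the true effect. The After and RCT designs are omitted because the abstract does not specify how their estimates are formed.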