Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-5 ◽  
Author(s):  
R. Eric Heidel

Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
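
For the simplest case, a continuous outcome compared between two independent groups with a two-sided test, a standard normal-approximation formula makes the interdependence concrete. It is a textbook approximation and is not taken from the article; the symbols are assumptions introduced here: δ is the difference in means to be detected, σ the common standard deviation, α the Type I error rate, and 1 − β the desired power.

```latex
n_{\text{per group}} \;\approx\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
```

Halving δ or doubling σ quadruples the required sample size. The scale of measurement and the research design determine which test, and therefore which formula of this kind, applies.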

2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as traditional NHST a priori power analysis, is conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, the identification of the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether these issues are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high impact psychology journals in 2016 and 2017. It was found that researchers rarely use the minimally meaningful effect size as a rationale for the chosen effect in a power analysis. Further, precision-based approaches and collecting the maximum sample size feasible are almost never used in tandem with power analyses. In light of these findings, we offer that researchers should focus on tools beyond traditional power analysis when sample planning, such as collecting the maximum sample size feasible.
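
A precision-oriented alternative can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes a two-group comparison of means, a known common standard deviation, and a normal approximation, and the function name is hypothetical. The target is the half-width of the confidence interval rather than power.

```python
import math
from statistics import NormalDist


def n_per_group_for_halfwidth(sd, halfwidth, confidence=0.95):
    """Per-group n so that the CI for a difference in two means
    has roughly the requested half-width (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Half-width of the CI is z * sd * sqrt(2 / n); solve for n.
    return math.ceil(2 * (z * sd / halfwidth) ** 2)


# Hypothetical planning values: SD of 1.0, desired half-width of 0.25.
print(n_per_group_for_halfwidth(sd=1.0, halfwidth=0.25))
```

No minimally meaningful effect size is needed here, only a judgment about how precise the estimate should be.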


2020 ◽  
pp. 28-63
Author(s):  
A. G. Vinogradov

The article belongs to a special modern genre of scholarly publications, so-called tutorials – articles devoted to the application of the latest methods of design, modeling or analysis in an accessible format in order to disseminate best practices. The article acquaints Ukrainian psychologists with the basics of using the R programming language for the analysis of empirical research data. The article discusses the current state of world psychology in connection with the Crisis of Confidence, which arose due to the low reproducibility of empirical research. This problem is caused by the poor quality of psychological measurement tools, insufficient attention to adequate sample planning, typical statistical hypothesis testing practices, and so-called “questionable research practices.” The tutorial demonstrates methods for determining the sample size depending on the expected magnitude of the effect size and the desired statistical power, and for performing basic variable transformations and statistical analysis of psychological research data using the R language and environment. The tutorial presents a minimal system of R functions required to carry out a modern analysis of the reliability of measurement scales, sample size calculation, and point and interval estimation of effect size for the four designs most widely used in psychology to analyze the interdependence of two variables. These typical problems include finding differences between the means and variances in two or more samples, and correlations between continuous and categorical variables. Practical information on data preparation, import, basic transformations, and application of basic statistical methods in the cloud version of RStudio is provided.


2017 ◽  
Author(s):  
Clarissa F. D. Carneiro ◽  
Thiago C. Moulin ◽  
Malcolm R. Macleod ◽  
Olavo B. Amaral

Abstract Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.


2012 ◽  
Vol 60 (6) ◽  
pp. 381 ◽  
Author(s):  
Evan Watkins ◽  
Julian Di Stefano

Hypotheses relating to the annual frequency distribution of mammalian births are commonly tested using a goodness-of-fit procedure. Several interacting factors influence the statistical power of these tests, but no power studies have been conducted using scenarios derived from biological hypotheses. Corresponding to theories relating reproductive output to seasonal resource fluctuation, we simulated data reflecting a winter reduction in birth frequency to test the effect of four factors (sample size, maximum effect size, the temporal pattern of response and the number of categories used for analysis) on the power of three goodness-of-fit procedures – the G and Chi-square tests and Watson’s U² test. Analyses resulting in high power all had a large maximum effect size (60%) and were associated with a sample size of 200 on most occasions. The G-test was the most powerful when data were analysed using two temporal categories (winter and other) while Watson’s U² test achieved the highest power when 12 monthly categories were used. Overall, the power of most modelled scenarios was low. Consequently, we recommend using power analysis as a research planning tool, and have provided a spreadsheet enabling a priori power calculations for the three tests considered.
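
A simulation of this kind can be sketched for the simplest scenario, a Chi-square goodness-of-fit test with two temporal categories (winter and other). This is not the authors' code or spreadsheet. It rests on assumptions introduced here: winter covers 3 of 12 months under the null, the effect size is a proportional reduction in the winter birth probability, and all function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_gof_power(n_births, winter_share_null=0.25, reduction=0.6,
                       alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo power of a two-category Chi-square goodness-of-fit test."""
    rng = random.Random(seed)
    # Assumed alternative: winter birth probability reduced proportionally.
    p_alt = winter_share_null * (1 - reduction)
    # With 1 degree of freedom the Chi-square critical value equals z squared.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    exp_winter = n_births * winter_share_null
    exp_other = n_births - exp_winter
    rejections = 0
    for _ in range(n_sims):
        winter = sum(rng.random() < p_alt for _ in range(n_births))
        other = n_births - winter
        chi2 = ((winter - exp_winter) ** 2 / exp_winter
                + (other - exp_other) ** 2 / exp_other)
        rejections += chi2 > crit
    return rejections / n_sims


# Compare a large and a small assumed reduction at the same sample size.
for reduction in (0.6, 0.2):
    print(reduction, simulate_gof_power(n_births=200, reduction=reduction))
```

Setting `reduction=0` gives the empirical Type I error rate, which is a useful sanity check on the simulation itself.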


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Mirjam Moerbeek

Abstract Background A priori sample size calculation requires an a priori estimate of the size of the effect. An incorrect estimate may result in a sample size that is too low to detect effects or that is unnecessarily high. An alternative to a priori sample size calculation is Bayesian updating, a procedure that allows increasing sample size during the course of a study until sufficient support for a hypothesis is achieved. This procedure does not require an a priori estimate of the effect size. This paper introduces Bayesian updating to researchers in the biomedical field and presents a simulation study that gives insight into sample sizes that may be expected for two-group comparisons. Methods Bayesian updating uses the Bayes factor, which quantifies the degree of support for a hypothesis versus another one given the data. It can be re-calculated each time new subjects are added, without the need to correct for multiple interim analyses. A simulation study was conducted to study what sample size may be expected and how large the error rate is, that is, how often the Bayes factor shows most support for the hypothesis that was not used to generate the data. Results The results of the simulation study are presented in a Shiny app and summarized in this paper. Lower sample size is expected when the effect size is larger and the required degree of support is lower. However, larger error rates may be observed when a low degree of support is required and/or when the sample size at the start of the study is small. Furthermore, it may occur that sufficient support for neither hypothesis is achieved when the sample size is bounded by a maximum. Conclusions Bayesian updating is a useful alternative to a priori sample size calculation, especially so in studies where additional subjects can be recruited easily and data become available in a limited amount of time. The results of the simulation study show how large a sample size can be expected and how large the error rate is.
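
The stopping logic can be illustrated with a deliberately simplified stand-in. The paper studies two-group comparisons, whereas this sketch uses a single normal mean with known standard deviation, because that Bayes factor has a closed form. It compares H0: μ = 0 against H1: μ ~ N(0, τ²). The prior scale `tau`, the threshold of 10, and all names are assumptions made for illustration.

```python
import math
import random


def bf10_normal_mean(xs, sigma=1.0, tau=1.0):
    """Bayes factor for H1: mu ~ N(0, tau^2) versus H0: mu = 0 (known sigma)."""
    n = len(xs)
    mean = sum(xs) / n
    var0 = sigma ** 2 / n      # variance of the sample mean under H0
    var1 = var0 + tau ** 2     # marginal variance of the sample mean under H1
    log_bf = 0.5 * math.log(var0 / var1) + 0.5 * mean ** 2 * (1 / var0 - 1 / var1)
    return math.exp(log_bf)


def bayesian_updating(true_mu, threshold=10.0, n_start=10, n_max=500, seed=1):
    """Add one subject at a time until either hypothesis is sufficiently
    supported or the maximum sample size is reached."""
    rng = random.Random(seed)
    xs = [rng.gauss(true_mu, 1.0) for _ in range(n_start)]
    while True:
        bf = bf10_normal_mean(xs)
        if bf >= threshold or bf <= 1 / threshold or len(xs) >= n_max:
            return len(xs), bf
        xs.append(rng.gauss(true_mu, 1.0))


n_used, bf = bayesian_updating(true_mu=0.5)
print(n_used, bf)
```

Repeating `bayesian_updating` over many seeds gives the distribution of final sample sizes. It also gives the error rate, that is, how often the Bayes factor favours the hypothesis that did not generate the data.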


2018 ◽  
Vol 53 (7) ◽  
pp. 716-719
Author(s):  
Monica R. Lininger ◽  
Bryan L. Riemann

Objective: To describe the concept of statistical power as related to comparative interventions and how various factors, including sample size, affect statistical power. Background: Having a sufficiently sized sample for a study is necessary for an investigation to demonstrate that an effective treatment is statistically superior. Many researchers fail to conduct and report a priori sample-size estimates, which then makes it difficult to interpret nonsignificant results and causes the clinician to question the planning of the research design. Description: Statistical power is the probability of statistically detecting a treatment effect when one truly exists. The α level, a measure of differences between groups, the variability of the data, and the sample size all affect statistical power. Recommendations: Authors should conduct and provide the results of a priori sample-size estimations in the literature. This will assist clinicians in determining whether the lack of a statistically significant treatment effect is due to an underpowered study or to a treatment's actually having no effect.
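
How these four factors drive power can be sketched with a normal approximation for a two-sided comparison of two independent means. This is an illustrative approximation and not the authors' calculation; the function name and the example values are hypothetical.

```python
from statistics import NormalDist


def power_two_means(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two independent means."""
    nd = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / se               # standardized true difference
    # Probability of rejecting in either tail when the true shift is `shift`.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Vary one factor at a time around a hypothetical baseline.
print(power_two_means(diff=5, sd=10, n_per_group=64))              # baseline
print(power_two_means(diff=5, sd=10, n_per_group=32))              # smaller sample
print(power_two_means(diff=5, sd=15, n_per_group=64))              # more variability
print(power_two_means(diff=5, sd=10, n_per_group=64, alpha=0.01))  # stricter alpha
```

With a true difference of zero, the function returns α, which matches the definition of the Type I error rate.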


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation allows making precise power calculations that can consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help in both evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as online appendix.
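
The trade-off between reliability and sample size can be approximated analytically. The sketch below is a stand-in for the article's Monte Carlo procedure, not a reproduction of it. It uses the classical attenuation formula and the Fisher z approximation for the sample size of a correlation test. Treating chance-adjusted agreement as a reliability coefficient is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation when both variables are measured with error."""
    return true_r * math.sqrt(rel_x * rel_y)


def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate n to detect correlation r (two-sided test, Fisher z)."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil((z_sum / math.atanh(abs(r))) ** 2 + 3)


# Hypothetical true correlation of 0.30 coded at different reliability levels.
for rel in (1.0, 0.8, 0.667):
    r_obs = attenuated_r(0.30, rel, rel)
    print(rel, round(r_obs, 3), n_for_correlation(r_obs))
```

Lower reliability shrinks the observable correlation, so the required sample size rises. This is the compensation between sample size and coding reliability that the abstract describes.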


2020 ◽  
Vol 6 (2) ◽  
pp. 106-113
Author(s):  
A. M. Grjibovski ◽  
M. A. Gorbatova ◽  
A. N. Narkevich ◽  
K. A. Vinogradov

Sample size calculation in the planning phase is still uncommon in Russian research practice. This situation threatens the validity of conclusions and may introduce Type II error, when a false null hypothesis is not rejected because the study lacks the statistical power to detect an existing difference between the means. Comparing two means using unpaired Student's t-tests is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size or retrospective calculations of statistical power were observed in only very few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables of minimal required sample size for studies in which two means have to be compared and body mass index and blood pressure are the variables of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
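
A table of this kind can be sketched in a few lines. This is not the authors' WinPepi or Stata workflow. It uses a normal approximation, which gives slightly smaller numbers than a t-based calculation, and the body mass index difference and standard deviations are hypothetical placeholders rather than values from the paper.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, power=0.80, alpha=0.05):
    """Per-group n for a two-sided comparison of two independent means."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_power) / delta) ** 2)


def sample_size_table(delta, sds, powers, alpha=0.05):
    """Nested dict: standard deviation -> power -> required n per group."""
    return {sd: {p: n_per_group(delta, sd, p, alpha) for p in powers} for sd in sds}


# Hypothetical example: detect a 1.0 kg/m^2 difference in body mass index.
table = sample_size_table(delta=1.0, sds=[3.0, 4.0, 5.0], powers=[0.80, 0.90])
for sd, row in table.items():
    print(sd, row)
```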

