The Ouroboros of Psychological Methodology: The Case of Effect Sizes (Mechanical Objectivity vs. Expertise)

2018 ◽  
Vol 22 (4) ◽  
pp. 469-476 ◽  
Author(s):  
Ian J. Davidson

The reporting and interpretation of effect sizes is often promoted as a panacea for the ramifications of institutionalized statistical rituals associated with the null-hypothesis significance test. Mechanical objectivity—conflating the use of a method with the obtainment of truth—is a useful theoretical tool for understanding the possible failure of effect size reporting (Porter, 1995). This article helps elucidate the ouroboros of psychological methodology: the cycle in which improved tools for producing trustworthy knowledge become institutionalized and adopted as forms of thinking, methodologists eventually admonish researchers for relying too heavily on rituals, and new, improved quantitative tools are then produced that may follow the same circular path. Despite many critiques and warnings, research psychologists’ superficial adoption of effect sizes might preclude expert interpretation, much as occurred with the null-hypothesis significance test as it was widely received. One solution to this situation is bottom-up: promoting a balance of mechanical objectivity and expertise in the teaching of methods and research. This would require the acceptance and encouragement of expert interpretation within psychological science.

Psychology ◽  
2019 ◽  
Author(s):  
David B. Flora

Simply put, effect size (ES) is the magnitude or strength of association between or among variables. Effect sizes (ESs) are commonly represented numerically (i.e., as parameters for population ESs and statistics for sample estimates of population ESs) but also may be communicated graphically. Although the word “effect” may imply that an ES quantifies the strength of a causal association (“cause and effect”), ESs are used more broadly to represent any empirical association between variables. Effect sizes serve three general purposes: reporting research results, power analysis, and meta-analysis. Even under the same research design, an ES that is appropriate for one of these purposes may not be ideal for another. Effect size can be conveyed graphically or numerically using either unstandardized metrics, which are interpreted relative to the original scales of the variables involved (e.g., the difference between two means or an unstandardized regression slope), or standardized metrics, which are interpreted in relative terms (e.g., Cohen’s d or multiple R2). Whereas unstandardized ESs and graphs illustrating ES are typically most effective for research reporting, that is, communicating the original findings of an empirical study, many standardized ES measures have been developed for use in power analysis and especially meta-analysis. Although the concept of ES is clearly fundamental to data analysis, ES reporting has been advocated as an essential complement to null hypothesis significance testing (NHST), or even as a replacement for NHST. A null hypothesis significance test involves making a dichotomous judgment about whether to reject a hypothesis that a true population effect equals zero. Even in the context of a traditional NHST paradigm, ES is a critical concept because of its central role in power analysis.
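The unstandardized/standardized distinction described above can be made concrete with a short sketch. The code below (hypothetical data, Python standard library only) computes both a raw mean difference and Cohen's d, the raw difference divided by the pooled standard deviation:

```python
import statistics

def mean_diff_and_cohens_d(group_a, group_b):
    """Return the unstandardized mean difference and Cohen's d
    (standardized by the pooled sample standard deviation)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return mean_diff, mean_diff / pooled_sd

# Hypothetical scores for two groups on the same measurement scale
a = [5.1, 4.8, 5.5, 5.0, 5.2]
b = [4.2, 4.5, 4.0, 4.4, 4.3]
raw, d = mean_diff_and_cohens_d(a, b)
```

The raw difference is interpretable only on the original measurement scale, while d expresses the same gap in pooled-standard-deviation units, which is what makes it usable across studies in power analysis and meta-analysis.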


1998 ◽  
Vol 21 (2) ◽  
pp. 213-213 ◽  
Author(s):  
Marks R. Nester

Chow's one-tailed null-hypothesis significance-test procedure, with its rationale based on the elimination of chance influences, is not appropriate for theory-corroboration experiments. Estimated effect sizes and their associated standard errors or confidence limits will always suffice.


1998 ◽  
Vol 21 (2) ◽  
pp. 228-235 ◽  
Author(s):  
Siu L. Chow

Entertaining diverse assumptions about empirical research, commentators give a wide range of verdicts on the NHSTP defence in Statistical significance. The null-hypothesis significance-test procedure (NHSTP) is defended in a framework in which deductive and inductive rules are deployed in theory corroboration in the spirit of Popper's Conjectures and refutations (1968b). The defensible hypothetico-deductive structure of the framework is used to make explicit the distinctions between (1) substantive and statistical hypotheses, (2) statistical alternative and conceptual alternative hypotheses, and (3) making statistical decisions and drawing theoretical conclusions. These distinctions make it easier to show that (1) H0 can be true, (2) the effect size is irrelevant to theory corroboration, and (3) “strong” hypotheses make no difference to NHSTP. Reservations about statistical power, meta-analysis, and the Bayesian approach are still warranted.


1998 ◽  
Vol 21 (2) ◽  
pp. 216-217 ◽  
Author(s):  
Joseph S. Rossi

Chow's (1996) defense of the null-hypothesis significance-test procedure (NHSTP) is thoughtful and compelling in many respects. Nevertheless, techniques such as meta-analysis, power analysis, effect size estimation, and confidence intervals can be useful supplements to NHSTP in furthering the cumulative nature of behavioral research, as illustrated by the history of research on the spontaneous recovery of verbal learning.


2021 ◽  
Author(s):  
Kleber Neves ◽  
Pedro Batista Tan ◽  
Olavo Bohrer Amaral

Diagnostic screening models for the interpretation of null hypothesis significance test (NHST) results have been influential in highlighting the effect of selective publication on the reproducibility of the published literature, leading to John Ioannidis’ much-cited claim that most published research findings are false. These models, however, are typically based on the assumption that hypotheses are dichotomously true or false, without considering that effect sizes differ across hypotheses. To address this limitation, we develop a simulation model that represents effect sizes explicitly using different continuous distributions, while retaining other aspects of previous models such as publication bias and the pursuit of statistical significance. Our results show that the combination of selective publication, bias, low statistical power, and unlikely hypotheses consistently leads to high proportions of false positives, irrespective of the effect size distribution assumed. Using continuous effect sizes also allows us to evaluate the degree of effect size overestimation and the prevalence of estimates with the wrong sign in the literature, showing that the same factors that drive false-positive results also lead to errors in estimating effect size direction and magnitude. Nevertheless, the relative influence of these factors on different metrics varies depending on the distribution assumed for effect sizes. The model is made available as an R Shiny app interface, allowing one to explore features of the literature under various scenarios.
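The flavor of such a model can be conveyed with a toy sketch. The code below is not the authors' simulation; the effect-size distribution, sample size, and selection rule are all illustrative assumptions. True standardized effects are drawn from a continuous distribution, each study estimates its effect with sampling error, and only statistically significant results are "published":

```python
import random, math

random.seed(1)  # fixed seed so the toy run is reproducible

def simulate(n_studies=20000, n_per_group=20, tau=0.2, alpha_z=1.96):
    """Toy publication-bias model with continuous true effect sizes."""
    se = math.sqrt(2.0 / n_per_group)            # approx. SE of a standardized mean difference
    published = []
    for _ in range(n_studies):
        true_delta = random.gauss(0.0, tau)      # continuous effect-size distribution (assumed)
        estimate = random.gauss(true_delta, se)  # observed effect with sampling error
        if abs(estimate / se) > alpha_z:         # selective publication of significant results
            published.append((true_delta, estimate))
    # How inflated are published magnitudes relative to the true effects behind them?
    overestimation = (sum(abs(e) for _, e in published) /
                      sum(abs(t) for t, _ in published))
    # How often does a published estimate point in the wrong direction?
    wrong_sign = sum(1 for t, e in published if t * e < 0) / len(published)
    return overestimation, wrong_sign

over, wrong = simulate()
```

Even this crude version reproduces the qualitative pattern described in the abstract: conditioning on significance inflates published effect magnitudes and produces a nonzero rate of sign errors.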


1997 ◽  
Vol 8 (1) ◽  
pp. 12-15 ◽  
Author(s):  
Robert P. Abelson

Criticisms of null-hypothesis significance tests (NHSTs) are reviewed. Used as formal, two-valued decision procedures, they often generate misleading conclusions. However, critics who argue that NHSTs are totally meaningless because the null hypothesis is virtually always false are overstating their case. Critics also neglect the whole class of valuable significance tests that assess goodness of fit of models to data. Even as applied to simple mean differences, NHSTs can be rhetorically useful in defending research against criticisms that random factors adequately explain the results, or that the direction of mean difference was not demonstrated convincingly. Principled argument and counterargument produce the lore, or communal understanding, in a field, which in turn helps guide new research. Alternative procedures–confidence intervals, effect sizes, and meta-analysis–are discussed. Although these alternatives are not totally free from criticism either, they deserve more frequent use, without an unwise ban on NHSTs.


2021 ◽  
pp. 174569162097055
Author(s):  
Nick J. Broers

One particular weakness of psychology that was left implicit by Meehl is the fact that psychological theories tend to be verbal theories, permitting at best ordinal predictions. Such predictions do not enable the high-risk tests that would strengthen our belief in the verisimilitude of theories but instead lead to the practice of null-hypothesis significance testing, a practice Meehl believed to be a major reason for the slow theoretical progress of soft psychology. The rising popularity of meta-analysis has led some to argue that we should move away from significance testing and focus on the size and stability of effects instead. Proponents of this reform assume that a greater emphasis on quantity can help psychology to develop a cumulative body of knowledge. The crucial question in this endeavor is whether the resulting numbers really have theoretical meaning. Psychological science lacks an undisputed, preexisting domain of observations analogous to the observations in the space-time continuum in physics. It is argued that, for this reason, effect sizes do not really exist independently of the adopted research design that led to their manifestation. Consequently, they can have no bearing on the verisimilitude of a theory.


1994 ◽  
Vol 5 (6) ◽  
pp. 329-334 ◽  
Author(s):  
Robert Rosenthal ◽  
Donald B. Rubin

We introduce a new, readily computed statistic, the counternull value of an obtained effect size, which is the nonnull magnitude of effect size that is supported by exactly the same amount of evidence as supports the null value of the effect size. In other words, if the counternull value were taken as the null hypothesis, the resulting p value would be the same as the obtained p value for the actual null hypothesis. Reporting the counternull, in addition to the p value, virtually eliminates two common errors: (a) equating failure to reject the null with the estimation of the effect size as equal to zero, and (b) taking the rejection of a null hypothesis on the basis of a significant p value to imply a scientifically important finding. In many common situations with a one-degree-of-freedom effect size, the value of the counternull is simply twice the magnitude of the obtained effect size, but the counternull is defined in general, even with multi-degree-of-freedom effect sizes, and therefore can be applied when a confidence interval cannot be. The use of the counternull can be especially useful in meta-analyses when evaluating the scientific importance of summary effect sizes.
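For a one-degree-of-freedom effect size with an approximately normal sampling distribution, the symmetry that makes the counternull equal to twice the obtained effect size can be checked numerically. The sketch below uses hypothetical values for the estimate and its standard error:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p(effect, hypothesized_value, se):
    """Two-sided p value for an effect estimate against a point hypothesis,
    assuming a normal sampling distribution."""
    z = (effect - hypothesized_value) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))

d_obs, se = 0.30, 0.20            # hypothetical obtained effect size and standard error
counternull = 2 * d_obs           # symmetric one-df case: counternull = 2 * obtained ES

p_null = two_sided_p(d_obs, 0.0, se)          # evidence against the null value (0)
p_counternull = two_sided_p(d_obs, counternull, se)  # evidence against the counternull
```

Because the estimate sits exactly midway between 0 and the counternull, the two p values coincide: the data support an effect of 0.60 exactly as strongly as they support an effect of zero, which is the point of reporting the counternull alongside p.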

