The Meaningfulness of Statistical Significance Tests in the Analysis of Simulation Results

Klaus G. Troitzsch

doi:10.4018/ijats.2016010102

The Meaningfulness of Statistical Significance Tests in the Analysis of Simulation Results

International Journal of Agent Technologies and Systems ◽

10.4018/ijats.2016010102 ◽

2016 ◽

Vol 8 (1) ◽

pp. 18-45

Author(s):

Klaus G. Troitzsch

Keyword(s):

Stochastic Process ◽

Distribution Function ◽

Simulation Model ◽

Effect Size ◽

Statistical Significance ◽

Effect Sizes ◽

Significance Tests ◽

Agent Based ◽

Input Parameters ◽

Simulation Results

This article discusses the question of whether significance tests on simulation results are meaningful at all. It argues that the size of an effect matters much more than its mere existence, and that what is important is the description of the distribution function of the stochastic process incorporated in the simulation model. This holds particularly when this distribution is far from normal, which is often the case when the simulation model is nonlinear. To this end, the article uses three different agent-based models to demonstrate that the effects of input parameters on output metrics can often be made “statistically significant” at any desired level by increasing the number of runs, even for negligible effect sizes. The examples are also used to give hints as to how many runs are necessary to estimate effect sizes and how the input parameters determine output metrics.
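The point about the number of runs can be illustrated with a minimal sketch. It does not use the three agent-based models from the article; it assumes a toy "simulation" whose output is normally distributed, with a hypothetical parameter change that shifts the mean by only 0.05 standard deviations (`true_shift`, `run_experiment` and the seed are illustrative names and values). The estimated effect size stays negligible while the p-value of a large-sample z-test shrinks as the number of runs grows.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(pooled_var)


def two_sided_p(a, b):
    """Two-sided p-value of a large-sample z-test for equal means."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def run_experiment(n_runs, true_shift=0.05, seed=1):
    """Toy stand-in for a simulation model (assumption: normal output)."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n_runs)]
    treated = [rng.gauss(true_shift, 1.0) for _ in range(n_runs)]
    return cohens_d(baseline, treated), two_sided_p(baseline, treated)


# Same negligible effect, increasing number of runs per parameter setting.
results = {n: run_experiment(n) for n in (100, 10_000, 100_000)}
for n, (d, p) in results.items():
    print(f"runs={n:>7}  estimated d={d:+.3f}  p={p:.3g}")
```

With real simulation output, which the article notes is often far from normal, a z-test may itself be inappropriate; the sketch only shows how sample size alone drives the p-value.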


Analysing Simulation Results Statistically

Interdisciplinary Applications of Agent-Based Social Simulation and Modeling - Advances in Human and Social Aspects of Technology ◽

10.4018/978-1-4666-5954-4.ch006 ◽

2014 ◽

pp. 88-105 ◽

Cited By ~ 2

Author(s):

Klaus G. Troitzsch

Keyword(s):

Social Sciences ◽

Stochastic Process ◽

Distribution Function ◽

Simulation Model ◽

Effect Size ◽

Significance Tests ◽

The Social ◽

Simulation Results ◽

Level Of Significance

Many papers on simulation in the social sciences report significance tests in which the authors describe the effect of a parameter on some simulation outcome as significant at some level of significance. This chapter discusses the question of whether significance tests on simulation results are meaningful. It argues that the effect size matters much more than the existence of the effect, and that what is important is the description of the distribution function of the stochastic process incorporated in the simulation model, particularly when this distribution is far from normal, which is often the case when the simulation model is nonlinear.


The use of effect size indices to determine practical significance

Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie ◽

10.4102/satnt.v25i3.157 ◽

2006 ◽

Vol 25 (3) ◽

Author(s):

H. S. Steyn ◽

S. M. Ellis

Keyword(s):

Effect Size ◽

Statistical Significance ◽

Empirical Studies ◽

Research Literature ◽

Effect Sizes ◽

Practical Significance ◽

Significance Tests ◽

Statistical Application ◽

Significant Difference

The determination of significance of differences in means and of relationships between variables is of importance in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. With studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are worked with, the determination of statistical significance is, strictly speaking, no longer relevant, while the effect size indices can be used as a basis to judge significance. In this article attention is paid to the use of effect size indices in order to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they receive attention in statistical literature and computer packages. The use of effect sizes is illustrated by a few examples from the research literature.


Effect Size and Effect Uncertainty in Organizational Research Methods

Oxford Research Encyclopedia of Business and Management ◽

10.1093/acrefore/9780190224851.013.238 ◽

2021 ◽

Author(s):

Scott B. Morris ◽

Arash Shokri

Keyword(s):

Confidence Intervals ◽

Effect Size ◽

Sampling Error ◽

Statistical Significance ◽

Scientific Progress ◽

Effect Sizes ◽

Practical Significance ◽

Significance Tests ◽

Wide Range ◽

Research Findings

To understand and communicate research findings, it is important for researchers to consider two types of information provided by research results: the magnitude of the effect and the degree of uncertainty in the outcome. Statistical significance tests have long served as the mainstream method for statistical inferences. However, the widespread misinterpretation and misuse of significance tests has led critics to question their usefulness in evaluating research findings and to raise concerns about the far-reaching effects of this practice on scientific progress. An alternative approach involves reporting and interpreting measures of effect size along with confidence intervals. An effect size is an indicator of magnitude and direction of a statistical observation. Effect size statistics have been developed to represent a wide range of research questions, including indicators of the mean difference between groups, the relative odds of an event, or the degree of correlation among variables. Effect sizes play a key role in evaluating practical significance, conducting power analysis, and conducting meta-analysis. While effect sizes summarize the magnitude of an effect, the confidence intervals represent the degree of uncertainty in the result. By presenting a range of plausible alternate values that might have occurred due to sampling error, confidence intervals provide an intuitive indicator of how strongly researchers should rely on the results from a single study.
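As a small illustration of reporting an effect size together with its uncertainty, the sketch below computes Cohen's d for two groups and an approximate 95% confidence interval. It assumes the common large-sample approximation for the standard error of d, sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))), and a normal quantile; the function name and the two tiny data sets are hypothetical and not taken from the article.

```python
import math
import statistics


def cohens_d_with_ci(group_a, group_b, level=0.95):
    """Cohen's d (b minus a) with an approximate normal-theory interval."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * statistics.variance(group_a)
                  + (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)
    d = (statistics.fmean(group_b) - statistics.fmean(group_a)) / math.sqrt(pooled_var)
    # Large-sample approximation to the standard error of d (an assumption).
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return d, (d - z * se, d + z * se)


# Hypothetical scores for a control and a treatment group.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
d, (lower, upper) = cohens_d_with_ci(control, treatment)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With only five observations per group the interval is wide and includes zero, which is the kind of information a bare significance verdict would hide. For samples this small, an exact interval based on the noncentral t distribution would be preferable to the approximation used here.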


Significance tests: Necessary but not sufficient

Behavioral and Brain Sciences ◽

10.1017/s0140525x98521164 ◽

1998 ◽

Vol 21 (2) ◽

pp. 221-222

Author(s):

Louis G. Tassinary

Keyword(s):

Effect Size ◽

Scientific Community ◽

Statistical Significance ◽

Significance Tests ◽

Experimental Control

Chow (1996) offers a reconceptualization of statistical significance that is reasoned and comprehensive. Despite a somewhat rough presentation, his arguments are compelling and deserve to be taken seriously by the scientific community. It is argued that his characterizations of literal replication, types of research, effect size, and experimental control are in need of revision.


The Other Half of the Story: Effect Size Analysis in Quantitative Research

CBE—Life Sciences Education ◽

10.1187/cbe.13-04-0082 ◽

2013 ◽

Vol 12 (3) ◽

pp. 345-351 ◽

Cited By ~ 134

Author(s):

Jessica Middlemis Maher ◽

Jonathan C. Markey ◽

Diane Ebert-May

Keyword(s):

Educational Research ◽

Effect Size ◽

Quantitative Research ◽

Statistical Significance ◽

The Other ◽

Practical Significance ◽

Significance Testing ◽

Size Analysis ◽

Significance Tests ◽

Statistical Significance Testing

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.


Interpreting Statistical Significance and Meaningfulness in Adapted Physical Activity Research

Adapted Physical Activity Quarterly ◽

10.1123/apaq.15.2.103 ◽

1998 ◽

Vol 15 (2) ◽

pp. 103-118 ◽

Cited By ~ 31

Author(s):

Vinson H. Sutlive ◽

Dale A. Ulrich

Keyword(s):

Physical Activity ◽

Sample Size ◽

Recent Literature ◽

Statistical Significance ◽

Effect Sizes ◽

Significance Tests ◽

Adapted Physical Activity ◽

Research Designs ◽

Alpha Level ◽

Research Findings

The unqualified use of statistical significance tests for interpreting the results of empirical research has been called into question by researchers in a number of behavioral disciplines. This paper reviews what statistical significance tells us and what it does not, with particular attention paid to criticisms of using the results of these tests as the sole basis for evaluating the overall significance of research findings. In addition, implications for adapted physical activity research are discussed. Based on the recent literature of other disciplines, several recommendations for evaluating and reporting research findings are made. They include calculating and reporting effect sizes, selecting an alpha level larger than the conventional .05 level, placing greater emphasis on replication of results, evaluating results in a sample size context, and employing simple research designs. Adapted physical activity researchers are encouraged to use specific modifiers when describing findings as significant.


Effect Sizes and "What If" Analyses as Supplements to Statistical Significance Tests

Journal of Early Intervention ◽

10.1177/105381510302500406 ◽

2003 ◽

Vol 25 (4) ◽

pp. 310-319 ◽

Cited By ~ 9

Author(s):

Susan Pedersen

Keyword(s):

Statistical Significance ◽

Effect Sizes ◽

Significance Tests


Can Reliance be Placed on a Single Meta-Analysis?

Australian & New Zealand Journal of Psychiatry ◽

10.3109/00048679009077710 ◽

1990 ◽

Vol 24 (3) ◽

pp. 405-415 ◽

Cited By ~ 16

Author(s):

Nathaniel McConaghy

Keyword(s):

Literature Review ◽

Effect Size ◽

Meta Analysis ◽

Statistical Significance ◽

Effect Sizes ◽

Control Groups ◽

Consistent Finding ◽

Placebo Controls ◽

Effect Of Treatment ◽

Meta Analyses

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning evaluation of treatment effects. Statistical significance measured reliability of the effect of treatment, not its efficacy. It was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses which combine the results of studies which employ different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.


Determining sexual dimorphism in frog measurement data: integration of statistical significance, measurement error, effect size and biological significance

Anais da Academia Brasileira de Ciências ◽

10.1590/s0001-37652005000100005 ◽

2005 ◽

Vol 77 (1) ◽

pp. 45-76 ◽

Cited By ~ 8

Author(s):

Lee-Ann C. Hayek ◽

W. Ronald Heyer

Keyword(s):

Sexual Dimorphism ◽

Measurement Error ◽

Effect Size ◽

Statistical Significance ◽

Measurement Data ◽

Biological Significance ◽

Effect Sizes ◽

Statistical Hypothesis ◽

Multivariate Techniques ◽

Error Index

Several analytic techniques have been used to determine sexual dimorphism in vertebrate morphological measurement data with no emergent consensus on which technique is superior. A further confounding problem for frog data is the existence of considerable measurement error. To determine dimorphism, we examine a single hypothesis (H₀: equal means) for two groups (females and males). We demonstrate that frog measurement data meet assumptions for clearly defined statistical hypothesis testing with statistical linear models rather than those of exploratory multivariate techniques such as principal components, correlation or correspondence analysis. In order to distinguish biological from statistical significance of hypotheses, we propose a new protocol that incorporates measurement error and effect size. Measurement error is evaluated with a novel measurement error index. Effect size, widely used in the behavioral sciences and in meta-analysis studies in biology, proves to be the most useful single metric to evaluate whether statistically significant results are biologically meaningful. Definitions for a range of small, medium, and large effect sizes specifically for frog measurement data are provided. Examples with measurement data for species of the frog genus Leptodactylus are presented. The new protocol is recommended not only to evaluate sexual dimorphism for frog data but for any animal measurement data for which the measurement error index and observed or a priori effect sizes can be calculated.


Recalibrating expectations about effect size: A multi-method survey of effect sizes in the ABCD study

PLoS ONE ◽

10.1371/journal.pone.0257535 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257535

Author(s):

Max M. Owens ◽

Alexandra Potter ◽

Courtland S. Hyatt ◽

Matthew Albaugh ◽

Wesley K. Thompson ◽

...

Keyword(s):

Effect Size ◽

Statistical Significance ◽

Psychological Research ◽

Sociodemographic Factors ◽

Effect Sizes ◽

Size Distributions ◽

Different Types ◽

Two Factors ◽

Future Work ◽

Effect Size Distribution

Effect sizes are commonly interpreted using heuristics established by Cohen (e.g., small: r = .1, medium: r = .3, large: r = .5), despite mounting evidence that these guidelines are mis-calibrated to the effects typically found in psychological research. This study’s aims were to 1) describe the distribution of effect sizes across multiple instruments, 2) consider factors qualifying the effect size distribution, and 3) identify examples as benchmarks for various effect sizes. For aim one, effect size distributions were illustrated from a large, diverse sample of 9/10-year-old children. This was done by conducting Pearson’s correlations among 161 variables representing constructs from all questionnaires and tasks from the Adolescent Brain and Cognitive Development Study® baseline data. To achieve aim two, factors qualifying this distribution were tested by comparing the distributions of effect size among various modifications of the aim one analyses. These modified analytic strategies included comparisons of effect size distributions for different types of variables, for analyses using statistical thresholds, and for analyses using several covariate strategies. In aim one analyses, the median in-sample effect size was .03, and values at the first and third quartiles were .01 and .07. In aim two analyses, effects were smaller for associations across instruments, content domains, and reporters, as well as when covarying for sociodemographic factors. Effect sizes were larger when thresholding for statistical significance. In analyses intended to mimic conditions used in “real-world” analysis of ABCD data, the median in-sample effect size was .05, and values at the first and third quartiles were .03 and .09. To achieve aim three, examples for varying effect sizes are reported from the ABCD dataset as benchmarks for future work in the dataset. In summary, this report finds that empirically determined effect sizes from a notably large dataset are smaller than would be expected based on existing heuristics.
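The kind of summary described in this abstract (pairwise Pearson correlations reduced to a median and quartiles of absolute effect sizes) can be sketched as follows. The data are synthetic and purely hypothetical: 12 variables built from a weak shared factor plus noise, not the 161 ABCD variables, and the numbers produced say nothing about the study's results.

```python
import itertools
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n_cases, n_vars = 500, 12  # hypothetical sizes, chosen for speed

# Synthetic variables: a weak shared factor plus independent noise.
shared = [rng.gauss(0, 1) for _ in range(n_cases)]
variables = [[0.2 * s + rng.gauss(0, 1) for s in shared] for _ in range(n_vars)]

# Absolute correlation for every unordered pair of variables.
abs_r = [abs(pearson_r(x, y)) for x, y in itertools.combinations(variables, 2)]

# Quartiles of the effect size distribution (first quartile, median, third quartile).
q1, median, q3 = statistics.quantiles(abs_r, n=4)
print(f"pairs={len(abs_r)}  Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}")
```

Summarising the whole distribution in this way, rather than counting significant correlations, is what allows the comparison with conventional benchmarks such as r = .1, .3 and .5.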
