Understanding Statistical and Clinical Significance: Hypothesis Testing

1998 ◽  
Vol 11 (3) ◽  
pp. 181-195 ◽  
Author(s):  
H. Glenn Anderson ◽  
Michael G. Kendrach ◽  
Shana Trice

This primer reviews a number of statistical concepts integral to the hypothesis testing process and its role in decision making. Concepts of variables, scales of measure, and measures of central tendency and dispersion are discussed, and a 5-step process of hypothesis testing is presented. Finally, a discussion of the statistical and clinical significance of research results is presented, along with the concept of confidence intervals as a method of conveying information about the effect size as well as the statistical significance of a difference between groups.
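The confidence-interval idea the primer closes with can be sketched in Python. This is a minimal normal-approximation interval for a difference in means; the data and the `mean_diff_ci` helper are hypothetical illustrations, not material from the article:

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means of two
    independent samples (normal approximation, z = 1.96)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Hypothetical treatment/control measurements (illustrative values only)
treatment = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 11.8]
control = [10.9, 11.2, 10.5, 11.0, 10.7, 11.3, 10.8, 11.1]
lo, hi = mean_diff_ci(treatment, control)
# If the interval excludes 0, the difference is statistically significant
# at roughly the 5% level; the interval's width conveys the effect size.
print(round(lo, 2), round(hi, 2))
```

As the primer notes, the interval carries strictly more information than a bare p-value: it shows both whether zero is excluded and how large the plausible effects are.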

2000 ◽  
Vol 86 (1) ◽  
pp. 243-259 ◽  
Author(s):  
Sumiko Takayanagi ◽  
Norman Cliff

The present study examined how statistical significance levels are treated and interpreted by graduate students who use hypothesis testing in their scientific investigations. To probe the psychological aspects underlying hypothesis testing, fuzzy set theory was employed to identify points of uncertainty in judgments. Thirty-four graduate students in a psychology department made judgments about hypothetical statistical decisions. The results indicated that (1) the majority of these students treated significance levels as a continuum and rated them according to the magnitude of statistical significance; (2) the subjects shifted their decisions according to the type of hypothetical scenario but not the sample size, instead interpreting a smaller sample as less reliable; (3) the subjects chose formal statistical terms, e.g., Significant and Not Significant, more often than graduated verbal expressions, e.g., Marginally Significant and Borderline Significant; and (4) Fuzziness (the degree of confidence in decision making) varied across individuals and was concentrated at the critical transition points where judgments are most difficult. The Fuzziness Index illustrated the subtle shifts in human decision-making patterns in statistical judgments. Underlying decision uncertainties and difficulties can be illustrated by functions generated from fuzzy set theory, which may more closely resemble human psychological mechanisms. This integrative study of fuzzy set theory and behavioral measurement appears to provide a more natural technique for examining and understanding the imprecise boundaries of human decisions.
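The graded treatment of significance levels the study describes can be illustrated with a toy fuzzy membership function. The linear shape and the 0.01/0.10 breakpoints below are hypothetical choices for illustration, not the functions the authors fitted:

```python
def significance_membership(p):
    """Degree (0..1) to which a p-value is judged 'significant',
    graded on a continuum rather than a hard 0.05 cutoff.
    Hypothetical linear membership function for illustration."""
    if p <= 0.01:
        return 1.0
    if p >= 0.10:
        return 0.0
    # Linear ramp between full membership at 0.01 and none at 0.10
    return (0.10 - p) / (0.10 - 0.01)

for p in (0.005, 0.04, 0.06, 0.2):
    print(p, round(significance_membership(p), 2))
```

The interesting region for such a function is the ramp between the breakpoints, which corresponds to the "critical transition points" where the study found judgments were most fuzzy.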


2016 ◽  
Vol 77 (4) ◽  
pp. 673-689 ◽  
Author(s):  
Rand R. Wilcox ◽  
Sarfaraz Serang

The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and p values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have occurred during the past 50 years. There are, of course, limitations to what null hypothesis testing and p values reveal about data. But modern advances make it clear that there are serious limitations and concerns associated with conventional confidence intervals, standard Bayesian methods, and commonly used measures of effect size. Many of these concerns can be addressed using modern robust methods.
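One workhorse of the robust methods the authors point to is the trimmed mean, which discards a fixed proportion of extreme observations before averaging. A minimal stdlib sketch; the 20% trimming proportion and the data are illustrative, not taken from the article:

```python
def trimmed_mean(xs, prop=0.2):
    """Trimmed mean: drop the lowest and highest `prop` fraction of
    the sorted observations, then average what remains. This limits
    the influence of outliers and heavy tails on the estimate."""
    xs = sorted(xs)
    k = int(len(xs) * prop)
    trimmed = xs[k:len(xs) - k]
    return sum(trimmed) / len(trimmed)

data = [2.1, 2.3, 2.2, 2.4, 2.0, 2.5, 2.2, 2.3, 2.1, 15.0]  # one outlier
ordinary = sum(data) / len(data)   # pulled far upward by 15.0
robust = trimmed_mean(data)        # essentially unaffected
```

Robust confidence intervals and hypothesis tests of the kind the article discusses are typically built around estimators like this one rather than the raw mean.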


2020 ◽  
Vol 36 ◽  
Author(s):  
Nadine M. Neumann ◽  
Alexandre Plastino ◽  
Jony A. Pinto Junior ◽  
Alex A. Freitas

Statistical significance analysis, based on hypothesis tests, is a common approach for comparing classifiers. However, many studies oversimplify this analysis by simply checking the condition p-value < 0.05, ignoring important concepts such as the effect size and the statistical power of the test. This problem is so worrying that the American Statistical Association has taken a strong stand on the subject, noting that although the p-value is a useful statistical measure, it has been abusively used and misinterpreted. This work highlights problems caused by the misuse of hypothesis tests and shows how the effect size and the power of the test can provide important information for better decision-making. To investigate these issues, we perform empirical studies with different classifiers and 50 datasets, using the Student's t-test and the Wilcoxon test to compare classifiers. The results show that an isolated p-value analysis can lead to wrong conclusions and that evaluating the effect size and the power of the test contributes to more principled decision-making.
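The kind of analysis advocated here, reporting an effect size alongside the test statistic, can be sketched for a paired classifier comparison. The accuracy figures below are hypothetical (not from the study's 50 datasets), and the paired Cohen's d is one of several effect-size conventions:

```python
import math
import statistics

def paired_t_and_d(a, b):
    """Paired t statistic and a paired-samples Cohen's d for two
    classifiers evaluated on the same datasets."""
    diffs = [x - y for x, y in zip(a, b)]
    m = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    t = m / (sd / math.sqrt(len(diffs)))  # test statistic
    d = m / sd                            # standardized effect size
    return t, d

# Hypothetical per-dataset accuracies for two classifiers
clf_a = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92]
clf_b = [0.90, 0.87, 0.92, 0.90, 0.86, 0.91]
t, d = paired_t_and_d(clf_a, clf_b)
```

The point of reporting `d` with `t` is exactly the study's: a tiny, practically irrelevant accuracy gap can still produce a large t statistic when the differences are very consistent, so the effect size is needed to judge whether the difference matters.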


2016 ◽  
Vol 51 (12) ◽  
pp. 1045-1048 ◽  
Author(s):  
Monica Lininger ◽  
Bryan L. Riemann

Objective: To describe confidence intervals (CIs) and effect sizes and provide practical examples to assist clinicians in assessing clinical meaningfulness. Background: As discussed in our first article in 2015, which addressed the difference between statistical significance and clinical meaningfulness, evaluating the clinical meaningfulness of a research study remains a challenge for many readers. In this paper, we build on that topic by examining CIs and effect sizes. Description: A CI is a range estimated from sample data (the data we collect) that is likely to include the population parameter (value) of interest. Conceptually, it provides lower and upper limits, computed from the sample data, that would likely contain, for example, the mean of the unknown population. An effect size is the magnitude of the difference between 2 means. When a statistically significant difference exists between 2 means, the effect size describes how large or small that difference actually is. Confidence intervals and effect sizes enhance the practical interpretation of research results. Recommendations: Along with statistical significance, the CI and effect size can assist practitioners in better understanding the clinical meaningfulness of a research study.
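The "magnitude of the difference between 2 means" described here is commonly standardized as Cohen's d with a pooled standard deviation. A minimal sketch with hypothetical group data; the conventional benchmarks in the comment are the usual rules of thumb, not thresholds from this article:

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d between two independent group means, standardized by
    the pooled SD. Common rules of thumb: ~0.2 small, ~0.5 medium,
    ~0.8 large (context-dependent)."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a) +
                           (nb - 1) * statistics.variance(b)) /
                          (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical outcome scores for two groups
group_1 = [5, 6, 7, 8]
group_2 = [3, 4, 5, 6]
d = cohens_d(group_1, group_2)
```

Because d is expressed in pooled-SD units, it lets a clinician compare the size of an effect across studies that used different measurement scales.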


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1176 ◽  
Author(s):  
Nicholas Graves ◽  
Adrian G. Barnett ◽  
Edward Burn ◽  
David Cook

Background: Clinical trials might be larger than needed because arbitrarily high levels of statistical confidence are sought in the results. Traditional sample size calculations ignore the marginal value of the information collected for decision making. The statistical hypothesis testing objective is misaligned with the goal of generating the information necessary for decision-making. The aim of the present study was to show that a clinical trial designed to test a prior hypothesis against an arbitrary threshold of confidence may recruit too many participants, wasting scarce research dollars and exposing participants to research unnecessarily. Methods: We used data from a recent RCT powered for traditional rules of statistical significance. The data were also used for an economic analysis showing that the intervention led to cost savings and improved health outcomes. Adoption represented a good investment for decision-makers. We examined the effect of reducing the trial's sample size on the results of the statistical hypothesis-testing analysis and on the conclusions that would be drawn by decision-makers reading the economic analysis. Results: As the sample size was reduced, it became more likely that the null hypothesis of no difference in the primary outcome between groups would fail to be rejected. For decision-makers reading the economic analysis, reducing the sample size had little effect on the conclusion about whether to adopt the intervention. There was always a high probability that the intervention reduced costs and improved health. Conclusions: Decision makers managing health services are largely insensitive to the sample size of the primary trial and the arbitrary p-value of 0.05. If the goal is to make a good decision about whether the intervention should be adopted widely, then that could have been achieved with a much smaller trial. It is plausible that hundreds of millions of research dollars are wasted each year recruiting more participants than required for RCTs.
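The paper's central observation, that shrinking the sample erodes statistical significance long before it erodes the decision-relevant signal, can be illustrated with a toy simulation. Every parameter below (true effect, noise, sample sizes, number of repetitions) is hypothetical, and this normal-approximation test is a stand-in for the trial's actual analysis:

```python
import math
import random
import statistics

random.seed(1)

def simulated_trial(n, effect=0.3, sd=1.0):
    """One simulated two-arm trial: returns the estimated mean
    difference and whether |z| >= 1.96 ('statistically significant')."""
    a = [random.gauss(effect, sd) for _ in range(n)]
    b = [random.gauss(0.0, sd) for _ in range(n)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    return diff, abs(diff / se) >= 1.96

summary = {}
for n in (200, 30):
    runs = [simulated_trial(n) for _ in range(500)]
    summary[n] = {
        "power": sum(sig for _, sig in runs) / 500,
        "frac_beneficial": sum(diff > 0 for diff, _ in runs) / 500,
    }
# With the small sample, "p < 0.05" is mostly lost, yet the estimated
# effect still points the right way in the large majority of trials --
# the signal a cost-effectiveness decision-maker actually uses.
```

This is the pattern the authors report: the hypothesis-test conclusion is fragile to sample size while the adoption decision is not.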


2021 ◽  
Vol 6 (3) ◽  
pp. 74-75 ◽
Author(s):  
Soudabeh Hamedi-Shahraki ◽  
Farshad Amirkhizi

Statistical significance does not necessarily mean clinical significance. A P value less than 0.05 does not guarantee the clinical effectiveness of a treatment. To assess the clinical value of a treatment, the effect size must be calculated. The number needed to treat (NNT) is an example of an effect size measure that can be very helpful in determining the clinical significance of a treatment. Therefore, it is recommended that all researchers and physicians look beyond the P value and calculate the NNT when assessing the clinical significance of therapeutic measures and agents.
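The NNT mentioned above is simply the reciprocal of the absolute risk reduction. A minimal sketch; the event rates are hypothetical:

```python
def number_needed_to_treat(control_event_rate, treatment_event_rate):
    """NNT = 1 / ARR, where ARR (absolute risk reduction) is the
    control-group event rate minus the treatment-group event rate."""
    arr = control_event_rate - treatment_event_rate
    return 1 / arr

# Hypothetical rates: 20% of controls vs 15% of treated patients
# experience the adverse event.
nnt = number_needed_to_treat(0.20, 0.15)
# About 20 patients must be treated to prevent one additional event --
# a directly clinical quantity that a bare P value cannot convey.
```

A trial can report p < 0.05 with an NNT in the hundreds; the NNT is what tells the clinician whether the effect is worth acting on.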

