Understanding Statistical and Clinical Significance: Hypothesis Testing

1998 ◽  
Vol 11 (3) ◽  
pp. 181-195 ◽  
Author(s):  
H. Glenn Anderson ◽  
Michael G. Kendrach ◽  
Shana Trice

This primer reviews a number of statistical concepts integral to the hypothesis testing process and its role in decision making. Concepts of variables, scales of measure, and measures of central tendency and dispersion are discussed, and a 5-step process of hypothesis testing is presented. Finally, a discussion of the statistical and clinical significance of research results is presented, along with the concept of confidence intervals as a method of conveying information about the effect size as well as the statistical significance of a difference between groups.
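The confidence-interval idea the primer closes with can be sketched in Python. This is a minimal normal-approximation interval for a difference in means; the data and the `mean_diff_ci` helper are hypothetical illustrations, not material from the article:

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means of two
    independent samples (normal approximation, z = 1.96)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Hypothetical treatment/control measurements (illustrative values only)
treatment = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 11.8]
control = [10.9, 11.2, 10.5, 11.0, 10.7, 11.3, 10.8, 11.1]
lo, hi = mean_diff_ci(treatment, control)
# If the interval excludes 0, the difference is statistically significant
# at roughly the 5% level; the interval's width conveys the effect size.
print(round(lo, 2), round(hi, 2))
```

As the primer notes, the interval carries strictly more information than a bare p-value: it shows both whether zero is excluded and how large the plausible effects are.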

2000 ◽  
Vol 86 (1) ◽  
pp. 243-259 ◽  
Author(s):  
Sumiko Takayanagi ◽  
Norman Cliff

The present study examined how statistical significance levels are treated and interpreted by graduate students who use hypothesis testing in their scientific investigations. To probe the psychological aspects underlying hypothesis testing, fuzzy set theory was employed to identify points of uncertainty in judgments. Thirty-four graduate students in a psychology department made judgments about hypothetical statistical decisions. The results indicated that (1) the majority of these students treated significance levels as a continuum and rated them according to the magnitude of statistical significance; (2) the subjects shifted their decisions according to the type of hypothetical scenario but not the sample size, instead interpreting a smaller sample as less reliable; (3) the subjects chose formal statistical terms, e.g., Significant and Not Significant, more often than graduated verbal expressions, e.g., Marginally Significant and Borderline Significant; and (4) Fuzziness (the degree of confidence in decision making) varied across individuals and was concentrated at the critical transition points where judgments are most difficult. The Fuzziness Index illustrated the subtle shifts in human decision-making patterns in statistical judgments. Underlying decision uncertainties and difficulties can be illustrated by functions generated from fuzzy set theory, which may more closely resemble human psychological mechanisms. This integrative study of fuzzy set theory and behavioral measurement appears to provide a more natural technique for examining and understanding the imprecise boundaries of human decisions.
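The graded treatment of significance levels the study describes can be illustrated with a toy fuzzy membership function. The linear shape and the 0.01/0.10 breakpoints below are hypothetical choices for illustration, not the functions the authors fitted:

```python
def significance_membership(p):
    """Degree (0..1) to which a p-value is judged 'significant',
    graded on a continuum rather than a hard 0.05 cutoff.
    Hypothetical linear membership function for illustration."""
    if p <= 0.01:
        return 1.0
    if p >= 0.10:
        return 0.0
    # Linear ramp between full membership at 0.01 and none at 0.10
    return (0.10 - p) / (0.10 - 0.01)

for p in (0.005, 0.04, 0.06, 0.2):
    print(p, round(significance_membership(p), 2))
```

The interesting region for such a function is the ramp between the breakpoints, which corresponds to the "critical transition points" where the study found judgments were most fuzzy.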


2016 ◽  
Vol 77 (4) ◽  
pp. 673-689 ◽  
Author(s):  
Rand R. Wilcox ◽  
Sarfaraz Serang

The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and p values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have occurred during the past 50 years. There are, of course, limitations to what null hypothesis testing and p values reveal about data. But modern advances make it clear that there are serious limitations and concerns associated with conventional confidence intervals, standard Bayesian methods, and commonly used measures of effect size. Many of these concerns can be addressed using modern robust methods.
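One workhorse of the robust methods the authors point to is the trimmed mean, which discards a fixed proportion of extreme observations before averaging. A minimal stdlib sketch; the 20% trimming proportion and the data are illustrative, not taken from the article:

```python
def trimmed_mean(xs, prop=0.2):
    """Trimmed mean: drop the lowest and highest `prop` fraction of
    the sorted observations, then average what remains. This limits
    the influence of outliers and heavy tails on the estimate."""
    xs = sorted(xs)
    k = int(len(xs) * prop)
    trimmed = xs[k:len(xs) - k]
    return sum(trimmed) / len(trimmed)

data = [2.1, 2.3, 2.2, 2.4, 2.0, 2.5, 2.2, 2.3, 2.1, 15.0]  # one outlier
ordinary = sum(data) / len(data)   # pulled far upward by 15.0
robust = trimmed_mean(data)        # essentially unaffected
```

Robust confidence intervals and hypothesis tests of the kind the article discusses are typically built around estimators like this one rather than the raw mean.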


2020 ◽  
Vol 36 ◽  
Author(s):  
Nadine M. Neumann ◽  
Alexandre Plastino ◽  
Jony A. Pinto Junior ◽  
Alex A. Freitas

Statistical significance analysis, based on hypothesis tests, is a common approach for comparing classifiers. However, many studies oversimplify this analysis by simply checking the condition p-value < 0.05, ignoring important concepts such as the effect size and the statistical power of the test. This problem is so worrying that the American Statistical Association has taken a strong stand on the subject, noting that although the p-value is a useful statistical measure, it has been abusively used and misinterpreted. This work highlights problems caused by the misuse of hypothesis tests and shows how the effect size and the power of the test can provide important information for better decision-making. To investigate these issues, we perform empirical studies with different classifiers and 50 datasets, using the Student's t-test and the Wilcoxon test to compare classifiers. The results show that an isolated p-value analysis can lead to wrong conclusions and that evaluating the effect size and the power of the test contributes to more principled decision-making.
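The kind of analysis advocated here, reporting an effect size alongside the test statistic, can be sketched for a paired classifier comparison. The accuracy figures below are hypothetical (not from the study's 50 datasets), and the paired Cohen's d is one of several effect-size conventions:

```python
import math
import statistics

def paired_t_and_d(a, b):
    """Paired t statistic and a paired-samples Cohen's d for two
    classifiers evaluated on the same datasets."""
    diffs = [x - y for x, y in zip(a, b)]
    m = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    t = m / (sd / math.sqrt(len(diffs)))  # test statistic
    d = m / sd                            # standardized effect size
    return t, d

# Hypothetical per-dataset accuracies for two classifiers
clf_a = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92]
clf_b = [0.90, 0.87, 0.92, 0.90, 0.86, 0.91]
t, d = paired_t_and_d(clf_a, clf_b)
```

The point of reporting `d` with `t` is exactly the study's: a tiny, practically irrelevant accuracy gap can still produce a large t statistic when the differences are very consistent, so the effect size is needed to judge whether the difference matters.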


2016 ◽  
Vol 51 (12) ◽  
pp. 1045-1048 ◽  
Author(s):  
Monica Lininger ◽  
Bryan L. Riemann

Objective: To describe confidence intervals (CIs) and effect sizes and provide practical examples to assist clinicians in assessing clinical meaningfulness. Background: As discussed in our first article in 2015, which addressed the difference between statistical significance and clinical meaningfulness, evaluating the clinical meaningfulness of a research study remains a challenge for many readers. In this paper, we build on that topic by examining CIs and effect sizes. Description: A CI is a range estimated from sample data (the data we collect) that is likely to include the population parameter (value) of interest. Conceptually, it provides lower and upper limits, computed from the sample data, that would likely contain, for example, the mean of the unknown population. An effect size is the magnitude of the difference between 2 means. When a statistically significant difference exists between 2 means, the effect size describes how large or small that difference actually is. Confidence intervals and effect sizes enhance the practical interpretation of research results. Recommendations: Along with statistical significance, the CI and effect size can assist practitioners in better understanding the clinical meaningfulness of a research study.
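The "magnitude of the difference between 2 means" described here is commonly standardized as Cohen's d with a pooled standard deviation. A minimal sketch with hypothetical group data; the conventional benchmarks in the comment are the usual rules of thumb, not thresholds from this article:

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d between two independent group means, standardized by
    the pooled SD. Common rules of thumb: ~0.2 small, ~0.5 medium,
    ~0.8 large (context-dependent)."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a) +
                           (nb - 1) * statistics.variance(b)) /
                          (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical outcome scores for two groups
group_1 = [5, 6, 7, 8]
group_2 = [3, 4, 5, 6]
d = cohens_d(group_1, group_2)
```

Because d is expressed in pooled-SD units, it lets a clinician compare the size of an effect across studies that used different measurement scales.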


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1176 ◽  
Author(s):  
Nicholas Graves ◽  
Adrian G. Barnett ◽  
Edward Burn ◽  
David Cook

Background: Clinical trials might be larger than needed because arbitrarily high levels of statistical confidence are sought in the results. Traditional sample size calculations ignore the marginal value of the information collected for decision making. The statistical hypothesis testing objective is misaligned with the goal of generating the information necessary for decision-making. The aim of the present study was to show that a clinical trial designed to test a prior hypothesis against an arbitrary threshold of confidence may recruit too many participants, wasting scarce research dollars and exposing participants to research unnecessarily. Methods: We used data from a recent RCT powered for traditional rules of statistical significance. The data were also used for an economic analysis showing that the intervention led to cost savings and improved health outcomes. Adoption represented a good investment for decision-makers. We examined the effect of reducing the trial's sample size on the results of the statistical hypothesis-testing analysis and on the conclusions that would be drawn by decision-makers reading the economic analysis. Results: As the sample size was reduced, it became more likely that the null hypothesis of no difference in the primary outcome between groups would fail to be rejected. For decision-makers reading the economic analysis, reducing the sample size had little effect on the conclusion about whether to adopt the intervention. There was always a high probability that the intervention reduced costs and improved health. Conclusions: Decision makers managing health services are largely insensitive to the sample size of the primary trial and the arbitrary p-value of 0.05. If the goal is to make a good decision about whether the intervention should be adopted widely, then that could have been achieved with a much smaller trial. It is plausible that hundreds of millions of research dollars are wasted each year recruiting more participants than required for RCTs.
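The paper's central observation, that shrinking the sample erodes statistical significance long before it erodes the decision-relevant signal, can be illustrated with a toy simulation. Every parameter below (true effect, noise, sample sizes, number of repetitions) is hypothetical, and this normal-approximation test is a stand-in for the trial's actual analysis:

```python
import math
import random
import statistics

random.seed(1)

def simulated_trial(n, effect=0.3, sd=1.0):
    """One simulated two-arm trial: returns the estimated mean
    difference and whether |z| >= 1.96 ('statistically significant')."""
    a = [random.gauss(effect, sd) for _ in range(n)]
    b = [random.gauss(0.0, sd) for _ in range(n)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    return diff, abs(diff / se) >= 1.96

summary = {}
for n in (200, 30):
    runs = [simulated_trial(n) for _ in range(500)]
    summary[n] = {
        "power": sum(sig for _, sig in runs) / 500,
        "frac_beneficial": sum(diff > 0 for diff, _ in runs) / 500,
    }
# With the small sample, "p < 0.05" is mostly lost, yet the estimated
# effect still points the right way in the large majority of trials --
# the signal a cost-effectiveness decision-maker actually uses.
```

This is the pattern the authors report: the hypothesis-test conclusion is fragile to sample size while the adoption decision is not.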


2021 ◽  
Vol 6 (3) ◽  
pp. 74-75 ◽
Author(s):  
Soudabeh Hamedi-Shahraki ◽  
Farshad Amirkhizi

Statistical significance does not necessarily mean clinical significance. A P value less than 0.05 does not guarantee the clinical effectiveness of a treatment. To assess the clinical value of a treatment, the effect size must be calculated. The number needed to treat (NNT) is an example of an effect size measure that can be very helpful in determining the clinical significance of a treatment. Therefore, it is recommended that all researchers and physicians look beyond the P value and calculate the NNT when assessing the clinical significance of therapeutic measures and agents.
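The NNT mentioned above is simply the reciprocal of the absolute risk reduction. A minimal sketch; the event rates are hypothetical:

```python
def number_needed_to_treat(control_event_rate, treatment_event_rate):
    """NNT = 1 / ARR, where ARR (absolute risk reduction) is the
    control-group event rate minus the treatment-group event rate."""
    arr = control_event_rate - treatment_event_rate
    return 1 / arr

# Hypothetical rates: 20% of controls vs 15% of treated patients
# experience the adverse event.
nnt = number_needed_to_treat(0.20, 0.15)
# About 20 patients must be treated to prevent one additional event --
# a directly clinical quantity that a bare P value cannot convey.
```

A trial can report p < 0.05 with an NNT in the hundreds; the NNT is what tells the clinician whether the effect is worth acting on.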

