Statistical Power and Effect Size in Information Retrieval Experiments

Author(s):  
Michael J. Nelson

Statistical tests are used in information retrieval to test various hypotheses, such as which indexing method or which retrieval system is better. Sometimes these tests do not provide enough evidence to reject the null hypothesis. In that case, either we have correctly discovered a true null hypothesis or made a type II error (probability denoted by β) and falsely accepted a . . .

2019 ◽  
Vol 2 (3) ◽  
pp. 199-213 ◽  
Author(s):  
Marc-André Goulet ◽  
Denis Cousineau

When running statistical tests, researchers can commit a Type II error, that is, fail to reject the null hypothesis when it is false. To diminish the probability of committing a Type II error (β), statistical power must be augmented. Typically, this is done by increasing sample size, as more participants provide more power. When the estimated effect size is small, however, the sample size required to achieve sufficient statistical power can be prohibitive. To alleviate this lack of power, a common practice is to measure participants multiple times under the same condition. Here, we show how to estimate statistical power by taking into account the benefit of such replicated measures. To that end, two additional parameters are required: the correlation between the multiple measures within a given condition and the number of times the measure is replicated. An analysis of a sample of 15 studies (total of 298 participants and 38,404 measurements) suggests that in simple cognitive tasks, the correlation between multiple measures is approximately .14. Although multiple measurements increase statistical power, this effect is not linear, but reaches a plateau past 20 to 50 replications (depending on the correlation). Hence, multiple measurements do not replace the added population representativeness provided by additional participants.
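
The plateau described in this abstract can be illustrated with a short sketch. It is not the authors' procedure: it assumes that the m replicated measures of each participant share a common variance and a common pairwise correlation rho, so that averaging them scales the standardized effect size by sqrt(m / (1 + (m - 1) * rho)), and it approximates power with the normal distribution for a two-sided comparison of two groups. The function names, the effect size of 0.2, and the group size of 20 are hypothetical; only the correlation of .14 comes from the abstract.

```python
from math import sqrt
from statistics import NormalDist


def effective_effect_size(d, rho, m):
    """Standardized effect after averaging m equicorrelated measures.

    Assumes Var(mean of m measures) = sigma^2 * (1 + (m - 1) * rho) / m.
    """
    return d * sqrt(m / (1 + (m - 1) * rho))


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # d and n are illustrative assumptions; rho = .14 is taken from the abstract.
    d, rho, n = 0.2, 0.14, 20
    for m in (1, 5, 20, 50, 1000):
        d_m = effective_effect_size(d, rho, m)
        print(f"m={m:5d}  effective d={d_m:.3f}  power={approx_power(d_m, n):.3f}")
```

Under these assumptions the effective effect size can never exceed d / sqrt(rho), which is why further replications eventually stop adding power.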


1996 ◽  
Vol 1 (1) ◽  
pp. 25-28 ◽  
Author(s):  
Martin A. Weinstock

Background: Accurate understanding of certain basic statistical terms and principles is key to critical appraisal of published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternate hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, while including examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.


1991 ◽  
Vol 42 (5) ◽  
pp. 555 ◽  
Author(s):  
PG Fairweather

This paper discusses, from a philosophical perspective, the reasons for considering the power of any statistical test used in environmental biomonitoring. Power is inversely related to the probability of making a Type II error (i.e. low power indicates a high probability of Type II error). In the context of environmental monitoring, a Type II error is made when it is concluded that no environmental impact has occurred even though one has. Type II errors have been ignored relative to Type I errors (the mistake of concluding that there is an impact when one has not occurred), the rates of which are stipulated by the α values of the test. In contrast, power depends on the value of α, the sample size used in the test, the effect size to be detected, and the variability inherent in the data. Although power ideas have been known for years, only recently have these issues attracted the attention of ecologists and have methods been available for calculating power easily. Understanding statistical power gives three ways to improve environmental monitoring and to inform decisions about actions arising from monitoring. First, it allows the most sensitive tests to be chosen from among those applicable to the data. Second, preliminary power analysis can be used to indicate the sample sizes necessary to detect an environmental change. Third, power analysis should be used after any nonsignificant result is obtained in order to judge whether that result can be interpreted with confidence or the test was too weak to examine the null hypothesis properly. Power procedures are concerned with the statistical significance of tests of the null hypothesis, and they lend little insight, on their own, into the workings of nature. Power analyses are, however, essential to designing sensitive tests and correctly interpreting their results. The biological or environmental significance of any result, including whether the impact is beneficial or harmful, is a separate issue.
The most compelling reason for considering power is that Type II errors can be more costly than Type I errors for environmental management. This is because the commitment of time, energy and people to fighting a false alarm (a Type I error) may continue only in the short term until the mistake is discovered. In contrast, the cost of not doing something when in fact it should be done (a Type II error) will have both short- and long-term costs (e.g. ensuing environmental degradation and the eventual cost of its rectification). Low power can be disastrous for environmental monitoring programmes.
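
The dependence of power on α, sample size, effect size, and variability can be made concrete with a minimal sketch. It assumes a two-sided comparison of two equally sized groups with a known common standard deviation, and it uses the normal approximation rather than the t distribution. The function names and the numbers in the example are hypothetical and do not come from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_of_test(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a mean difference delta (two-sided test)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = (delta / sigma) * sqrt(n_per_group / 2)
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)


def required_n_per_group(delta, sigma, alpha=0.05, target_power=0.80):
    """Smallest group size reaching target_power (far tail ignored)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(target_power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Illustrative values only: detect a change of 0.5 units when sigma = 1.
    n = required_n_per_group(delta=0.5, sigma=1.0)
    print("samples per group:", n)
    print("power at that n:", round(power_of_test(0.5, 1.0, n), 3))
    # Power after a nonsignificant result from a small study (n = 10 per group).
    print("power at n = 10:", round(power_of_test(0.5, 1.0, 10), 3))
```

The first function corresponds to judging a nonsignificant result after the fact, the second to the preliminary power analysis used to plan sample sizes.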


PEDIATRICS ◽  
1989 ◽  
Vol 83 (4) ◽  
pp. 634-634
Author(s):  
JOHN S. LOVERING

Dr. Mauro is obviously knowledgeable in the area of statistical analysis and raises a valid point regarding the importance of evaluating the likelihood of a type II error in studies with negative results. Although one does not wish to detract from the main point of a study with extensive details of the statistical analysis (two pages in this case), some readers may desire more mathematical information than values of mean, variance, t, and P, and do not wish to make their own calculations, to reassure themselves that a reasonable conclusion has been drawn by the authors and their statisticians.


Author(s):  
Zsuzsanna Győri

Starting from the failures of the market and of government, the author identifies the failures of ethical responsibility, the third system aimed at attaining the common good. Using a statistical analogy, she identifies a Type I error as the case in which ethics is not taken into account although it is needed (the null hypothesis is rejected although it is true). She treats the use of ethics to increase profit as a Type II error: it misleads stakeholders and so opens the way even wider to opportunistic business activity (the null hypothesis is accepted although it is false). In her view, the three systems (the market, the government, and ethical responsibility) not only complement but also mutually correct one another. This mutual correction is more general in the case of the Type I error. Solving the Type II error, however, requires reformulating the fundamentals of economic life: a new, more holistic economics is needed in place of self-interest and one-dimensional performance evaluation.


PEDIATRICS ◽  
1983 ◽  
Vol 71 (5) ◽  
pp. 867-867
Author(s):  
D. G. LEDUC ◽  
I. BARRY PLESS

In Reply.— In general, we agree with the criticism raised by Soman and by the Journal Club in Minneapolis. The "bottom line" of our paper is that there were no significant differences between the outcomes in the two groups. However, whenever a study involves relatively small samples the possibility of a type II error, due to lack of statistical power, must always be considered. The differences to which we called the readers' attention would have been statistically significant (with β set at .5) if the samples were as large as 375.

