Correction to "On the Best Choice of Sample Sizes for A t-Test When the Ratio of Variances is Known"

1950 ◽  
Vol 45 (249) ◽  
pp. 111
Author(s):  
John E. Walsh
2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract Background A small sample (n < 30 per treatment group) is usually enrolled to investigate differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for analyzing small samples when detecting differences in efficacy between two treatments for knee OA. Methods A total of 10,000 replicates at each of 5 sample sizes (n = 10, 15, 20, 25, and 30 per group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test). Results The bias of the simulated parameter means decreased with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. For the largest sample size (n = 30), the CV% reached a small level (< 20%) for almost all parameters, but the bias did not. Among the non-parametric tests for analyzing small samples, the perm-test had the highest statistical power, and its false positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n = 30). Conclusion The perm-test is recommended for comparing the differences in efficacy between two treatments for knee OA when samples are small.
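The permutation test this abstract favors has a simple core idea: pool the two samples, repeatedly reshuffle the group labels, and see how often the reshuffled mean difference is at least as extreme as the observed one. A minimal pure-Python sketch of that idea (not the authors' code; `permutation_test` and its defaults are illustrative):

```python
import random
import statistics

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Pools the two samples, repeatedly reshuffles the group labels,
    and counts how often the shuffled mean difference is at least
    as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_x]) - statistics.mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Because the reference distribution is built from the data themselves, the false positive rate of this test does not depend on the shape of the underlying distribution, which is consistent with the abstract's finding that it was unaffected by sample size.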


2019 ◽  
Vol 47 (6) ◽  
pp. 3031-3045
Author(s):  
Michael Siebert ◽  
David Ellenberger

Abstract Automatic passenger counting (APC) in public transport was introduced in the 1970s and has been rapidly emerging in recent years. Still, real-world applications continue to face events that are difficult to classify. The induced imprecision needs to be handled as statistical noise, and thus methods have been defined to ensure that measurement errors do not exceed certain bounds. Various recommendations for such an APC validation have been made to establish criteria that limit the bias and the variability of the measurement errors. In those works, the misinterpretation of non-significance in statistical hypothesis tests for the detection of differences (e.g. Student’s t-test) proves to be prevalent, although existing methods which were developed under the term equivalence testing in biostatistics (i.e. bioequivalence trials, Schuirmann in J Pharmacokinet Pharmacodyn 15(6):657–680, 1987) would be appropriate instead. This heavily affects the calibration and validation process of APC systems and has been the reason for unexpected results when the sample sizes were not suitably chosen: Large sample sizes were assumed to improve the assessment of systematic measurement errors of the devices from a user’s perspective as well as from a manufacturer’s perspective, but the regular t-test fails to achieve that. We introduce a variant of the t-test, the revised t-test, which addresses both type I and type II errors appropriately and allows a comprehensible transition from the long-established t-test in a widely used industrial recommendation. This test is appealing, but still it is susceptible to numerical instability. Finally, we analytically reformulate it as a numerically stable equivalence test, which is thus easier to use. Our results therefore make it possible to derive an equivalence test from a t-test and increase the comparability of both tests, especially for decision makers.
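The equivalence-testing logic the abstract points to (Schuirmann's two one-sided tests, TOST) inverts the usual question: instead of failing to reject "no difference", one must actively reject both "the difference is below the lower margin" and "the difference is above the upper margin". A minimal sketch under a large-sample normal approximation (this is not the paper's revised t-test; `tost_equivalence` and the margins are illustrative):

```python
import math
from statistics import NormalDist, mean, stdev

def tost_equivalence(x, y, low, high):
    """Two one-sided tests (TOST) for equivalence of two means,
    using a large-sample normal approximation to the t statistic.

    Equivalence is claimed only if BOTH one-sided nulls
    (diff <= low and diff >= high) are rejected.
    """
    diff = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z_low = (diff - low) / se    # tests H0: diff <= low
    z_high = (diff - high) / se  # tests H0: diff >= high
    p_low = 1 - NormalDist().cdf(z_low)
    p_high = NormalDist().cdf(z_high)
    return max(p_low, p_high)    # equivalence if this p < alpha
```

Note the asymmetry the abstract criticizes: with TOST, larger samples make it easier to demonstrate equivalence, whereas with the regular t-test larger samples only make it easier to detect a difference, which is the opposite of what an APC validation needs.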


1994 ◽  
Vol 19 (3) ◽  
pp. 275-291 ◽  
Author(s):  
James Algina ◽  
T. C. Oshima ◽  
Wen-Ying Lin

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch’s approximate degrees of freedom test, and James’s second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch’s test and James’s test have very similar Type I error rates and tend to control the Type I error rate as well as or better than the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.
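Welch's test replaces the pooled variance of the ordinary t test with separate per-group variances and adjusts the degrees of freedom downward via the Welch–Satterthwaite formula. A minimal sketch of the statistic and its approximate degrees of freedom (a hypothetical helper, not the simulation code from the study; converting t to a p-value additionally needs a t-distribution CDF, e.g. from SciPy):

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's unequal-variances t statistic and its
    Welch-Satterthwaite approximate degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    # Welch-Satterthwaite: df shrinks as the variances become unequal
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

The approximate df is always at most n1 + n2 − 2, the df of the pooled test, which is how Welch's test buys its Type I error control under unequal variances.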


Methodology ◽  
2016 ◽  
Vol 12 (2) ◽  
pp. 61-71 ◽  
Author(s):  
Antoine Poncet ◽  
Delphine S. Courvoisier ◽  
Christophe Combescure ◽  
Thomas V. Perneger

Abstract. Many applied researchers are taught to use the t-test when distributions appear normal and/or sample sizes are large and non-parametric tests otherwise, and fear inflated error rates if the “wrong” test is used. In a simulation study (four tests: t-test, Mann-Whitney test, Robust t-test, Permutation test; seven sample sizes between 2 × 10 and 2 × 500; four distributions: normal, uniform, log-normal, bimodal; under the null and alternative hypotheses), we show that type 1 errors are well controlled in all conditions. The t-test is most powerful under the normal and the uniform distributions, the Mann-Whitney test under the lognormal distribution, and the robust t-test under the bimodal distribution. Importantly, even the t-test was more powerful under asymmetric distributions than under the normal distribution for the same effect size. It appears that normality and sample size do not matter for the selection of a test to compare two groups of the same size and variance. The researcher can opt for the test that fits the scientific hypothesis best, without fear of poor test performance.
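The kind of type 1 error check this abstract reports can be reproduced in a few lines: simulate many pairs of samples under the null from a non-normal distribution, apply the test, and count rejections. A small sketch for the pooled t-test under a uniform distribution (illustrative only; the fixed critical value 2.0 approximates the two-sided 5% cutoff t₀.₉₇₅ with 58 df, and the replicate count is kept modest):

```python
import random
from statistics import mean, variance

def t_stat(x, y):
    """Pooled-variance two-sample t statistic (equal group sizes)."""
    n = len(x)
    sp2 = (variance(x) + variance(y)) / 2
    return (mean(x) - mean(y)) / (2 * sp2 / n) ** 0.5

rng = random.Random(42)
n, reps, crit = 30, 4000, 2.0  # crit ~ t_{0.975, 58}
rejections = sum(
    abs(t_stat([rng.random() for _ in range(n)],
               [rng.random() for _ in range(n)])) > crit
    for _ in range(reps)
)
type1 = rejections / reps  # should sit near the nominal 0.05
```

Under the uniform distribution the empirical rejection rate lands close to the nominal 5%, matching the paper's conclusion that type 1 errors are well controlled even when normality fails.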


1980 ◽  
Vol 5 (4) ◽  
pp. 309-335 ◽  
Author(s):  
R. Clifford Blair ◽  
James J. Higgins

Computer-generated Monte Carlo techniques were used to compare the power of Wilcoxon's rank-sum test to the power of the two independent means t test for situations in which samples were drawn from (1) uniform, (2) Laplace, (3) half-normal, (4) exponential, (5) mixed-normal, and (6) mixed-uniform distributions. Sample sizes studied were (n1, n2) = (3,9), (6,6), (9,27), (18,18), (27,81), and (54,54). It was concluded that (1) generally speaking, the Wilcoxon statistic held very large power advantages over the t statistic, (2) asymptotic relative efficiencies were reasonably good indicators of the relative power of the two statistics, (3) results obtained from smaller samples were often markedly different from the results obtained from larger samples, and (4) because of the narrow ranges of population shapes and sample sizes investigated in some widely cited previous studies of this type, the conclusions reached in those studies must now be deemed questionable.
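The Wilcoxon statistic at the center of this comparison is simply the sum of one sample's ranks after the two samples are pooled and jointly ranked, with midranks for ties. A short illustrative helper (not the authors' code):

```python
def rank_sum(x, y):
    """Wilcoxon rank-sum statistic W for sample x: the sum of the
    ranks of x's values in the pooled, jointly ranked data,
    using midranks for tied values."""
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # midrank: average of the 1-based positions i+1 .. j
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in x)
```

Because only ranks enter the statistic, its null distribution does not depend on the population shape, which is why its power advantages over the t statistic can persist across the heavy-tailed and skewed distributions studied here.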


2011 ◽  
Vol 01 (03) ◽  
pp. 151-156 ◽  
Author(s):  
Yongxiu She ◽  
Augustine Wong ◽  
Xiaofeng Zhou
Keyword(s):  
T Test ◽  

2017 ◽  
Vol 32 (2) ◽  
pp. 99-108
Author(s):  
Agung Santoso

Three criteria for item selection have been widely used despite their limitations, without any empirical evidence to support the practice. The current study examined the three criteria to determine which was the best. Those criteria were the item-total correlation, its significance by t-test, and the significance of rit. Simulations were conducted to demonstrate which of the three criteria produced the fewest errors in both excluding good items and including bad items in the scale. The author manipulated four conditions in the simulation study: (a) the number of items in a scale; (b) the value of rit in the population; (c) sample size; and (d) the criterion for including or excluding items in a scale. The results showed that the criterion rit > .30 produced the fewest errors of including bad items and excluding good items, particularly when n > 200. The two criteria based on significance tests produced the largest errors and are therefore not recommended in future practice.
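The rit statistic behind the recommended rit > .30 rule is the corrected item-total correlation: the correlation between an item's scores and the total of the remaining items, with the item itself removed so it does not correlate with its own contribution. A minimal sketch (hypothetical helpers, not the study's code; `scores` is a respondents × items matrix):

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a)
           * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

def corrected_item_total(scores, item):
    """r_it for one item: correlation between the item's scores and
    the total of the REMAINING items (the item is removed from the
    total so it cannot inflate the correlation with itself)."""
    item_scores = [row[item] for row in scores]
    rest_totals = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_totals)
```

Under the study's recommendation, an item would be retained when `corrected_item_total(scores, item) > 0.30`, rather than when a significance test on the correlation happens to reject.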

