Correction to "On the Best Choice of Sample Sizes for A t-Test When the Ratio of Variances is Known"

1950 ◽  
Vol 45 (249) ◽  
pp. 111
Author(s):  
John E. Walsh
2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract Background A small sample (n < 30 per treatment group) is usually enrolled to investigate differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for analyzing small samples when detecting differences in efficacy between two treatments for knee OA. Methods A total of 10,000 replicates at each of 5 sample sizes (n = 10, 15, 20, 25, and 30 per group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test). Results The bias of the simulated parameter means decreased with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. For the largest sample size (n = 30), the CV% reached a small level (< 20%) for almost all parameters, but the bias did not. Among the non-parametric tests for analyzing small samples, the perm-test had the highest statistical power, and its false positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n = 30). Conclusion The perm-test is recommended for comparing the differences in efficacy between two treatments for knee OA when samples are small.
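The permutation test this abstract favors has a simple core idea: pool the two samples, repeatedly reshuffle the group labels, and see how often the reshuffled mean difference is at least as extreme as the observed one. A minimal pure-Python sketch of that idea (not the authors' code; `permutation_test` and its defaults are illustrative):

```python
import random
import statistics

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Pools the two samples, repeatedly reshuffles the group labels,
    and counts how often the shuffled mean difference is at least
    as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_x]) - statistics.mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Because the reference distribution is built from the data themselves, the false positive rate of this test does not depend on the shape of the underlying distribution, which is consistent with the abstract's finding that it was unaffected by sample size.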


2019 ◽  
Vol 47 (6) ◽  
pp. 3031-3045
Author(s):  
Michael Siebert ◽  
David Ellenberger

Abstract Automatic passenger counting (APC) in public transport was introduced in the 1970s and has been rapidly emerging in recent years. Still, real-world applications continue to face events that are difficult to classify. The induced imprecision needs to be handled as statistical noise, and thus methods have been defined to ensure that measurement errors do not exceed certain bounds. Various recommendations for such an APC validation have been made to establish criteria that limit the bias and the variability of the measurement errors. In those works, the misinterpretation of non-significance in statistical hypothesis tests for the detection of differences (e.g. Student’s t-test) proves to be prevalent, although existing methods which were developed under the term equivalence testing in biostatistics (i.e. bioequivalence trials, Schuirmann in J Pharmacokinet Pharmacodyn 15(6):657–680, 1987) would be appropriate instead. This heavily affects the calibration and validation process of APC systems and has been the reason for unexpected results when the sample sizes were not suitably chosen: Large sample sizes were assumed to improve the assessment of systematic measurement errors of the devices from a user’s perspective as well as from a manufacturer’s perspective, but the regular t-test fails to achieve that. We introduce a variant of the t-test, the revised t-test, which addresses both type I and type II errors appropriately and allows a comprehensible transition from the long-established t-test in a widely used industrial recommendation. This test is appealing, but still it is susceptible to numerical instability. Finally, we analytically reformulate it as a numerically stable equivalence test, which is thus easier to use. Our results therefore make it possible to derive an equivalence test from a t-test and increase the comparability of both tests, especially for decision makers.
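The equivalence-testing logic the abstract points to (Schuirmann's two one-sided tests, TOST) inverts the usual question: instead of failing to reject "no difference", one must actively reject both "the difference is below the lower margin" and "the difference is above the upper margin". A minimal sketch under a large-sample normal approximation (this is not the paper's revised t-test; `tost_equivalence` and the margins are illustrative):

```python
import math
from statistics import NormalDist, mean, stdev

def tost_equivalence(x, y, low, high):
    """Two one-sided tests (TOST) for equivalence of two means,
    using a large-sample normal approximation to the t statistic.

    Equivalence is claimed only if BOTH one-sided nulls
    (diff <= low and diff >= high) are rejected.
    """
    diff = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z_low = (diff - low) / se    # tests H0: diff <= low
    z_high = (diff - high) / se  # tests H0: diff >= high
    p_low = 1 - NormalDist().cdf(z_low)
    p_high = NormalDist().cdf(z_high)
    return max(p_low, p_high)    # equivalence if this p < alpha
```

Note the asymmetry the abstract criticizes: with TOST, larger samples make it easier to demonstrate equivalence, whereas with the regular t-test larger samples only make it easier to detect a difference, which is the opposite of what an APC validation needs.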


1994 ◽  
Vol 19 (3) ◽  
pp. 275-291 ◽  
Author(s):  
James Algina ◽  
T. C. Oshima ◽  
Wen-Ying Lin

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch’s approximate degrees of freedom test, and James’s second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch’s test and James’s test have very similar Type I error rates and tend to control the Type I error rate as well as or better than the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.
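Welch's test replaces the pooled variance of the ordinary t test with separate per-group variances and adjusts the degrees of freedom downward via the Welch–Satterthwaite formula. A minimal sketch of the statistic and its approximate degrees of freedom (a hypothetical helper, not the simulation code from the study; converting t to a p-value additionally needs a t-distribution CDF, e.g. from SciPy):

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's unequal-variances t statistic and its
    Welch-Satterthwaite approximate degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    # Welch-Satterthwaite: df shrinks as the variances become unequal
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

The approximate df is always at most n1 + n2 − 2, the df of the pooled test, which is how Welch's test buys its Type I error control under unequal variances.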


Methodology ◽  
2016 ◽  
Vol 12 (2) ◽  
pp. 61-71 ◽  
Author(s):  
Antoine Poncet ◽  
Delphine S. Courvoisier ◽  
Christophe Combescure ◽  
Thomas V. Perneger

Abstract. Many applied researchers are taught to use the t-test when distributions appear normal and/or sample sizes are large and non-parametric tests otherwise, and fear inflated error rates if the “wrong” test is used. In a simulation study (four tests: t-test, Mann-Whitney test, Robust t-test, Permutation test; seven sample sizes between 2 × 10 and 2 × 500; four distributions: normal, uniform, log-normal, bimodal; under the null and alternative hypotheses), we show that type 1 errors are well controlled in all conditions. The t-test is most powerful under the normal and the uniform distributions, the Mann-Whitney test under the lognormal distribution, and the robust t-test under the bimodal distribution. Importantly, even the t-test was more powerful under asymmetric distributions than under the normal distribution for the same effect size. It appears that normality and sample size do not matter for the selection of a test to compare two groups of the same size and variance. The researcher can opt for the test that fits the scientific hypothesis best, without fear of poor test performance.
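The kind of type 1 error check this abstract reports can be reproduced in a few lines: simulate many pairs of samples under the null from a non-normal distribution, apply the test, and count rejections. A small sketch for the pooled t-test under a uniform distribution (illustrative only; the fixed critical value 2.0 approximates the two-sided 5% cutoff t₀.₉₇₅ with 58 df, and the replicate count is kept modest):

```python
import random
from statistics import mean, variance

def t_stat(x, y):
    """Pooled-variance two-sample t statistic (equal group sizes)."""
    n = len(x)
    sp2 = (variance(x) + variance(y)) / 2
    return (mean(x) - mean(y)) / (2 * sp2 / n) ** 0.5

rng = random.Random(42)
n, reps, crit = 30, 4000, 2.0  # crit ~ t_{0.975, 58}
rejections = sum(
    abs(t_stat([rng.random() for _ in range(n)],
               [rng.random() for _ in range(n)])) > crit
    for _ in range(reps)
)
type1 = rejections / reps  # should sit near the nominal 0.05
```

Under the uniform distribution the empirical rejection rate lands close to the nominal 5%, matching the paper's conclusion that type 1 errors are well controlled even when normality fails.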


1980 ◽  
Vol 5 (4) ◽  
pp. 309-335 ◽  
Author(s):  
R. Clifford Blair ◽  
James J. Higgins

Computer-generated Monte Carlo techniques were used to compare the power of Wilcoxon's rank-sum test to the power of the two independent means t test for situations in which samples were drawn from (1) uniform, (2) Laplace, (3) half-normal, (4) exponential, (5) mixed-normal, and (6) mixed-uniform distributions. Sample sizes studied were (n1, n2) = (3,9), (6,6), (9,27), (18,18), (27,81), and (54,54). It was concluded that (1) generally speaking, the Wilcoxon statistic held very large power advantages over the t statistic, (2) asymptotic relative efficiencies were reasonably good indicators of the relative power of the two statistics, (3) results obtained from smaller samples were often markedly different from the results obtained from larger samples, and (4) because of the narrow ranges of population shapes and sample sizes investigated in some widely cited previous studies of this type, the conclusions reached in those studies must now be deemed questionable.
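The Wilcoxon statistic at the center of this comparison is simply the sum of one sample's ranks after the two samples are pooled and jointly ranked, with midranks for ties. A short illustrative helper (not the authors' code):

```python
def rank_sum(x, y):
    """Wilcoxon rank-sum statistic W for sample x: the sum of the
    ranks of x's values in the pooled, jointly ranked data,
    using midranks for tied values."""
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # midrank: average of the 1-based positions i+1 .. j
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in x)
```

Because only ranks enter the statistic, its null distribution does not depend on the population shape, which is why its power advantages over the t statistic can persist across the heavy-tailed and skewed distributions studied here.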


2011 ◽  
Vol 01 (03) ◽  
pp. 151-156 ◽  
Author(s):  
Yongxiu She ◽  
Augustine Wong ◽  
Xiaofeng Zhou
Keyword(s):  
T Test ◽  

2017 ◽  
Vol 32 (2) ◽  
pp. 99-108
Author(s):  
Agung Santoso

Three criteria for item selection have been widely used despite their limitations, without any empirical evidence to support the practice. The current study examined the three criteria to determine which was the best. Those criteria were the item-total correlation, its significance by t-test, and the significance of rit. Simulations were conducted to demonstrate which of the three criteria produced the fewest errors in both excluding good items and including bad items in the scale. The author manipulated four conditions in the simulation study: (a) the number of items in a scale; (b) the value of rit in the population; (c) sample size; and (d) the criterion for including or excluding items in a scale. The results showed that the criterion rit > .30 produced the fewest errors of including bad items and excluding good items, particularly when n > 200. The two criteria based on significance tests produced the largest errors and are therefore not recommended in future practice.
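The rit statistic behind the recommended rit > .30 rule is the corrected item-total correlation: the correlation between an item's scores and the total of the remaining items, with the item itself removed so it does not correlate with its own contribution. A minimal sketch (hypothetical helpers, not the study's code; `scores` is a respondents × items matrix):

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a)
           * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

def corrected_item_total(scores, item):
    """r_it for one item: correlation between the item's scores and
    the total of the REMAINING items (the item is removed from the
    total so it cannot inflate the correlation with itself)."""
    item_scores = [row[item] for row in scores]
    rest_totals = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_totals)
```

Under the study's recommendation, an item would be retained when `corrected_item_total(scores, item) > 0.30`, rather than when a significance test on the correlation happens to reject.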

