Type I Error Rates for Yao’s and James’ Tests of Equality of Mean Vectors Under Variance-Covariance Heteroscedasticity

1988 ◽  
Vol 13 (3) ◽  
pp. 281-290 ◽  
Author(s):  
James Algina ◽  
Kezhen L. Tang

For Yao’s and James’ tests, Type I error rates were estimated for various combinations of the number of variables (p), sample-size ratio (n1:n2), sample-size-to-variables ratio, and degree of heteroscedasticity. These tests are alternatives to Hotelling’s T2 and are intended for use when the variance-covariance matrices are not equal in a study using two independent samples. The performance of Yao’s test was superior to that of James’. Yao’s test had appropriate Type I error rates when p ≤ 10, (n1 + n2)/p ≥ 10, and 1:2 ≤ n1:n2 ≤ 2:1. When (n1 + n2)/p = 20, Yao’s test was robust when n1:n2 was 5:1, 3:1, and 4:1 and p was 2, 6, and 10, respectively.
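
As a rough illustration of the statistic these robustness results concern, the sketch below implements the two-sample test usually attributed to Yao: the Behrens-Fisher statistic T² = d′(S1/n1 + S2/n2)⁻¹d referred to an F distribution with approximate degrees of freedom. This is a minimal sketch, not the authors' code; the function name `yao_test` and the simulated inputs are illustrative, and the degrees-of-freedom formula is the form commonly cited for Yao's solution.

```python
import numpy as np
from scipy import stats

def yao_test(x1, x2):
    """Two-sample test of equal mean vectors without assuming equal
    variance-covariance matrices (Yao-style approximate-df solution)."""
    n1, p = x1.shape
    n2, _ = x2.shape
    d = x1.mean(axis=0) - x2.mean(axis=0)
    v1 = np.cov(x1, rowvar=False) / n1          # S1 / n1
    v2 = np.cov(x2, rowvar=False) / n2          # S2 / n2
    s_inv = np.linalg.inv(v1 + v2)
    t2 = float(d @ s_inv @ d)                   # Behrens-Fisher T^2
    # Approximate degrees of freedom (form usually given for Yao's solution)
    q1 = float(d @ s_inv @ v1 @ s_inv @ d) / t2
    q2 = float(d @ s_inv @ v2 @ s_inv @ d) / t2
    nu = 1.0 / (q1**2 / (n1 - 1) + q2**2 / (n2 - 1))
    f_stat = t2 * (nu - p + 1) / (nu * p)       # referred to F(p, nu - p + 1)
    return t2, f_stat, stats.f.sf(f_stat, p, nu - p + 1)

# Illustrative use: p = 2 variables, unequal covariance matrices, equal means
rng = np.random.default_rng(0)
x1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=30)
x2 = rng.multivariate_normal([0, 0], [[4.0, 0.5], [0.5, 4.0]], size=60)
print(yao_test(x1, x2))
```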

1991 ◽  
Vol 16 (2) ◽  
pp. 125-139 ◽  
Author(s):  
James Algina ◽  
Takako C. Oshima ◽  
K. Linda Tang

Type I error rates for Yao’s, James’ first-order, James’ second-order, and Johansen’s tests of equality of mean vectors for two independent samples were estimated for various conditions defined by the degree of heteroscedasticity and nonnormality (uniform, Laplace, t(5), beta(5, 1.5), exponential, and lognormal distributions). For these alternatives to Hotelling’s T2, variance-covariance homogeneity is not an assumption. Although the four procedures can be seriously nonrobust with exponential and lognormal distributions, they were fairly robust with the remaining distributions. The performance of Yao’s test, James’ second-order test, and Johansen’s test was slightly superior to that of James’ first-order test.
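
In the spirit of the conditions studied here, the following minimal Monte Carlo loop illustrates how a Type I error rate can be estimated under a skewed (lognormal) distribution with heteroscedastic groups. The generating model, sample sizes, and the `type1_error_rate` name are illustrative assumptions, and the snippet assumes the `yao_test` function sketched above is in scope.

```python
import numpy as np

def type1_error_rate(test, n1, n2, p, scale2=3.0, reps=2000, alpha=0.05, seed=1):
    """Proportion of rejections when the null hypothesis is true: both groups
    share the same (lognormal, hence skewed) means, but group 2 is rescaled
    about its sample mean to induce covariance heterogeneity."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x1 = rng.lognormal(mean=0.0, sigma=1.0, size=(n1, p))
        x2 = rng.lognormal(mean=0.0, sigma=1.0, size=(n2, p))
        x2 = x2.mean(axis=0) + scale2 * (x2 - x2.mean(axis=0))
        _, _, p_value = test(x1, x2)          # e.g., the yao_test sketch above
        rejections += p_value < alpha
    return rejections / reps

print(type1_error_rate(yao_test, n1=20, n2=40, p=2))   # compare with alpha = 0.05
```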


Methodology ◽  
2009 ◽  
Vol 5 (2) ◽  
pp. 60-70 ◽  
Author(s):  
W. Holmes Finch ◽  
Teresa Davenport

Permutation testing has been suggested as an alternative to the standard F approximate tests used in multivariate analysis of variance (MANOVA). These approximate tests, such as Wilks’ Lambda and Pillai’s Trace, have been shown to perform poorly when assumptions of normally distributed dependent variables and homogeneity of group covariance matrices were violated. Because Monte Carlo permutation tests do not rely on distributional assumptions, they may be expected to work better than their approximate cousins when the data do not conform to the assumptions described above. The current simulation study compared the performance of four standard MANOVA test statistics with their Monte Carlo permutation-based counterparts under a variety of conditions with small samples, including conditions when the assumptions were met and when they were not. Results suggest that for sample sizes of 50 subjects, power is very low for all the statistics. In addition, Type I error rates for both the approximate F and Monte Carlo tests were inflated under the condition of nonnormal data and unequal covariance matrices. In general, the performance of the Monte Carlo permutation tests was slightly better in terms of Type I error rates and power when both assumptions of normality and homogeneous covariance matrices were not met. It should be noted that these simulations were based upon the case with three groups only, and as such results presented in this study can only be generalized to similar situations.
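
For readers unfamiliar with the mechanics, here is a minimal sketch of a Monte Carlo permutation counterpart to Wilks’ Lambda: compute the observed statistic, repeatedly shuffle the group labels, and take the proportion of permuted statistics at least as extreme as the p-value. It is not the authors’ implementation; the function names, number of permutations, and toy data are illustrative.

```python
import numpy as np

def wilks_lambda(x, labels):
    """Wilks' Lambda = |W| / |T|, where W is the within-group SSCP matrix
    and T = W + B is the total SSCP matrix (smaller = stronger group effect)."""
    grand = x.mean(axis=0)
    t_mat = (x - grand).T @ (x - grand)
    w_mat = np.zeros_like(t_mat)
    for g in np.unique(labels):
        cg = x[labels == g] - x[labels == g].mean(axis=0)
        w_mat += cg.T @ cg
    return np.linalg.det(w_mat) / np.linalg.det(t_mat)

def permutation_manova(x, labels, n_perm=4999, seed=0):
    """Monte Carlo permutation p-value for Wilks' Lambda: shuffle group
    labels, recompute the statistic, count permutations at least as extreme."""
    rng = np.random.default_rng(seed)
    observed = wilks_lambda(x, labels)
    count = sum(wilks_lambda(x, rng.permutation(labels)) <= observed
                for _ in range(n_perm))
    return observed, (count + 1) / (n_perm + 1)

# Illustrative use: three small groups, two dependent variables
rng = np.random.default_rng(42)
x = np.vstack([rng.normal(loc=m, size=(17, 2)) for m in (0.0, 0.0, 0.5)])
labels = np.repeat([0, 1, 2], 17)
print(permutation_manova(x, labels))
```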


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Abstract

Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes.

Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether the type I error rate could be affected by the choice of statistical test, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) held in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes.

Results: Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and the permutation test, especially when the sample size was small, for Case 1, whereas inflation was observed only for the permutation test for Case 2. Deflation was noted for the bootstrap test with small samples. Increasing the sample size mitigated inflation and deflation, except for the Wilcoxon test in Case 1, because heterogeneity of the weight distributions between groups violated the assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, the bootstrap test was underpowered with small samples as a tradeoff for maintaining type I error rates.

Conclusions: With small samples (n ≤ 5), the bootstrap test avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for the other tests, sample size should be increased. The Wilcoxon test should be avoided because of heterogeneity of the weight distributions between mutant and control mice.

Funding Sources: This study was supported in part by NIH and Japan Society for the Promotion of Science (JSPS) KAKENHI grants.
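
The abstract does not give the exact resampling algorithms, so the sketch below shows one plausible reading: a bootstrap test of equal means that recenters both groups on a pooled mean before resampling, and plasmode null draws for the two cases described (Case 1: recenter on a common mean; Case 2: pool the groups). Function names and the choice of Welch's t as the bootstrap pivot are assumptions for illustration.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic (unequal-variance two-sample t)."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

def bootstrap_test(a, b, n_boot=4999, seed=0):
    """Bootstrap test of equal means: recenter both groups on the pooled mean
    so H0 holds, resample within groups, and compare |t*| with |t_obs|.
    (One common variant; the abstract does not specify which was used.)"""
    rng = np.random.default_rng(seed)
    t_obs = welch_t(a, b)
    pooled = np.concatenate([a, b]).mean()
    a0, b0 = a - a.mean() + pooled, b - b.mean() + pooled
    exceed = 0
    for _ in range(n_boot):
        a_star = rng.choice(a0, size=len(a0), replace=True)
        b_star = rng.choice(b0, size=len(b0), replace=True)
        exceed += abs(welch_t(a_star, b_star)) >= abs(t_obs)
    return t_obs, (exceed + 1) / (n_boot + 1)

def plasmode_null_samples(control, mutant, n_per_group, case=1, seed=0):
    """Draw one 'plasmode' dataset under H0 from empirical weights.
    Case 1: recenter both groups on a common mean, sample within groups.
    Case 2: pool both groups and draw both samples from the pooled data."""
    rng = np.random.default_rng(seed)
    if case == 1:
        m = np.concatenate([control, mutant]).mean()
        c0, m0 = control - control.mean() + m, mutant - mutant.mean() + m
        return (rng.choice(c0, n_per_group, replace=True),
                rng.choice(m0, n_per_group, replace=True))
    pooled = np.concatenate([control, mutant])
    return (rng.choice(pooled, n_per_group, replace=True),
            rng.choice(pooled, n_per_group, replace=True))

# Usage idea: repeat plasmode_null_samples draws, apply bootstrap_test (or any
# other test), and tally the proportion of rejections at alpha = 0.05.
```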


2015 ◽  
Vol 9 (13) ◽  
pp. 1 ◽
Author(s):  
Tobi Kingsley Ochuko ◽  
Suhaida Abdullah ◽  
Zakiyah Binti Zain ◽  
Sharipah Soaad Syed Yahaya

This study centres on the comparison of independent group tests in terms of power, using a parametric method, the Alexander-Govern (AG) test. The AG test uses the mean as its central tendency measure. It is a better alternative to the Welch test, the James test and ANOVA because it produces high power and gives good control of Type I error rates for normal data under variance heterogeneity. However, the test is not robust for non-normal data. When the trimmed mean was applied as its central tendency measure under non-normality, the test was robust only in the two-group condition; as the number of groups increased beyond two, the test was no longer robust. As a result, a highly robust estimator known as the MOM estimator was applied to the test as its central tendency measure. That version is not affected by the number of groups, but it could not control Type I error rates under skewed, heavy-tailed distributions. In this study, the Winsorized MOM estimator was applied in the AG test as its central tendency measure. A simulation of 5,000 data sets was generated and analysed for each test using the SAS package. The results show that, with the pairing of unbalanced sample sizes (15:15:20:30) with equal variances (1:1:1:1) and with unequal variances (1:1:1:36) at effect size index f = 0.8, only the AGWMOM test produced high power values (0.9562 and 0.8336, respectively) compared to the AG test, the AGMOM test and ANOVA, and the test is considered to be sufficient.
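
Since the MOM and Winsorized MOM estimators carry the weight of this study, here is a minimal sketch of the usual modified one-step M (MOM) estimator of location and one plausible Winsorized variant. The 2.24 cut-off and MADN rescaling follow the common Wilcox-style definition; the Winsorizing step is an illustrative assumption, as the paper's exact formulation is not reproduced here.

```python
import numpy as np

def mom_estimator(x, k=2.24):
    """Modified one-step M (MOM) estimator of location (Wilcox-style):
    discard points farther than k * MADN from the median, average the rest."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745   # MAD rescaled for normality
    return x[np.abs(x - med) <= k * madn].mean()

def winsorized_mom_estimator(x, k=2.24):
    """One plausible Winsorized variant (illustrative assumption): instead of
    discarding flagged points, pull them in to the k * MADN cut-offs."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745
    return np.clip(x, med - k * madn, med + k * madn).mean()

print(mom_estimator([2.1, 2.4, 2.2, 2.3, 9.0]),
      winsorized_mom_estimator([2.1, 2.4, 2.2, 2.3, 9.0]))
```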


1996 ◽  
Vol 21 (2) ◽  
pp. 169-178 ◽  
Author(s):  
William T. Coombs ◽  
James Algina

Type I error rates for the Johansen test were estimated using simulated data for a variety of conditions. The design of the experiment was a 2 × 2 × 2 × 3 × 9 × 3 factorial. The factors were (a) type of distribution, (b) number of dependent variables, (c) number of groups, (d) ratio of the smallest sample size to the number of dependent variables, (e) sample size ratios, and (f) degree of heteroscedasticity. The results indicate that Type I error rates for the Johansen test depend heavily on the number of groups and the ratio of the smallest sample size to the number of dependent variables. Type I error rates depend to a lesser extent on the distribution types used in the study. Based on the results, sample size guidelines are presented.


2020 ◽  
Author(s):  
Keith Lohse ◽  
Kristin Sainani ◽  
J. Andrew Taylor ◽  
Michael Lloyd Butson ◽  
Emma Knight ◽  
...  

Magnitude-based inference (MBI) is a controversial statistical method that has been used in hundreds of papers in sports science despite criticism from statisticians. To better understand how this method has been applied in practice, we systematically reviewed 232 papers that used MBI. We extracted data on study design, sample size, and choice of MBI settings and parameters. Median sample size was 10 per group (interquartile range, IQR: 8 – 15) for multi-group studies and 14 (IQR: 10 – 24) for single-group studies; few studies reported a priori sample size calculations (15%). Authors predominantly applied MBI’s default settings and chose “mechanistic/non-clinical” rather than “clinical” MBI even when testing clinical interventions (only 14 studies out of 232 used clinical MBI). Using these data, we can estimate the Type I error rates for the typical MBI study. Authors frequently made dichotomous claims about effects based on the MBI criterion of a “likely” effect and sometimes based on the MBI criterion of a “possible” effect. When the sample size is n=8 to 15 per group, these inferences have Type I error rates of 12%-22% and 22%-45%, respectively. High Type I error rates were compounded by multiple testing: Authors reported results from a median of 30 tests related to outcomes; and few studies specified a primary outcome (14%). We conclude that MBI has promoted small studies, promulgated a “black box” approach to statistics, and led to numerous papers where the conclusions are not supported by the data. Amidst debates over the role of p-values and significance testing in science, MBI also provides an important natural experiment: we find no evidence that moving researchers away from p-values or null hypothesis significance testing makes them less prone to dichotomization or over-interpretation of findings.


2008 ◽  
Vol 32 (1) ◽  
pp. 157-166 ◽  
Author(s):  
Roberta Bessa Veloso Silva ◽  
Daniel Furtado Ferreira ◽  
Denismar Alves Nogueira

The present work emphasizes the importance of testing hypotheses about the homogeneity of covariance matrices from k multivariate populations. Violation of the assumption of homogeneous covariance matrices affects the performance of tests and the coverage probability of confidence regions. This work applies two tests of homogeneity of covariance matrices and evaluates their type I error rates and power using Monte Carlo simulation in normal populations, and their robustness in non-normal populations. The multivariate Bartlett test (MBT) and its bootstrap version (MBTB) were used. Different configurations were tested, combining sample sizes, numbers of variates, correlations and numbers of populations. Results show that the bootstrap test is superior to the asymptotic test and is robust, since it controls the type I error rate.
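
For context, the sketch below shows a standard form of the multivariate Bartlett (Box's M) statistic with its chi-square approximation, plus one plausible bootstrap scheme that resamples groups from pooled, group-centred data so that the null hypothesis of equal covariance matrices holds. The resampling details are an assumption for illustration; the paper's exact algorithm may differ.

```python
import numpy as np
from scipy import stats

def box_m_test(groups):
    """Multivariate Bartlett / Box's M test of equal covariance matrices with
    the usual chi-square approximation (whose small-sample behaviour is what a
    bootstrap version is meant to improve)."""
    k, p = len(groups), groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]
    n_total = ns.sum()
    pooled = sum((n - 1) * s for n, s in zip(ns, covs)) / (n_total - k)
    m_stat = (n_total - k) * np.log(np.linalg.det(pooled)) - sum(
        (n - 1) * np.log(np.linalg.det(s)) for n, s in zip(ns, covs))
    c1 = (np.sum(1.0 / (ns - 1)) - 1.0 / (n_total - k)) * \
         (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))
    chi2 = (1 - c1) * m_stat
    df = p * (p + 1) * (k - 1) / 2
    return m_stat, chi2, stats.chi2.sf(chi2, df)

def box_m_bootstrap(groups, n_boot=999, seed=0):
    """Bootstrap version (one plausible scheme): centre each group, pool,
    resample groups of the original sizes from the pooled data so H0 holds,
    and compare bootstrap M statistics with the observed one."""
    rng = np.random.default_rng(seed)
    m_obs = box_m_test(groups)[0]
    pooled = np.vstack([g - g.mean(axis=0) for g in groups])
    exceed = 0
    for _ in range(n_boot):
        resampled = [pooled[rng.integers(0, len(pooled), size=g.shape[0])]
                     for g in groups]
        exceed += box_m_test(resampled)[0] >= m_obs
    return m_obs, (exceed + 1) / (n_boot + 1)

# Illustrative use: three groups, two variables, third group with inflated variance
rng = np.random.default_rng(1)
gs = [rng.multivariate_normal([0, 0], np.eye(2) * v, size=25) for v in (1, 1, 4)]
print(box_m_test(gs)[1:], box_m_bootstrap(gs)[1])
```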

