Type I Error Rates for Yao’s and James’ Tests of Equality of Mean Vectors Under Variance-Covariance Heteroscedasticity

1988 ◽  
Vol 13 (3) ◽  
pp. 281-290 ◽  
Author(s):  
James Algina ◽  
Kezhen L. Tang

For Yao’s and James’ tests, Type I error rates were estimated for various combinations of the number of variables (p), sample-size ratio (n1:n2), sample-size-to-variables ratio, and degree of heteroscedasticity. These tests are alternatives to Hotelling’s T2 and are intended for use when the variance-covariance matrices are not equal in a study using two independent samples. The performance of Yao’s test was superior to that of James’. Yao’s test had appropriate Type I error rates when p ≤ 10, (n1 + n2)/p ≥ 10, and 1:2 ≤ n1:n2 ≤ 2:1. When (n1 + n2)/p = 20, Yao’s test was robust when n1:n2 was 5:1, 3:1, and 4:1 and p was 2, 6, and 10, respectively.
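
As a rough illustration of the statistic these robustness results concern, the sketch below implements the two-sample test usually attributed to Yao: the Behrens-Fisher statistic T² = d′(S1/n1 + S2/n2)⁻¹d referred to an F distribution with approximate degrees of freedom. This is a minimal sketch, not the authors' code; the function name `yao_test` and the simulated inputs are illustrative, and the degrees-of-freedom formula is the form commonly cited for Yao's solution.

```python
import numpy as np
from scipy import stats

def yao_test(x1, x2):
    """Two-sample test of equal mean vectors without assuming equal
    variance-covariance matrices (Yao-style approximate-df solution)."""
    n1, p = x1.shape
    n2, _ = x2.shape
    d = x1.mean(axis=0) - x2.mean(axis=0)
    v1 = np.cov(x1, rowvar=False) / n1          # S1 / n1
    v2 = np.cov(x2, rowvar=False) / n2          # S2 / n2
    s_inv = np.linalg.inv(v1 + v2)
    t2 = float(d @ s_inv @ d)                   # Behrens-Fisher T^2
    # Approximate degrees of freedom (form usually given for Yao's solution)
    q1 = float(d @ s_inv @ v1 @ s_inv @ d) / t2
    q2 = float(d @ s_inv @ v2 @ s_inv @ d) / t2
    nu = 1.0 / (q1**2 / (n1 - 1) + q2**2 / (n2 - 1))
    f_stat = t2 * (nu - p + 1) / (nu * p)       # referred to F(p, nu - p + 1)
    return t2, f_stat, stats.f.sf(f_stat, p, nu - p + 1)

# Illustrative use: p = 2 variables, unequal covariance matrices, equal means
rng = np.random.default_rng(0)
x1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=30)
x2 = rng.multivariate_normal([0, 0], [[4.0, 0.5], [0.5, 4.0]], size=60)
print(yao_test(x1, x2))
```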

1991 ◽  
Vol 16 (2) ◽  
pp. 125-139 ◽  
Author(s):  
James Algina ◽  
Takako C. Oshima ◽  
K. Linda Tang

Type I error rates for Yao’s, James’ first-order, James’ second-order, and Johansen’s tests of equality of mean vectors for two independent samples were estimated for various conditions defined by the degree of heteroscedasticity and nonnormality (uniform, Laplace, t(5), beta(5, 1.5), exponential, and lognormal distributions). For these alternatives to Hotelling’s T2, variance-covariance homogeneity is not an assumption. Although the four procedures can be seriously nonrobust with exponential and lognormal distributions, they were fairly robust with the remaining distributions. The performance of Yao’s test, James’ second-order test, and Johansen’s test was slightly superior to that of James’ first-order test.
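
In the spirit of the conditions studied here, the following minimal Monte Carlo loop illustrates how a Type I error rate can be estimated under a skewed (lognormal) distribution with heteroscedastic groups. The generating model, sample sizes, and the `type1_error_rate` name are illustrative assumptions, and the snippet assumes the `yao_test` function sketched above is in scope.

```python
import numpy as np

def type1_error_rate(test, n1, n2, p, scale2=3.0, reps=2000, alpha=0.05, seed=1):
    """Proportion of rejections when the null hypothesis is true: both groups
    share the same (lognormal, hence skewed) means, but group 2 is rescaled
    about its sample mean to induce covariance heterogeneity."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x1 = rng.lognormal(mean=0.0, sigma=1.0, size=(n1, p))
        x2 = rng.lognormal(mean=0.0, sigma=1.0, size=(n2, p))
        x2 = x2.mean(axis=0) + scale2 * (x2 - x2.mean(axis=0))
        _, _, p_value = test(x1, x2)          # e.g., the yao_test sketch above
        rejections += p_value < alpha
    return rejections / reps

print(type1_error_rate(yao_test, n1=20, n2=40, p=2))   # compare with alpha = 0.05
```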


Methodology ◽  
2009 ◽  
Vol 5 (2) ◽  
pp. 60-70 ◽  
Author(s):  
W. Holmes Finch ◽  
Teresa Davenport

Permutation testing has been suggested as an alternative to the standard F approximate tests used in multivariate analysis of variance (MANOVA). These approximate tests, such as Wilks’ Lambda and Pillai’s Trace, have been shown to perform poorly when assumptions of normally distributed dependent variables and homogeneity of group covariance matrices were violated. Because Monte Carlo permutation tests do not rely on distributional assumptions, they may be expected to work better than their approximate cousins when the data do not conform to the assumptions described above. The current simulation study compared the performance of four standard MANOVA test statistics with their Monte Carlo permutation-based counterparts under a variety of conditions with small samples, including conditions when the assumptions were met and when they were not. Results suggest that for sample sizes of 50 subjects, power is very low for all the statistics. In addition, Type I error rates for both the approximate F and Monte Carlo tests were inflated under the condition of nonnormal data and unequal covariance matrices. In general, the performance of the Monte Carlo permutation tests was slightly better in terms of Type I error rates and power when both assumptions of normality and homogeneous covariance matrices were not met. It should be noted that these simulations were based upon the case with three groups only, and as such results presented in this study can only be generalized to similar situations.
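
For readers unfamiliar with the mechanics, here is a minimal sketch of a Monte Carlo permutation counterpart to Wilks’ Lambda: compute the observed statistic, repeatedly shuffle the group labels, and take the proportion of permuted statistics at least as extreme as the p-value. It is not the authors’ implementation; the function names, number of permutations, and toy data are illustrative.

```python
import numpy as np

def wilks_lambda(x, labels):
    """Wilks' Lambda = |W| / |T|, where W is the within-group SSCP matrix
    and T = W + B is the total SSCP matrix (smaller = stronger group effect)."""
    grand = x.mean(axis=0)
    t_mat = (x - grand).T @ (x - grand)
    w_mat = np.zeros_like(t_mat)
    for g in np.unique(labels):
        cg = x[labels == g] - x[labels == g].mean(axis=0)
        w_mat += cg.T @ cg
    return np.linalg.det(w_mat) / np.linalg.det(t_mat)

def permutation_manova(x, labels, n_perm=4999, seed=0):
    """Monte Carlo permutation p-value for Wilks' Lambda: shuffle group
    labels, recompute the statistic, count permutations at least as extreme."""
    rng = np.random.default_rng(seed)
    observed = wilks_lambda(x, labels)
    count = sum(wilks_lambda(x, rng.permutation(labels)) <= observed
                for _ in range(n_perm))
    return observed, (count + 1) / (n_perm + 1)

# Illustrative use: three small groups, two dependent variables
rng = np.random.default_rng(42)
x = np.vstack([rng.normal(loc=m, size=(17, 2)) for m in (0.0, 0.0, 0.5)])
labels = np.repeat([0, 1, 2], 17)
print(permutation_manova(x, labels))
```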


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Abstract

Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes.

Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether the type I error rate could be affected by the choice of statistical test, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) held in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes.

Results: Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and the permutation test, especially when the sample size was small, for Case 1, whereas inflation was observed only for the permutation test for Case 2. Deflation was noted for the bootstrap test with small samples. Increasing the sample size mitigated inflation and deflation, except for the Wilcoxon test in Case 1, because heterogeneity of the weight distributions between groups violated the assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, the bootstrap test was underpowered with small samples as a tradeoff for maintaining type I error rates.

Conclusions: With small samples (n ≤ 5), the bootstrap test avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for the other tests, sample size should be increased. The Wilcoxon test should be avoided because of heterogeneity of the weight distributions between mutant and control mice.

Funding Sources: This study was supported in part by NIH and Japan Society for the Promotion of Science (JSPS) KAKENHI grants.
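
The abstract does not give the exact resampling algorithms, so the sketch below shows one plausible reading: a bootstrap test of equal means that recenters both groups on a pooled mean before resampling, and plasmode null draws for the two cases described (Case 1: recenter on a common mean; Case 2: pool the groups). Function names and the choice of Welch's t as the bootstrap pivot are assumptions for illustration.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic (unequal-variance two-sample t)."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

def bootstrap_test(a, b, n_boot=4999, seed=0):
    """Bootstrap test of equal means: recenter both groups on the pooled mean
    so H0 holds, resample within groups, and compare |t*| with |t_obs|.
    (One common variant; the abstract does not specify which was used.)"""
    rng = np.random.default_rng(seed)
    t_obs = welch_t(a, b)
    pooled = np.concatenate([a, b]).mean()
    a0, b0 = a - a.mean() + pooled, b - b.mean() + pooled
    exceed = 0
    for _ in range(n_boot):
        a_star = rng.choice(a0, size=len(a0), replace=True)
        b_star = rng.choice(b0, size=len(b0), replace=True)
        exceed += abs(welch_t(a_star, b_star)) >= abs(t_obs)
    return t_obs, (exceed + 1) / (n_boot + 1)

def plasmode_null_samples(control, mutant, n_per_group, case=1, seed=0):
    """Draw one 'plasmode' dataset under H0 from empirical weights.
    Case 1: recenter both groups on a common mean, sample within groups.
    Case 2: pool both groups and draw both samples from the pooled data."""
    rng = np.random.default_rng(seed)
    if case == 1:
        m = np.concatenate([control, mutant]).mean()
        c0, m0 = control - control.mean() + m, mutant - mutant.mean() + m
        return (rng.choice(c0, n_per_group, replace=True),
                rng.choice(m0, n_per_group, replace=True))
    pooled = np.concatenate([control, mutant])
    return (rng.choice(pooled, n_per_group, replace=True),
            rng.choice(pooled, n_per_group, replace=True))

# Usage idea: repeat plasmode_null_samples draws, apply bootstrap_test (or any
# other test), and tally the proportion of rejections at alpha = 0.05.
```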


2015 ◽  
Vol 9 (13) ◽  
pp. 1 ◽
Author(s):  
Tobi Kingsley Ochuko ◽  
Suhaida Abdullah ◽  
Zakiyah Binti Zain ◽  
Sharipah Soaad Syed Yahaya

This study centres on the comparison of independent group tests in terms of power, using a parametric method, the Alexander-Govern (AG) test. The AG test uses the mean as its central tendency measure. It is a better alternative to the Welch test, the James test and ANOVA because it produces high power and gives good control of Type I error rates for normal data under variance heterogeneity. However, the test is not robust for non-normal data. When the trimmed mean was applied as its central tendency measure under non-normality, the test was robust only in the two-group condition; as the number of groups increased beyond two, the test was no longer robust. As a result, a highly robust estimator known as the MOM estimator was applied to the test as its central tendency measure. That version is not affected by the number of groups, but it could not control Type I error rates under skewed, heavy-tailed distributions. In this study, the Winsorized MOM estimator was applied in the AG test as its central tendency measure. A simulation of 5,000 data sets was generated and analysed for each test using the SAS package. The results show that, with the pairing of unbalanced sample sizes (15:15:20:30) with equal variances (1:1:1:1) and with unequal variances (1:1:1:36) at effect size index f = 0.8, only the AGWMOM test produced high power values (0.9562 and 0.8336, respectively) compared to the AG test, the AGMOM test and ANOVA, and the test is considered to be sufficient.
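
Since the MOM and Winsorized MOM estimators carry the weight of this study, here is a minimal sketch of the usual modified one-step M (MOM) estimator of location and one plausible Winsorized variant. The 2.24 cut-off and MADN rescaling follow the common Wilcox-style definition; the Winsorizing step is an illustrative assumption, as the paper's exact formulation is not reproduced here.

```python
import numpy as np

def mom_estimator(x, k=2.24):
    """Modified one-step M (MOM) estimator of location (Wilcox-style):
    discard points farther than k * MADN from the median, average the rest."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745   # MAD rescaled for normality
    return x[np.abs(x - med) <= k * madn].mean()

def winsorized_mom_estimator(x, k=2.24):
    """One plausible Winsorized variant (illustrative assumption): instead of
    discarding flagged points, pull them in to the k * MADN cut-offs."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745
    return np.clip(x, med - k * madn, med + k * madn).mean()

print(mom_estimator([2.1, 2.4, 2.2, 2.3, 9.0]),
      winsorized_mom_estimator([2.1, 2.4, 2.2, 2.3, 9.0]))
```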


1996 ◽  
Vol 21 (2) ◽  
pp. 169-178 ◽  
Author(s):  
William T. Coombs ◽  
James Algina

Type I error rates for the Johansen test were estimated using simulated data for a variety of conditions. The design of the experiment was a 2 × 2 × 2 × 3 × 9 × 3 factorial. The factors were (a) type of distribution, (b) number of dependent variables, (c) number of groups, (d) ratio of the smallest sample size to the number of dependent variables, (e) sample size ratios, and (f) degree of heteroscedasticity. The results indicate that Type I error rates for the Johansen test depend heavily on the number of groups and the ratio of the smallest sample size to the number of dependent variables. Type I error rates depend to a lesser extent on the distribution types used in the study. Based on the results, sample size guidelines are presented.


2020 ◽  
Author(s):  
Keith Lohse ◽  
Kristin Sainani ◽  
J. Andrew Taylor ◽  
Michael Lloyd Butson ◽  
Emma Knight ◽  
...  

Magnitude-based inference (MBI) is a controversial statistical method that has been used in hundreds of papers in sports science despite criticism from statisticians. To better understand how this method has been applied in practice, we systematically reviewed 232 papers that used MBI. We extracted data on study design, sample size, and choice of MBI settings and parameters. Median sample size was 10 per group (interquartile range, IQR: 8 – 15) for multi-group studies and 14 (IQR: 10 – 24) for single-group studies; few studies reported a priori sample size calculations (15%). Authors predominantly applied MBI’s default settings and chose “mechanistic/non-clinical” rather than “clinical” MBI even when testing clinical interventions (only 14 studies out of 232 used clinical MBI). Using these data, we can estimate the Type I error rates for the typical MBI study. Authors frequently made dichotomous claims about effects based on the MBI criterion of a “likely” effect and sometimes based on the MBI criterion of a “possible” effect. When the sample size is n=8 to 15 per group, these inferences have Type I error rates of 12%-22% and 22%-45%, respectively. High Type I error rates were compounded by multiple testing: Authors reported results from a median of 30 tests related to outcomes; and few studies specified a primary outcome (14%). We conclude that MBI has promoted small studies, promulgated a “black box” approach to statistics, and led to numerous papers where the conclusions are not supported by the data. Amidst debates over the role of p-values and significance testing in science, MBI also provides an important natural experiment: we find no evidence that moving researchers away from p-values or null hypothesis significance testing makes them less prone to dichotomization or over-interpretation of findings.


2008 ◽  
Vol 32 (1) ◽  
pp. 157-166 ◽  
Author(s):  
Roberta Bessa Veloso Silva ◽  
Daniel Furtado Ferreira ◽  
Denismar Alves Nogueira

The present work emphasizes the importance of testing hypotheses about the homogeneity of covariance matrices from k multivariate populations. Violation of the assumption of homogeneous covariance matrices affects the performance of tests and the coverage probability of confidence regions. This work applies two tests of homogeneity of covariance matrices and evaluates their type I error rates and power using Monte Carlo simulation in normal populations, and their robustness in non-normal populations. The multivariate Bartlett test (MBT) and its bootstrap version (MBTB) were used. Different configurations were tested, combining sample sizes, numbers of variates, correlations and numbers of populations. Results show that the bootstrap test is superior to the asymptotic test and is robust, since it controls the type I error rate.
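
For context, the sketch below shows a standard form of the multivariate Bartlett (Box's M) statistic with its chi-square approximation, plus one plausible bootstrap scheme that resamples groups from pooled, group-centred data so that the null hypothesis of equal covariance matrices holds. The resampling details are an assumption for illustration; the paper's exact algorithm may differ.

```python
import numpy as np
from scipy import stats

def box_m_test(groups):
    """Multivariate Bartlett / Box's M test of equal covariance matrices with
    the usual chi-square approximation (whose small-sample behaviour is what a
    bootstrap version is meant to improve)."""
    k, p = len(groups), groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]
    n_total = ns.sum()
    pooled = sum((n - 1) * s for n, s in zip(ns, covs)) / (n_total - k)
    m_stat = (n_total - k) * np.log(np.linalg.det(pooled)) - sum(
        (n - 1) * np.log(np.linalg.det(s)) for n, s in zip(ns, covs))
    c1 = (np.sum(1.0 / (ns - 1)) - 1.0 / (n_total - k)) * \
         (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))
    chi2 = (1 - c1) * m_stat
    df = p * (p + 1) * (k - 1) / 2
    return m_stat, chi2, stats.chi2.sf(chi2, df)

def box_m_bootstrap(groups, n_boot=999, seed=0):
    """Bootstrap version (one plausible scheme): centre each group, pool,
    resample groups of the original sizes from the pooled data so H0 holds,
    and compare bootstrap M statistics with the observed one."""
    rng = np.random.default_rng(seed)
    m_obs = box_m_test(groups)[0]
    pooled = np.vstack([g - g.mean(axis=0) for g in groups])
    exceed = 0
    for _ in range(n_boot):
        resampled = [pooled[rng.integers(0, len(pooled), size=g.shape[0])]
                     for g in groups]
        exceed += box_m_test(resampled)[0] >= m_obs
    return m_obs, (exceed + 1) / (n_boot + 1)

# Illustrative use: three groups, two variables, third group with inflated variance
rng = np.random.default_rng(1)
gs = [rng.multivariate_normal([0, 0], np.eye(2) * v, size=25) for v in (1, 1, 4)]
print(box_m_test(gs)[1:], box_m_bootstrap(gs)[1])
```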

