Statistical tests under Dallal’s model: Asymptotic and exact methods

PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242722
Author(s):  
Zhiming Li ◽  
Changxing Ma ◽  
Mingyao Ai

This paper proposes asymptotic and exact methods for testing the equality of correlations for multiple bilateral data under Dallal’s model. Three asymptotic test statistics are derived for large samples. Since they are not applicable to small data, several conditional and unconditional exact methods are proposed based on these three statistics. Numerical studies are conducted to compare all these methods with regard to type I error rates (TIEs) and powers. The results show that the asymptotic score test is the most robust, and two exact tests have satisfactory TIEs and powers. Some real examples are provided to illustrate the effectiveness of these tests.

2017 ◽  
Author(s):  
Rounak Dey ◽  
Ellen M. Schmidt ◽  
Goncalo R. Abecasis ◽  
Seunggeun Lee

The availability of electronic health record (EHR)-based phenotypes allows for genome-wide association analyses in thousands of traits, and has great potential to identify novel genetic variants associated with clinical phenotypes. We can interpret the phenome-wide association study (PheWAS) result for a single genetic variant by observing its association across a landscape of phenotypes. Since PheWAS can test thousands of binary phenotypes, and most of them have unbalanced (case:control = 1:10) or often extremely unbalanced (case:control = 1:600) case-control ratios, existing methods cannot provide an accurate and scalable way to test for associations. Here we propose a computationally fast score-test-based method that estimates the distribution of the test statistic using the saddlepoint approximation. Our method is much faster (∼100 times) than the state-of-the-art Firth’s test. It can also adjust for covariates and control type I error rates even when the case-control ratio is extremely unbalanced. Through application to PheWAS data from the Michigan Genomics Initiative, we show that the proposed method can control type I error rates while replicating previously known association signals even for traits with a very small number of cases and a large number of controls.
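As a rough illustration of the approach, the right-tail probability of a score statistic S = Σ gᵢ(yᵢ − μᵢ) for a single variant can be approximated from its cumulant generating function via the Barndorff-Nielsen saddlepoint formula. This is a minimal sketch under simplifying assumptions (Bernoulli outcomes, no covariate adjustment); names such as `spa_pvalue` are illustrative and not from the authors' software.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def spa_pvalue(g, mu, s):
    """Approximate P(S >= s) for the score S = sum g_i (y_i - mu_i),
    y_i ~ Bernoulli(mu_i), via the Barndorff-Nielsen saddlepoint formula."""
    def K(t):   # cumulant generating function of S
        return np.sum(np.log(1 - mu + mu * np.exp(g * t))) - t * np.sum(g * mu)
    def K1(t):  # first derivative of K
        e = mu * np.exp(g * t)
        return np.sum(g * e / (1 - mu + e)) - np.sum(g * mu)
    def K2(t):  # second derivative of K
        e = np.exp(g * t)
        return np.sum(g**2 * mu * (1 - mu) * e / (1 - mu + mu * e)**2)

    t_hat = brentq(lambda t: K1(t) - s, -50, 50)   # solve K'(t) = s
    w = np.sign(t_hat) * np.sqrt(2 * (t_hat * s - K(t_hat)))
    v = t_hat * np.sqrt(K2(t_hat))
    return norm.sf(w + np.log(v / w) / w)

rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=2000).astype(float)    # genotypes 0/1/2
mu = np.full(2000, 1 / 100)                        # 1:99 case-control ratio
print(spa_pvalue(g, mu, 15.0))                     # small right-tail p-value
```

Unlike the normal approximation, the saddlepoint tail accounts for the heavy right skew that an extreme case-control imbalance induces in the score statistic.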


2019 ◽  
Vol 2019 ◽  
pp. 1-8 ◽  
Author(s):  
Can Ateş ◽  
Özlem Kaymaz ◽  
H. Emre Kale ◽  
Mustafa Agah Tekindal

In this study, we investigate how Wilks’ lambda, Pillai’s trace, Hotelling’s trace, and Roy’s largest root test statistics can be affected when the normality and homogeneous-variance assumptions of the MANOVA method are violated; in other words, the robustness of the tests is examined in these cases. For this purpose, a simulation study is conducted under different scenarios. For different numbers of variables and different sample sizes, with the group variances taken to be homogeneous (σ₁² = σ₂² = ⋯ = σg²) or heterogeneous and increasing (σ₁² < σ₂² < ⋯ < σg²), random numbers are generated from Gamma(4-4-4; 0.5), Gamma(4-9-36; 0.5), Student’s t(2), and Normal(0; 1) distributions. Furthermore, both balanced and unbalanced numbers of observations in the groups are taken into account. After 10000 repetitions, type-I error values are calculated for each test at α = 0.05. In the Gamma distribution, Pillai’s trace test statistic gives more robust results in the case of homogeneous and heterogeneous variances for 2 variables, and in the case of 3 variables, Roy’s largest root test statistic gives more robust results in balanced samples and Pillai’s trace test statistic in unbalanced samples. In Student’s t distribution, Pillai’s trace test statistic gives more robust results in the case of homogeneous variance and Wilks’ lambda test statistic in the case of heterogeneous variance. In the normal distribution, in the case of homogeneous variance for 2 variables, Roy’s largest root test statistic gives relatively more robust results and Wilks’ lambda test statistic for 3 variables. Also in the case of heterogeneous variance for 2 and 3 variables, Roy’s largest root test statistic gives robust results in the normal distribution. The test statistics used with MANOVA are affected by violations of the homogeneity of covariance matrices and normality assumptions, particularly when the numbers of observations in the groups are unbalanced.
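A single cell of such a simulation can be re-created in miniature. The sketch below is a hypothetical re-creation, not the authors' code: it estimates the empirical type-I error of Pillai's trace (via its standard F approximation) for g = 3 groups and p = 2 variables when all groups share the same Gamma(4; 0.5) distribution.

```python
import numpy as np
from scipy.stats import f as f_dist

def pillai_pvalue(groups):
    """One-way MANOVA p-value via Pillai's trace and its F approximation."""
    X = np.vstack(groups)
    N, p = X.shape
    g = len(groups)
    grand = X.mean(axis=0)
    # between-group (H) and within-group (E) SSCP matrices
    H = sum(len(Y) * np.outer(Y.mean(0) - grand, Y.mean(0) - grand) for Y in groups)
    E = sum((Y - Y.mean(0)).T @ (Y - Y.mean(0)) for Y in groups)
    V = np.trace(H @ np.linalg.inv(H + E))          # Pillai's trace
    s = min(p, g - 1)
    m = (abs(g - 1 - p) - 1) / 2
    n = (N - g - p - 1) / 2
    F = (2 * n + s + 1) / (2 * m + s + 1) * V / (s - V)
    return f_dist.sf(F, s * (2 * m + s + 1), s * (2 * n + s + 1))

rng = np.random.default_rng(0)
reps, alpha, hits = 2000, 0.05, 0
for _ in range(reps):
    groups = [rng.gamma(4.0, 0.5, size=(20, 2)) for _ in range(3)]
    hits += pillai_pvalue(groups) < alpha
print(hits / reps)   # empirical type-I error, close to the nominal 0.05
```

A rejection rate near 0.05 under this skewed distribution is what the abstract's robustness finding for Pillai's trace would predict.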


2004 ◽  
Vol 3 (1) ◽  
pp. 1-69 ◽  
Author(s):  
Sandrine Dudoit ◽  
Mark J. van der Laan ◽  
Katherine S. Pollard

The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization of a null distribution, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. In the special case of family-wise error rate (FWER) control, our method yields the single-step minP and maxT procedures, based on minima of unadjusted p-values and maxima of test statistics, respectively, with the important distinction in the choice of null distribution. Single-step procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution. The special cases of t- and F-statistics are discussed in detail. 
The companion articles focus on step-down multiple testing procedures for control of the FWER (van der Laan et al., 2004b) and on augmentations of FWER-controlling methods to control error rates such as tail probabilities for the number of false positives and for the proportion of false positives among the rejected hypotheses (van der Laan et al., 2004a). The proposed bootstrap multiple testing procedures are evaluated by a simulation study and applied to genomic data in the fourth article of the series (Pollard et al., 2004).
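The single-step maxT idea can be sketched in a toy one-sample setting, assuming a nonparametric bootstrap of mean-centered data as the consistent estimator of the null distribution; the data-generating setup and variable names are illustrative only, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, B = 50, 10, 2000
X = rng.normal(0, 1, size=(n, m))
X[:, 0] += 0.8                       # one truly non-null hypothesis

def tstats(Y):
    """One-sample t-statistics for each of the m hypotheses."""
    return np.sqrt(len(Y)) * Y.mean(0) / Y.std(0, ddof=1)

t_obs = tstats(X)
Xc = X - X.mean(0)                   # centering enforces the null for resampling
max_null = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)      # bootstrap resample of rows
    max_null[b] = np.abs(tstats(Xc[idx])).max()

# single-step maxT adjusted p-value: P(max |T*| >= |t_j|) under the null
p_adj = (max_null[None, :] >= np.abs(t_obs)[:, None]).mean(axis=1)
print(p_adj)
```

Only the hypothesis with the real shift should survive FWER control at conventional levels; the null hypotheses receive large adjusted p-values.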


2016 ◽  
Vol 42 (1) ◽  
pp. 46-68 ◽  
Author(s):  
Sandip Sinharay

An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have benefited from item preknowledge; the statistics can be used for both nonadaptive and adaptive assessments that may include either or both of dichotomous and polytomous items. Each new statistic has an asymptotic standard normal distribution. It is demonstrated in detailed simulation studies that the Type I error rates of the new statistics are close to the nominal level and the values of power of the new statistics are larger than those of an existing statistic for addressing the same problem.


2020 ◽  
Vol 17 (3) ◽  
pp. 273-284 ◽  
Author(s):  
Babak Choodari-Oskooei ◽  
Daniel J Bratton ◽  
Melissa R Gannon ◽  
Angela M Meade ◽  
Matthew R Sydes ◽  
...  

Background: Experimental treatments pass through various stages of development. If a treatment passes through early-phase experiments, the investigators may want to assess it in a late-phase randomised controlled trial. An efficient way to do this is adding it as a new research arm to an ongoing trial while the existing research arms continue, a so-called multi-arm platform trial. The familywise type I error rate is often a key quantity of interest in any multi-arm platform trial. We set out to clarify how it should be calculated when new arms are added to a trial some time after it has started. Methods: We show how the familywise type I error rate, any-pair and all-pairs powers can be calculated when a new arm is added to a platform trial. We extend the Dunnett probability and derive analytical formulae for the correlation between the test statistics of the existing pairwise comparison and that of the newly added arm. We also verify our analytical derivation via simulations. Results: Our results indicate that the familywise type I error rate depends on the amount of shared control arm information (i.e. the number of individuals for continuous and binary outcomes, and the number of primary outcome events for time-to-event outcomes) and on the allocation ratio. The familywise type I error rate is driven more by the number of pairwise comparisons and the corresponding (pairwise) type I error rates than by the timing of the addition of the new arms. The familywise type I error rate can be estimated using Šidák’s correction if the correlation between the test statistics of pairwise comparisons is less than 0.30. Conclusions: The findings we present in this article can be used to design trials with pre-planned deferred arms or to add new pairwise comparisons within an ongoing platform trial where control of the pairwise error rate or familywise type I error rate (for a subset of pairwise comparisons) is required.
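Two quantities from the results above can be illustrated directly: the classic correlation between z-statistics of two pairwise comparisons that share a control arm (assuming a common known variance, a simplification of the authors' derivation), and the Šidák estimate of the familywise type I error rate. Sample sizes are illustrative.

```python
import math

def shared_control_corr(n0, n1, n2):
    """Correlation between Z1 = (xbar1 - xbar0)/SE1 and Z2 = (xbar2 - xbar0)/SE2
    when both comparisons use the same control arm of size n0 (equal variances)."""
    return (1 / n0) / math.sqrt((1 / n1 + 1 / n0) * (1 / n2 + 1 / n0))

def sidak_fwer(alpha_pairwise, k):
    """Familywise type I error rate for k independent pairwise comparisons."""
    return 1 - (1 - alpha_pairwise) ** k

print(shared_control_corr(100, 100, 100))  # 0.5 under 1:1:1 allocation
print(sidak_fwer(0.025, 2))                # about 0.0494
```

Enlarging the control arm lowers the correlation between comparisons, which is what makes the Šidák approximation usable below the 0.30 correlation threshold reported above.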


2020 ◽  
Vol 29 (9) ◽  
pp. 2569-2582
Author(s):  
Miguel A García-Pérez ◽  
Vicente Núñez-Antón

Controversy over the validity of significance tests in the analysis of contingency tables is motivated by the disagreement between asymptotic and exact p values and its dependence on the magnitude of expected frequencies. Variants of Pearson’s X2 statistic and their asymptotic distributions were proposed to overcome the difficulties, but several approaches also exist to conduct exact tests. This paper shows that discrepant asymptotic and exact results may or may not occur whether expected frequencies are large or small: Eventual inaccuracy of asymptotic p values is instead caused by idiosyncrasies of the discrete distribution of X2. More importantly, discrepancies are also artificially created by the hypergeometric sampling model used to perform exact tests. Exact computations under the alternative full-multinomial or product-multinomial models require eliminating nuisance parameters and we propose a novel method that integrates them out. The resultant exact distributions are very accurately approximated by the asymptotic distribution, which eliminates concerns about the accuracy of the latter. We also discuss that the two-stage approach that tests for significance of residuals conditional on a significant X2 test is inadvisable and that an alternative single-stage test preserves Type-I error rates and further eliminates concerns about asymptotic accuracy.


1999 ◽  
Vol 11 (8) ◽  
pp. 1885-1892 ◽  
Author(s):  
Ethem Alpaydın

Dietterich (1998) reviews five statistical tests and proposes the 5 × 2 cv t test for determining whether there is a significant difference between the error rates of two classifiers. In our experiments, we noticed that the 5 × 2 cv t test result may vary depending on factors that should not affect the test, and we propose a variant, the combined 5 × 2 cv F test, that combines multiple statistics to get a more robust test. Simulation results show that this combined version of the test has lower type I error and higher power than the 5 × 2 cv t test proper.
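The combined F statistic can be sketched as follows, where p[i, j] is the difference in error rates of the two classifiers on fold j of replication i of 2-fold cross-validation; the numbers below are illustrative data, not from the paper.

```python
import numpy as np
from scipy.stats import f as f_dist

def five_by_two_cv_f(p):
    """Combined 5x2 cv F test (Alpaydin, 1999): F ~ F(10, 5) under H0."""
    pbar = p.mean(axis=1, keepdims=True)
    s2 = ((p - pbar) ** 2).sum(axis=1)      # per-replication variance estimate
    F = (p ** 2).sum() / (2 * s2.sum())     # all ten squared differences in the numerator
    return F, f_dist.sf(F, 10, 5)

p = np.array([[0.02, 0.01], [0.03, -0.01], [0.01, 0.02],
              [0.00, 0.03], [0.02, 0.00]])  # 5 replications x 2 folds
F, pval = five_by_two_cv_f(p)
print(F, pval)
```

Using all ten fold differences in the numerator, rather than a single arbitrarily chosen one as in the 5 × 2 cv t test, is what removes the dependence on fold ordering.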


Author(s):  
Abdullah A. Ameen ◽  
Osama H. Abbas

The classical Wilks' statistic is mostly used to test hypotheses in one-way multivariate analysis of variance (MANOVA), and it is highly sensitive to the effects of outliers. The non-robustness of test statistics based on normal theory has led many authors to examine various alternatives. In this paper, we present a robust version of the Wilks' statistic and construct its approximate distribution. A comparison is made between the proposed statistic and some existing Wilks' statistics. Monte Carlo studies are used to assess the performance of the test statistics in different data sets. Moreover, the type I error rate and the power of the tests are considered as statistical tools to compare the test statistics. The study reveals that, under normal distributions, the type I error rates of the classical and the proposed Wilks' statistics are close to the true significance levels, and the powers of the test statistics are very close. In addition, in the case of contaminated distributions, the proposed statistic is the best.
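For reference, the classical (non-robust) Wilks' statistic with Bartlett's chi-square approximation, the baseline against which a robust version would be compared, can be computed as below; the data are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def wilks_lambda(groups):
    """Classical Wilks' lambda for one-way MANOVA with Bartlett's approximation."""
    X = np.vstack(groups)
    N, p = X.shape
    g = len(groups)
    grand = X.mean(0)
    H = sum(len(Y) * np.outer(Y.mean(0) - grand, Y.mean(0) - grand) for Y in groups)
    E = sum((Y - Y.mean(0)).T @ (Y - Y.mean(0)) for Y in groups)
    lam = np.linalg.det(E) / np.linalg.det(H + E)
    stat = -(N - 1 - (p + g) / 2) * np.log(lam)   # Bartlett's chi-square statistic
    return lam, chi2.sf(stat, p * (g - 1))

rng = np.random.default_rng(3)
groups = [rng.normal(0, 1, size=(30, 3)) for _ in range(3)]
print(wilks_lambda(groups))   # under H0: lambda near 1, p-value not small
```

Because lambda is a ratio of determinants of sums of squared deviations, a single extreme outlier can inflate both determinants and distort the statistic, which is the sensitivity the robust variant targets.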


Methodology ◽  
2012 ◽  
Vol 8 (1) ◽  
pp. 1-11 ◽  
Author(s):  
John Ruscio ◽  
Brendan Roche

Parametric assumptions for statistical tests include normality and equal variances. Micceri (1989) found that data frequently violate the normality assumption; variances have received less attention. We recorded within-group variances of dependent variables for 455 studies published in leading psychology journals. Sample variances differed, often substantially, suggesting frequent violation of the assumption of equal population variances. Parallel analyses of equal-variance artificial data otherwise matched to the characteristics of the empirical data show that unequal sample variances in the empirical data exceed expectations from normal sampling error and can adversely affect Type I error rates of parametric statistical tests. Variance heterogeneity was unrelated to relative group sizes or total sample size and observed across subdisciplines of psychology in experimental and correlational research. These results underscore the value of examining variances and, when appropriate, using data-analytic methods robust to unequal variances. We provide a standardized index for examining and reporting variance heterogeneity.
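Examining variances in this spirit can be as simple as the sketch below, which computes the max/min within-group variance ratio (a plain descriptive index, not necessarily the standardized index the authors propose) and runs the Brown-Forsythe test; the two groups are illustrative.

```python
import numpy as np
from scipy.stats import levene

g1 = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0])
g2 = np.array([2.0, 9.5, 1.1, 8.0, 0.3, 10.2, 6.6, 0.9])   # far more spread

variances = [g1.var(ddof=1), g2.var(ddof=1)]
ratio = max(variances) / min(variances)           # descriptive heterogeneity index
stat, p = levene(g1, g2, center='median')         # Brown-Forsythe variant
print(ratio, p)
```

A large ratio together with a small Brown-Forsythe p-value is the signal that a Welch-type procedure, rather than the equal-variance t test, is the safer choice.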


2000 ◽  
Vol 55 (2-3) ◽  
pp. 253-281 ◽  
Author(s):  
András Vargha

In the current paper, six statistical tests of stochastic equality are compared by a Monte Carlo simulation with respect to Type I error and power. Two populations are said to be stochastically equal with respect to a variable X if, for two observations X1 and X2 drawn independently and at random from the two populations, P(X1 > X2) = P(X1 < X2). In the simulation, the skewness and kurtosis levels as well as the extent of variance heterogeneity of the two parent distributions were varied across a wide range. The sample sizes applied were either small or moderate, and equal or unequal.
The tests of stochastic equality compared were as follows: the rank t test, the rank Welch test, the Fligner-Policello test, Cliff's modified Fligner-Policello test, as well as two modifications of the last two tests, denoted FPW and FPCW, that utilize adjusted degrees of freedom. An interesting result is that the two newly introduced test variants, FPW and FPCW, proved to be substantially more accurate with regard to their Type I error rates than the others, while keeping a similar power level. Specifically, the estimated Type I error of FPW at the .05 nominal level always fell in the range .043-.063, even when the variance ratio of the two distributions was as large as 1:16. The corresponding ranges were .049-.068 for FPCW, but .029-.160 for the rank t test, .049-.096 for the rank Welch test, .035-.075 for the Fligner-Policello test, and .040-.078 for Cliff's test.
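The quantity under test can be estimated directly: the sketch below computes the probability-of-superiority statistic A = P(X1 > X2) + 0.5 P(X1 = X2), with ties counted half, so that A = 0.5 corresponds to stochastic equality. This is a generic estimator of the target quantity, not the paper's test statistics.

```python
import numpy as np

def a_measure(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y); a value of 0.5 means the two
    samples are consistent with stochastic equality."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()     # pairs where x beats y
    ties = (x[:, None] == y[None, :]).sum()       # tied pairs count half
    return (greater + 0.5 * ties) / (len(x) * len(y))

print(a_measure([1, 2, 3, 4], [1, 2, 3, 4]))   # 0.5: stochastically equal
print(a_measure([5, 6, 7, 8], [1, 2, 3, 4]))   # 1.0: complete superiority
```

The tests compared in the paper are, in effect, different ways of deciding whether this quantity differs significantly from 0.5 under unequal variances and non-normal shapes.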

