Statistical model specification and power: recommendations on the use of test-qualified pooling in analysis of experimental data

Nick Colegrave; Graeme D. Ruxton

doi:10.1098/rspb.2016.1850

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2016.1850 ◽

2017 ◽

Vol 284 (1851) ◽

pp. 20161850 ◽

Cited By ~ 7

Author(s):

Nick Colegrave ◽

Graeme D. Ruxton

Keyword(s):

Experimental Data ◽

Statistical Model ◽

Statistical Power ◽

Error Term ◽

Degrees Of Freedom ◽

Type I Error ◽

Error Rates ◽

Statistical Testing ◽

Model Specification ◽

Type I

A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure.
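As a rough illustration of the pooling arithmetic described above, the sketch below folds a dropped term's sum of squares and degrees of freedom into the error term, but only when a preliminary test on that term is non-significant. The `Term` class, the function names, the sums of squares, the preliminary p-value and the 0.25 threshold are all illustrative assumptions, not values or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """One row of an ANOVA table (hypothetical helper for this sketch)."""
    ss: float  # sum of squares
    df: int    # degrees of freedom

    @property
    def ms(self) -> float:
        """Mean square = sum of squares / degrees of freedom."""
        return self.ss / self.df


def pool(error: Term, dropped: Term) -> Term:
    """Pool a dropped term's variation and df into the error term."""
    return Term(ss=error.ss + dropped.ss, df=error.df + dropped.df)


def choose_error_term(error: Term, nuisance: Term, nuisance_p: float,
                      threshold: float = 0.25) -> Term:
    """Test-qualified pooling: pool only if the preliminary test on the
    nuisance term is non-significant (threshold is an assumed value)."""
    return pool(error, nuisance) if nuisance_p > threshold else error


# Made-up randomized block layout: 3 treatments x 5 blocks.
treatment = Term(ss=30.0, df=2)
block = Term(ss=8.0, df=4)
error = Term(ss=24.0, df=8)

# F for treatment against the unpooled error term.
f_full = treatment.ms / error.ms

# Suppose the preliminary test on block gave p = 0.62 (assumed), so we pool.
chosen = choose_error_term(error, block, nuisance_p=0.62)
f_pooled = treatment.ms / chosen.ms

print(f"unpooled: F = {f_full:.3f} on {error.df} error df")
print(f"pooled:   F = {f_pooled:.3f} on {chosen.df} error df")
```

With these made-up numbers the error degrees of freedom rise from 8 to 12, and F rises only because the block mean square (2.0) happens to be smaller than the error mean square (3.0). Because the choice of error term depends on the data, the final test no longer has its nominal type I error rate, which is the concern the abstract raises.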

Optimal selection of genetic variants for adjustment of population stratification in European association studies

Briefings in Bioinformatics ◽

10.1093/bib/bbz023 ◽

2019 ◽

Vol 21 (3) ◽

pp. 753-761 ◽

Cited By ~ 2

Author(s):

Regina Brinster ◽

Dominique Scherer ◽

Justo Lorenzo Bermejo

Keyword(s):

Genetic Variants ◽

Population Stratification ◽

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Reference Sample ◽

Error Rates ◽

The Cancer Genome Atlas ◽

Type I ◽

Genotype Data

Population stratification is usually corrected relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations—so-called ancestry-informative markers (AIMs)—instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($IN$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although inflated) type I error, followed at some distance by the first eight $IN$-AIMs.

A comparative analysis of cell-type adjustment methods for epigenome-wide association studies based on simulated and real data sets

Briefings in Bioinformatics ◽

10.1093/bib/bby068 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2055-2065 ◽

Cited By ~ 1

Author(s):

Johannes Brägelmann ◽

Justo Lorenzo Bermejo

Keyword(s):

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Real Data ◽

Error Rates ◽

Data Sets ◽

Type I ◽

Cell Type ◽

Type I Error Rates

Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed for adjustment of cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed simulation findings with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best comparable statistical power, quality of estimated methylation differences and runtime.

Type I Error Rates for Welch’s Test and James’s Second-Order Test Under Nonnormality and Inequality of Variance When There Are Two Groups

Journal of Educational Statistics ◽

10.3102/10769986019003275 ◽

1994 ◽

Vol 19 (3) ◽

pp. 275-291 ◽

Cited By ~ 28

Author(s):

James Algina ◽

T. C. Oshima ◽

Wen-Ying Lin

Keyword(s):

Degrees Of Freedom ◽

Type I Error ◽

Total Sample ◽

Error Rates ◽

Second Order ◽

T Test ◽

Type I ◽

Sample Sizes ◽

Unequal Variances ◽

Type I Error Rates

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch’s approximate degrees of freedom test, and James’s second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch’s test and James’s test have very similar Type I error rates and tend to control the Type I error rate as well as or better than the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.
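For readers unfamiliar with the "approximate degrees of freedom" idea, here is a minimal sketch of Welch's statistic and its Welch-Satterthwaite degrees of freedom, using only the Python standard library. The function name `welch_t` and the sample data are assumptions for illustration. James's second-order test and the simulation design of the study are not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of mean(x)
    v2 = variance(y) / n2  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Equal sizes and equal sample variances: df reduces to n1 + n2 - 2.
t_equal, df_equal = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])

# Unequal sizes and variances: df falls between min(n1, n2) - 1 and n1 + n2 - 2.
t_uneq, df_uneq = welch_t([1, 2, 3, 4, 5], [10, 20, 30])

print(t_equal, df_equal)
print(t_uneq, df_uneq)
```

The statistic is referred to a t distribution with the computed, generally non-integer, degrees of freedom. When the variances differ strongly, the degrees of freedom shrink toward those of the more variable sample.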

A Monte Carlo Simulation Study for Kolmogorov-Smirnov Two-Sample Test Under the Precondition of Heterogeneity: Upon the Changes on the Probabilities of Statistical Power and Type I Error Rates with Respect to Skewness Measure

SSRN Electronic Journal ◽

10.2139/ssrn.2497601 ◽

2013 ◽

Author(s):

Ötüken Senger ◽

Ali Kemal Çelik

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Statistical Power ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Monte Carlo Simulation Study ◽

Type I Error Rates ◽

Sample Test ◽

Kolmogorov Smirnov

Analyses of Unbalanced Groups-Versus-Individual Research Designs Using Three Alternative Approximate Degrees of Freedom Tests: Test Development and Type I Error Rates

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1177992360 ◽

2007 ◽

Vol 6 (1) ◽

pp. 53-65

Author(s):

Stephanie Wehry ◽

James Algina

Keyword(s):

Degrees Of Freedom ◽

Type I Error ◽

Test Development ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Research Designs

The effect of number of clusters and cluster size on statistical power and Type I error rates when testing random effects variance components in multilevel linear and logistic regression models

Journal of Statistical Computation and Simulation ◽

10.1080/00949655.2018.1504945 ◽

2018 ◽

Vol 88 (16) ◽

pp. 3151-3163 ◽

Cited By ~ 8

Author(s):

Peter C. Austin ◽

George Leckie

Keyword(s):

Variance Components ◽

Cluster Size ◽

Regression Models ◽

Statistical Power ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Number Of Clusters ◽

Logistic Regression Models ◽

Type I Error Rates

Bias, Type I Error Rates, and Statistical Power of a Latent Mediation Model in the Presence of Violations of Invariance

Educational and Psychological Measurement ◽

10.1177/0013164416684169 ◽

2017 ◽

Vol 78 (3) ◽

pp. 460-481 ◽

Cited By ~ 3

Author(s):

Margarita Olivera-Aguilar ◽

Samuel H. Rikoon ◽

Oscar Gonzalez ◽

Yasemin Kisbu-Sakarya ◽

David P. MacKinnon

Keyword(s):

Measurement Invariance ◽

Statistical Power ◽

Type I Error ◽

Error Rates ◽

Parameter Estimates ◽

Type I ◽

Mediation Model ◽

Type I Error Rates ◽

Mediated Effects ◽

The Impact

When testing a statistical mediation model, it is assumed that factorial measurement invariance holds for the mediating construct across levels of the independent variable X. The consequences of failing to address the violations of measurement invariance in mediation models are largely unknown. The purpose of the present study was to systematically examine the impact of mediator noninvariance on the Type I error rates, statistical power, and relative bias in parameter estimates of the mediated effect in the single mediator model. The results of a large simulation study indicated that, in general, the mediated effect was robust to violations of invariance in loadings. In contrast, most conditions with violations of intercept invariance exhibited severely positively biased mediated effects, Type I error rates above acceptable levels, and statistical power larger than in the invariant conditions. The implications of these results are discussed and recommendations are offered.

Type I Error Rates and Statistical Power for the James Second-Order Test and the Univariate F Test in Two-Way Fixed-Effects ANOVA Models Under Heteroscedasticity and/or Nonnormality

The Journal of Experimental Education ◽

10.1080/00220973.1996.9943463 ◽

1996 ◽

Vol 65 (1) ◽

pp. 57-71 ◽

Cited By ~ 10

Author(s):

Tung-Hsing Hsiung ◽

Stephen Olejnik

Keyword(s):

Fixed Effects ◽

Statistical Power ◽

Type I Error ◽

Error Rates ◽

Second Order ◽

Type I ◽

Type I Error Rates

The Significance of Power and the Power of Significance: Recommendations for Occupational Therapy Research

The Occupational Therapy Journal of Research ◽

10.1177/153944928400400103 ◽

1984 ◽

Vol 4 (1) ◽

pp. 37-50 ◽

Cited By ~ 2

Author(s):

Kenneth Ottenbacher

Keyword(s):

Occupational Therapy ◽

Statistical Power ◽

Type I Error ◽

Clinical Investigation ◽

Error Rates ◽

High Rate ◽

Type I ◽

Theory Approach ◽

Significance Level ◽

Applied Fields

Research in the behavioral and social sciences, including occupational therapy, has been shown to be associated with low statistical power and a high rate of Type II experimental errors. Three methods of increasing power that are frequently suggested are increasing sample size, increasing effect size, and increasing the significance level. The first two alternatives are often not possible in applied fields such as occupational therapy, and the third is generally not considered desirable since it leads to increased Type I error rates. A fourth alternative is proposed, which involves the partitioning of the decision region into three sections. This procedure is based on the Neyman and Pearson (1933) decision-theory approach to significance testing and is particularly applicable to areas of applied and clinical investigation such as occupational therapy. A sample power table is presented along with formulas to compute table values. The argument is made that using the procedures described will provide a method of unambiguously interpreting nonsignificant results and increase the power and sensitivity of occupational therapy research.

A more efficient three-arm non-inferiority test based on pooled estimators of the homogeneous variance

Statistical Methods in Medical Research ◽

10.1177/0962280216681036 ◽

2016 ◽

Vol 27 (8) ◽

pp. 2437-2446 ◽

Cited By ~ 1

Author(s):

Hezhi Lu ◽

Hua Jin ◽

Weixiong Zeng

Keyword(s):

Sample Size ◽

Error Rate ◽

Statistical Power ◽

Type I Error ◽

Statistical Testing ◽

New Method ◽

Type I ◽

Simulation Studies ◽

Testing Framework ◽

Better Than

Hida and Tango established a statistical testing framework for the three-arm non-inferiority trial including a placebo with a pre-specified non-inferiority margin to overcome the shortcomings of traditional two-arm non-inferiority trials (such as having to choose the non-inferiority margin). In this paper, we propose a new method that improves their approach with respect to two aspects. We construct our test statistics based on the best unbiased pooled estimators of the homogeneous variance, and we use the principle of intersection-union tests to determine the rejection rule. We theoretically prove that our test is better than that of Hida and Tango for large sample sizes. Furthermore, when the sample size was small or moderate, our simulation studies showed that our approach performed better than Hida and Tango’s. Although both controlled the type I error rate, their test was more conservative and the statistical power of our test was higher.
