Breaking the circularity in circular analyses: Simulations and formal treatment of the flattened average approach

2020 · Vol 16 (11) · pp. e1008286
Author(s):  
Howard Bowman ◽  
Joseph L. Brooks ◽  
Omid Hajilou ◽  
Alexia Zoumpoulaki ◽  
Vladimir Litvak

There has been considerable debate and concern as to whether there is a replication crisis in the scientific literature. A likely cause of poor replication is the multiple comparisons problem. An important way in which this problem can manifest in the M/EEG context is through post hoc tailoring of analysis windows (a.k.a. regions-of-interest, ROIs) to landmarks in the collected data. Post hoc tailoring of ROIs is used because it allows researchers to adapt to inter-experiment variability and discover novel differences that fall outside of windows defined by prior precedent, thereby reducing Type II errors. However, this approach can dramatically inflate Type I error rates. One way to avoid this problem is to tailor windows according to a contrast that is orthogonal (strictly, parametrically orthogonal) to the contrast being tested. A key approach of this kind is to identify windows on a fully flattened average. On the basis of simulations, this approach has been argued to be safe for post hoc tailoring of analysis windows under many conditions. Here, we present further simulations and mathematical proofs to show exactly why the fully flattened average approach is unbiased, providing a formal grounding to the approach, clarifying the limits of its applicability and resolving published misconceptions about the method. We also provide a statistical power analysis, which shows that, in specific contexts, the fully flattened average approach provides higher statistical power than FieldTrip cluster inference. This suggests that the fully flattened average approach will enable researchers to identify more effects from their data without incurring an inflation of the false positive rate.
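
The window-selection logic lends itself to a short sketch. Below is a minimal illustration of the orthogonal-window idea, assuming simulated event-related potentials in a NumPy array `erp` of shape (conditions, trials, timepoints); the variable names, window half-width and test are illustrative assumptions, not the authors' implementation. Because the flattened average pools across both levels of the tested contrast, the window location carries no information about the condition difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_cond, n_trials, n_time = 2, 30, 200
erp = rng.normal(size=(n_cond, n_trials, n_time))  # toy M/EEG data

# 1) Flatten: average across ALL conditions and trials, so the waveform
#    used for window selection is orthogonal to the tested contrast.
flattened = erp.mean(axis=(0, 1))

# 2) Tailor the analysis window (ROI) to the peak of the flattened average.
peak = int(np.argmax(np.abs(flattened)))
win = slice(max(peak - 10, 0), min(peak + 10, n_time))

# 3) Test the condition contrast only inside the tailored window.
cond_means = erp[:, :, win].mean(axis=2)  # per-trial window means
t, p = stats.ttest_ind(cond_means[0], cond_means[1])
print(f"window [{win.start}, {win.stop}), t = {t:.2f}, p = {p:.3f}")
```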

2019
Author(s):  
Varun Saravanan ◽  
Gordon J. Berman ◽  
Samuel J. Sober

A common feature in many neuroscience datasets is the presence of hierarchical data structures, most commonly recordings of the activity of multiple neurons in multiple animals across multiple trials. Accordingly, the measurements constituting the dataset are not independent, even though the traditional statistical analyses often applied in such cases (e.g. Student's t-test) treat them as such. The hierarchical bootstrap has been shown to be an effective tool to accurately analyze such data, and while it has been used extensively in the statistical literature, its use is not widespread in neuroscience, despite the ubiquity of hierarchical datasets. In this paper, we illustrate the intuitiveness and utility of this approach for analyzing hierarchically nested datasets. We use simulated neural data to show that traditional statistical tests can result in a false positive rate of over 45%, even when the Type I error rate is set at 5%. While summarizing data across non-independent points (or lower levels) can potentially fix this problem, this approach greatly reduces the statistical power of the analysis. The hierarchical bootstrap, when applied sequentially over the levels of the hierarchical structure, keeps the Type I error rate within the intended bound and retains more statistical power than summarizing methods. We conclude by demonstrating the effectiveness of the method in two real-world examples, first analyzing singing data in male Bengalese finches (Lonchura striata var. domestica) and second quantifying changes in behavior under optogenetic control in flies (Drosophila melanogaster).
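
As a concrete illustration, here is a minimal two-level hierarchical bootstrap (animals, then trials within each animal); the dictionary layout, sample sizes and function name are illustrative assumptions rather than the paper's code. Resampling at every level propagates both between-animal and within-animal variability into the bootstrap distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 8 animals, 50 trial measurements each.
data = {animal: rng.normal(loc=0.2, scale=1.0, size=50) for animal in range(8)}

def hierarchical_bootstrap_mean(data, n_boot=10_000):
    animals = list(data)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        # Level 1: resample animals with replacement.
        picked = rng.choice(animals, size=len(animals), replace=True)
        # Level 2: within each picked animal, resample its trials.
        per_animal = [rng.choice(data[a], size=len(data[a]), replace=True).mean()
                      for a in picked]
        boot_means[b] = np.mean(per_animal)
    return boot_means

boot = hierarchical_bootstrap_mean(data)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"bootstrap mean = {boot.mean():.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```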


2019 · Vol 21 (3) · pp. 753-761
Author(s):  
Regina Brinster ◽  
Dominique Scherer ◽  
Justo Lorenzo Bermejo

Population stratification is usually corrected for by relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. Genotyping only a small number of genetic variants that show large allele-frequency differences among subpopulations (so-called ancestry-informative markers, AIMs), instead of the whole genome, could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($I_n$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurements, and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify the type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increased type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although still inflated) type I error, followed at some distance by the first eight $I_n$-AIMs.
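
The shared core of the compared corrections can be sketched briefly: derive principal components from the standardized genotype matrix and enter them as covariates in the association model. The sketch below uses simulated, unstructured genotypes; the matrix names, the use of two PCs and the logistic model are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
G = rng.integers(0, 3, size=(500, 1000)).astype(float)  # genotypes coded 0/1/2
y = rng.integers(0, 2, size=500)                        # case-control status

# Principal components of the standardized genotype matrix.
Gs = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-8)
U, S, Vt = np.linalg.svd(Gs, full_matrices=False)
pcs = U[:, :2] * S[:2]  # first two classical PCs

# Test one candidate variant with the PCs as stratification covariates.
X = sm.add_constant(np.column_stack([G[:, 0], pcs]))
fit = sm.Logit(y, X).fit(disp=0)
print(f"variant p-value adjusted for PCs: {fit.pvalues[1]:.3f}")
```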


2018 · Vol 20 (6) · pp. 2055-2065
Author(s):  
Johannes Brägelmann ◽  
Justo Lorenzo Bermejo

Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed to adjust for cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate, at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS with up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed the simulation findings, with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best and mutually comparable statistical power, quality of estimated methylation differences and runtime.
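
For orientation, the following is a deliberately simplified sketch of the idea underlying SVA-type adjustment: estimate surrogate variables from the residual space of the phenotype model and enter them as covariates. It is not the code of any benchmarked package (the real methods involve, e.g., iterative weighting and significance testing of components), and the data, names and number of surrogate variables are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
M = rng.normal(size=(100, 2000))                    # methylation: samples x CpGs
pheno = rng.integers(0, 2, size=100).astype(float)  # phenotype of interest

# 1) Regress every CpG on the phenotype and keep the residuals.
X = sm.add_constant(pheno)
beta, *_ = np.linalg.lstsq(X, M, rcond=None)
resid = M - X @ beta

# 2) Surrogate variables: leading left singular vectors of the residuals,
#    intended to capture unmodelled structure such as cell-type mixture.
U, S, Vt = np.linalg.svd(resid, full_matrices=False)
sv = U[:, :5]

# 3) Re-test a CpG with the surrogate variables as covariates.
Xadj = sm.add_constant(np.column_stack([pheno, sv]))
fit = sm.OLS(M[:, 0], Xadj).fit()
print(f"CpG 0 p-value adjusted for surrogate variables: {fit.pvalues[1]:.3f}")
```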


2017 · Vol 284 (1851) · pp. 20161850
Author(s):  
Nick Colegrave ◽  
Graeme D. Ruxton

A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here, non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is carried out only if a statistical test of the term in a previous, more complicated model fitted to the same data motivates the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing the statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we identify) the hoped-for improvement in statistical power will be small or non-existent, and that the reliability of the statistical procedures is likely to be much reduced through deviation of Type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for the initial selection of statistical models in the light of this change in procedure.
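
The procedure under scrutiny is easy to make concrete: fit the full model, test the candidate term and, if it is non-significant, refit without it, so that its variance and degrees of freedom are pooled into the error term. The sketch below uses simulated data, a two-way design and the conventional 0.05 threshold; all names and values are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "A": np.repeat(["a1", "a2"], 20),
    "B": np.tile(np.repeat(["b1", "b2"], 10), 2),
})
df["y"] = rng.normal(size=len(df)) + (df["A"] == "a2") * 0.5

full = ols("y ~ A * B", data=df).fit()
anova_full = sm.stats.anova_lm(full, typ=2)

# Test-qualified pooling: drop the interaction only if its test is
# non-significant, pooling its df and variance into the error term.
if anova_full.loc["A:B", "PR(>F)"] > 0.05:
    reduced = ols("y ~ A + B", data=df).fit()
    print(sm.stats.anova_lm(reduced, typ=2))
else:
    print(anova_full)
```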


2017 · Vol 78 (3) · pp. 460-481
Author(s):  
Margarita Olivera-Aguilar ◽  
Samuel H. Rikoon ◽  
Oscar Gonzalez ◽  
Yasemin Kisbu-Sakarya ◽  
David P. MacKinnon

When testing a statistical mediation model, it is assumed that factorial measurement invariance holds for the mediating construct across levels of the independent variable X. The consequences of failing to address the violations of measurement invariance in mediation models are largely unknown. The purpose of the present study was to systematically examine the impact of mediator noninvariance on the Type I error rates, statistical power, and relative bias in parameter estimates of the mediated effect in the single mediator model. The results of a large simulation study indicated that, in general, the mediated effect was robust to violations of invariance in loadings. In contrast, most conditions with violations of intercept invariance exhibited severely positively biased mediated effects, Type I error rates above acceptable levels, and statistical power larger than in the invariant conditions. The implications of these results are discussed and recommendations are offered.
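
For readers unfamiliar with the model, the mediated effect in the single mediator model is the product a*b estimated from two regressions, M = i1 + aX + e1 and Y = i2 + c'X + bM + e2. The sketch below computes it on simulated data; the coefficients and names are illustrative, and the measurement (non)invariance manipulated in the study is not modeled here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
X = rng.integers(0, 2, size=n).astype(float)  # independent variable
M = 0.4 * X + rng.normal(size=n)              # mediator
Y = 0.3 * M + 0.1 * X + rng.normal(size=n)    # outcome

a = sm.OLS(M, sm.add_constant(X)).fit().params[1]                        # X -> M
b = sm.OLS(Y, sm.add_constant(np.column_stack([X, M]))).fit().params[2]  # M -> Y | X
print(f"mediated effect a*b = {a * b:.3f}")
```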


1984 · Vol 4 (1) · pp. 37-50
Author(s):  
Kenneth Ottenbacher

Research in the behavioral and social sciences, including occupational therapy, has been shown to be associated with low statistical power and a high rate of Type II experimental errors. Three methods of increasing power that are frequently suggested are increasing the sample size, increasing the effect size, and increasing the significance level. The first two alternatives are often not possible in applied fields such as occupational therapy, and the third is generally not considered desirable since it leads to increased Type I error rates. A fourth alternative is proposed, which involves partitioning the decision region into three sections. This procedure is based on the Neyman and Pearson (1933) decision-theory approach to significance testing and is particularly applicable to areas of applied and clinical investigation such as occupational therapy. A sample power table is presented along with formulas to compute the table values. The argument is made that using the procedures described will provide a method of unambiguously interpreting nonsignificant results and will increase the power and sensitivity of occupational therapy research.
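
The entries of such a power table can be computed from the noncentral t distribution. Below is a rough sketch for a two-sided, two-sample t-test with standardized effect size d and equal group sizes; the grid of values is illustrative, not the article's table. Under the three-region scheme, a non-significant result would be read as support for the null hypothesis only when this computed power is high; otherwise judgment is suspended.

```python
import numpy as np
from scipy import stats

def power_two_sample_t(d, n, alpha=0.05):
    """Power of a two-sided two-sample t-test with n subjects per group."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)                 # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # P(|T| > tcrit) when the true standardized effect is d.
    return (1 - stats.nct.cdf(tcrit, df, ncp)) + stats.nct.cdf(-tcrit, df, ncp)

for d in (0.2, 0.5, 0.8):
    for n in (10, 25, 50):
        print(f"d = {d}, n/group = {n}: power = {power_two_sample_t(d, n):.2f}")
```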


2017 · Vol 88 (4) · pp. 769-784
Author(s):  
Falynn C. Turley ◽  
David Redden ◽  
Janice L. Case ◽  
Charles Katholi ◽  
Jeff Szychowski ◽  
...  
