Beyond the traditional simulation design for evaluating type 1 error control: From the “theoretical” null to “empirical” null

2018 ◽  
Vol 43 (2) ◽  
pp. 166-179
Author(s):  
Ting Zhang ◽  
Lei Sun


2018 ◽  
Author(s):  
James Liley ◽  
Chris Wallace

Abstract
High-dimensional hypothesis testing is ubiquitous in the biomedical sciences, and informative covariates may be employed to improve power. The conditional false discovery rate (cFDR) is a widely used approach suited to the setting where the covariate is a set of p-values for the equivalent hypotheses for a second trait. Although related to the Benjamini-Hochberg procedure, it does not permit any easy control of the type-1 error rate, and existing methods are over-conservative. We propose a new method for type-1 error rate control based on identifying mappings from the unit square to the unit interval defined by the estimated cFDR, and splitting observations so that each map is independent of the observations it is used to test. We also propose an adjustment to the existing cFDR estimator which further improves power. We show by simulation that the new method more than doubles the potential improvement in power over unconditional analyses compared to existing methods. We demonstrate our method on transcriptome-wide association studies, and show that it can be used iteratively, enabling the use of multiple covariates in succession. Our methods substantially improve the power and applicability of cFDR analysis.
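As background on the quantity being estimated: the standard empirical cFDR estimator takes the form cFDR(p, q) ≈ p · #{j : q_j ≤ q} / #{j : p_j ≤ p, q_j ≤ q}. A minimal sketch of that estimator follows; the function name and toy data are hypothetical, and this is not the authors' implementation:

```python
import numpy as np

def cfdr_estimate(p, q, p_obs, q_obs):
    """Empirical estimate of cFDR(p, q) = Pr(H0 | P <= p, Q <= q),
    using the standard form  p * #{q_j <= q} / #{p_j <= p, q_j <= q}."""
    denom = np.sum((p_obs <= p) & (q_obs <= q))
    if denom == 0:
        return 1.0  # no observations in the corner: no evidence either way
    return min(1.0, p * np.sum(q_obs <= q) / denom)

# Toy illustration with four (p, q) pairs
p_obs = np.array([0.01, 0.2, 0.5, 0.9])
q_obs = np.array([0.02, 0.1, 0.6, 0.8])
print(cfdr_estimate(0.2, 0.6, p_obs, q_obs))  # p * 3 / 2, i.e. approximately 0.3
```

The paper's contribution concerns how such estimated maps can be used for testing while keeping the map independent of the observations it is applied to.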


2020 ◽  
Author(s):  
Janet Aisbett ◽  
Daniel Lakens ◽  
Kristin Sainani

Magnitude-based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and formally describe MBI as a multiple decision procedure. MBI terminology operates as if tests are conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers which tests to use and how to design and report their statistical analyses in accordance with standard frequentist practice.
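The recommended replacement, a pair of one-sided tests against the smallest meaningful effect (equivalence testing, TOST), can be sketched as follows. This assumes SciPy 1.6 or later for the `alternative` argument of `ttest_1samp`; the function name is hypothetical and the sketch is an illustration, not a prescribed implementation:

```python
import numpy as np
from scipy import stats

def tost_one_sample(x, delta, alpha=0.05):
    """Two one-sided tests (TOST) against a smallest meaningful effect
    `delta`: test 'mean <= -delta' and 'mean >= delta' separately, each
    at level alpha; equivalence is declared only if both tests reject."""
    p_lower = stats.ttest_1samp(x, -delta, alternative='greater').pvalue
    p_upper = stats.ttest_1samp(x, delta, alternative='less').pvalue
    p_tost = max(p_lower, p_upper)
    return p_tost, p_tost < alpha
```

Because each one-sided hypothesis is tested at its own single alpha level, no multiplicity adjustment is needed, which is exactly the transparent Neyman-Pearson error control the authors advocate.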


2021 ◽  
Author(s):  
Essi Laajala ◽  
Viivi Halla-aho ◽  
Toni Grönroos ◽  
Ubaid Ullah ◽  
Mari Vähä-Mäkilä ◽  
...  

Background: The aim of this study was to detect differential methylation in umbilical cord blood that is associated with maternal and pregnancy-related variables, such as maternal age and gestational weight gain. These have been studied earlier with 450K microarrays but not with bisulfite sequencing. Methods: Reduced representation bisulfite sequencing (RRBS) analysis was performed on 200 umbilical cord blood samples. Altogether, 24 clinical and technical covariates were included in a binomial mixed effects model, which was fit separately for each high-coverage CpG site, followed by spatial and multiple-testing adjustment of P values. Inflation of the spatially adjusted P values was discovered in a permutation analysis, which was then applied for empirical type 1 error control. Results: Empirical type 1 error control decreased the number of findings associated with each covariate to zero or to a small fraction of the number that would have been discovered with standard cutoffs. In this collection of samples, some differential methylation was associated with sex, the usage of epidural anesthetic during delivery, 1 minute Apgar points, maternal age and height, gestational weight gain, maternal smoking, and maternal insulin-treated diabetes, but not with the birth weight of the newborn infant, maternal pre-pregnancy BMI, the number of earlier miscarriages, the mode of delivery, labor induction, or the cosine-transformed month of birth. Conclusions: The autocorrelation-adjusted Z-test is a convenient tool for detecting differentially methylated regions, but significance should be determined either empirically or before the spatial adjustment. With appropriate significance thresholds, the detected differentially methylated regions were reproducible across studies, technologies, and statistical models. Our RRBS data analysis workflow is available at https://github.com/EssiLaajala/RRBS_workflow.
Keywords: DNA methylation, bisulfite sequencing, RRBS, umbilical cord blood, pregnancy, sex, spatial correlation, type 1 error, differential methylation, analysis workflow
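The permutation idea behind empirical type 1 error control, shuffling the covariate to break any true association and calibrating significance against the resulting null distribution, can be sketched generically as follows. This is a simplified, hypothetical stand-in for the permutation step of the workflow above, not the study's actual code:

```python
import numpy as np

def empirical_pvalue(stat_fn, y, x, n_perm=1000, seed=0):
    """Permutation-based significance: shuffling x breaks any true
    association with y, so the permuted statistics form an empirical
    null distribution for the observed statistic."""
    rng = np.random.default_rng(seed)
    obs = stat_fn(y, x)
    null = np.array([stat_fn(y, rng.permutation(x)) for _ in range(n_perm)])
    # add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null >= obs)) / (1 + n_perm)
```

With `stat_fn` set to, say, an absolute correlation, a strong true association yields the smallest attainable p-value, 1/(n_perm + 1); applied genome-wide, the permutation distribution calibrates thresholds that the theoretical null would get wrong.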


2020 ◽  
Author(s):  
Tamar Sofer ◽  
Na Guo

Abstract
Whole-genome sequencing (WGS) and exome sequencing studies are used to test the association of rare genetic variants with health traits. Many existing WGS efforts now aggregate data from heterogeneous groups, e.g. combining sets of individuals of European and African ancestries. Here we investigate the statistical implications for rare variant association testing with a binary trait when combining heterogeneous studies, defined as studies with potentially different disease proportions and different frequencies of variant carriers. We study and compare in simulations the type 1 error control and power of the naïve Score test, the saddlepoint approximation to the Score test (SPA test), and the BinomiRare test in a range of settings, focusing on low numbers of variant carriers. We show that type 1 error control and power patterns depend both on the number of carriers of the rare allele and on the disease prevalence in each of the studies. We develop recommendations for association analysis of rare genetic variants. (1) The Score test is preferred when the case proportion in the sample is 50%. (2) Do not down-sample controls to balance the case-control ratio, because this reduces power. Rather, use a test that controls the type 1 error. (3) Conduct stratified analysis in parallel with combined analysis. Aggregated testing may have lower power when the variant effect size differs between strata.
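A Monte-Carlo check of empirical type 1 error for a rare binary variant with an unbalanced binary trait, in the spirit of the simulations described above, might look like the following. The chi-square test on the 2x2 carrier-by-case table is an illustrative simplification of a score test, not one of the exact tests compared in the paper, and the function name and parameter values are hypothetical:

```python
import numpy as np
from scipy import stats

def type1_error_score(n=5000, case_prop=0.01, carrier_freq=0.001,
                      n_sim=500, alpha=0.05, seed=1):
    """Monte-Carlo estimate of the empirical type 1 error of a naive
    association test under the null of no association, for a rare
    binary variant and a low-prevalence binary trait. The chi-square
    test on the 2x2 table (no continuity correction) stands in for
    the score test."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        y = rng.random(n) < case_prop        # binary trait, low prevalence
        g = rng.random(n) < carrier_freq     # rare variant carriers
        if g.sum() == 0 or y.sum() == 0:     # degenerate table: skip
            continue
        table = np.array([[np.sum(y & g), np.sum(y & ~g)],
                          [np.sum(~y & g), np.sum(~y & ~g)]])
        p = stats.chi2_contingency(table, correction=False)[1]
        rejections += p < alpha
    return rejections / n_sim
```

Varying `case_prop` and `carrier_freq` reproduces the qualitative phenomenon the paper studies: with very few carriers, the rejection rate of an asymptotic test can drift well away from the nominal alpha.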


2018 ◽  
Author(s):  
Ting Zhang ◽  
Lei Sun

Abstract
When evaluating a newly developed statistical test, the first step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called ‘theoretical’ null of no association. In practice, whole-genome association analyses scan through a large number of genetic markers (Gs) for the ones associated with an outcome of interest (Y), where Y comes from an unknown alternative while the majority of Gs are not associated with Y, that is, under the ‘empirical’ null. This reality can be better represented by two other simulation designs: design S1.1 simulates Y from an alternative model based on G, then evaluates its association with independently generated Gnew, while design S1.2 evaluates the association between permuted Yperm and G. More than a decade ago, Efron (2004) noted the important distinction between the ‘theoretical’ and ‘empirical’ null in false discovery rate control. Using scale tests for variance heterogeneity and location tests of interaction effects as two examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, simulation design S0 shows that the method has correct T1E control, but designs S1.1 and S1.2 suggest otherwise, with empirical T1E values of 0.07 at the 0.05 nominal level. The inflation becomes more severe at the tail and does not diminish as the sample size increases. This is an important observation that calls for new practices for methods evaluation and interpretation of T1E control.
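The contrast between designs S0 and S1.1 can be illustrated with a scale test. In this hedged sketch, Bartlett's test stands in for the paper's likelihood ratio test, and the function name, effect size, and seed are hypothetical. Note the drift under the 'empirical' null happens to be conservative here (the mixture distribution induced by the alternative is platykurtic, which deflates Bartlett's statistic), whereas the paper reports inflation for its test; the shared point is only that the two null designs need not give the same size:

```python
import numpy as np
from scipy import stats

def empirical_t1e(design, n=500, n_sim=1000, alpha=0.05, seed=2):
    """Empirical size of Bartlett's scale test under two null designs:
    'S0'  : Y drawn independently of G ('theoretical' null);
    'S1.1': Y drawn from an alternative model based on G, then tested
            against a freshly generated G_new ('empirical' null, since
            Y and G_new are truly unassociated)."""
    rng = np.random.default_rng(seed)
    rej = 0
    for _ in range(n_sim):
        g = rng.integers(0, 2, n)                  # binary group labels
        if design == 'S0':
            y = rng.normal(0.0, 1.0, n)            # Y unrelated to any G
        else:                                      # 'S1.1'
            y = 2.0 * g + rng.normal(0.0, 1.0, n)  # Y depends on the original G
            g = rng.integers(0, 2, n)              # ...but is tested against G_new
        p = stats.bartlett(y[g == 0], y[g == 1]).pvalue
        rej += p < alpha
    return rej / n_sim
```

Under S0 the empirical size sits near the nominal 0.05; under S1.1 the null of equal variances still holds, yet the non-normal marginal distribution of Y mis-calibrates the test, which is the essence of the theoretical-versus-empirical null distinction.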


2018 ◽  
Author(s):  
Michel-Pierre Coll

Abstract
Empathy has received considerable attention from the field of cognitive and social neuroscience. A significant portion of these studies used the event-related potential (ERP) technique to study the mechanisms of empathy for pain in others in different conditions and clinical populations. These studies show that specific ERP components measured during the observation of pain in others are modulated by several factors and altered in clinical populations. However, issues in this literature such as analytical flexibility and lack of type 1 error control raise doubts about the validity and reliability of these conclusions. The current study compiled the results and methodological characteristics of 40 studies using ERP to study empathy for pain in others. The results of the meta-analysis suggest that the centro-parietal P3 and late positive potential components are sensitive to the observation of pain in others, while the early N1 and N2 components are not reliably associated with vicarious pain observation. The review of the methodological characteristics shows that the presence of selective reporting, analytical flexibility, and lack of type 1 error control compromises the interpretation of these results. The implications of these results for the study of empathy, and potential solutions to improve future investigations, are discussed.


2016 ◽  
Author(s):  
CR Tench ◽  
Radu Tanasescu ◽  
WJ Cottam ◽  
CS Constantinescu ◽  
DP Auer

Abstract
Low power in neuroimaging studies can make them difficult to interpret, and coordinate-based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and the statistical significance of clusters is determined by coordinate density.
Here a method of performing coordinate-based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistics or Z scores, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of the censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family-wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters expected under the null. Such control is vital to avoid propagating, and even amplifying, the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated both on numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely.


2020 ◽  
pp. 096228022098078
Author(s):  
Marc Ditzhaus ◽  
Dennis Dobler ◽  
Markus Pauly

Factorial survival designs with right-censored observations are commonly inferred by Cox regression and explained by means of hazard ratios. However, in the case of non-proportional hazards, their interpretation can become cumbersome, especially for clinicians. We therefore offer an alternative: median survival times are used to estimate treatment and interaction effects, and null hypotheses are formulated in contrasts of their population versions. Permutation-based tests and confidence regions are proposed and shown to be asymptotically valid. Their type-1 error control and power behavior are investigated in extensive simulations, showing the new methods’ wide applicability. The simulations are complemented by an illustrative data analysis.
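A stripped-down version of a permutation test for a difference in medians, ignoring the right-censoring and factorial contrasts that the actual method handles, can be sketched as follows; the function name and data are hypothetical:

```python
import numpy as np

def perm_median_test(x, y, n_perm=2000, seed=3):
    """Permutation test for a difference in medians between two groups:
    group labels are exchangeable under the null, so reassigning them
    at random generates the reference distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    obs = abs(np.median(x) - np.median(y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(np.median(perm[:n]) - np.median(perm[n:])) >= obs
    return (1 + count) / (1 + n_perm)
```

The appeal of the permutation approach, here as in the paper, is that the reference distribution is generated from the data themselves rather than from proportional-hazards assumptions.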


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lior Rennert ◽  
Moonseong Heo ◽  
Alain H. Litwin ◽  
Victor De Gruttola

Abstract
Background: Beginning in 2019, stepped-wedge designs (SWDs) were being used in the investigation of interventions to reduce opioid-related deaths in communities across the United States. However, these interventions compete with external factors such as newly initiated public policies limiting opioid prescriptions, media awareness campaigns, and the COVID-19 pandemic. Furthermore, control communities may prematurely adopt components of the intervention as they become available. The presence of time-varying external factors that impact study outcomes is a well-known limitation of SWDs; common approaches to adjusting for them make use of a mixed effects modeling framework. However, these models have several shortcomings when external factors differentially impact intervention and control clusters.
Methods: We discuss limitations of commonly used mixed effects models in the context of proposed SWDs to investigate interventions intended to reduce opioid-related mortality, and propose extensions of these models to address these limitations. We conduct an extensive simulation study of anticipated data from SWD trials targeting the current opioid epidemic in order to examine the performance of these models in the presence of external factors. We consider confounding by time, premature adoption of intervention components, and time-varying effect modification, in which external factors differentially impact intervention and control clusters.
Results: In the presence of confounding by time, commonly used mixed effects models yield unbiased intervention effect estimates, but can have inflated Type 1 error and result in undercoverage of confidence intervals. These models yield biased intervention effect estimates when premature intervention adoption or effect modification is present. In such scenarios, models incorporating fixed intervention-by-time interactions with an unstructured covariance for intervention-by-cluster-by-time random effects yield unbiased intervention effect estimates, reach nominal confidence interval coverage, and preserve Type 1 error.
Conclusions: Mixed effects models can adjust for different combinations of external factors through correct specification of fixed and random time effects. Since model choice has considerable impact on the validity of results and study power, careful consideration must be given to how these external factors impact study endpoints and which estimands are most appropriate in the presence of such factors.

