Accurate error control in high dimensional association testing using conditional false discovery rates

2018 ◽  
Author(s):  
James Liley ◽  
Chris Wallace

Abstract: High-dimensional hypothesis testing is ubiquitous in the biomedical sciences, and informative covariates may be employed to improve power. The conditional false discovery rate (cFDR) is a widely used approach suited to the setting where the covariate is a set of p-values for the equivalent hypotheses for a second trait. Although related to the Benjamini-Hochberg procedure, it does not permit any easy control of type-1 error rate, and existing methods are over-conservative. We propose a new method for type-1 error rate control based on identifying mappings from the unit square to the unit interval defined by the estimated cFDR, and splitting observations so that each map is independent of the observations it is used to test. We also propose an adjustment to the existing cFDR estimator which further improves power. We show by simulation that the new method more than doubles potential improvement in power over unconditional analyses compared to existing methods. We demonstrate our method on transcriptome-wide association studies, and show that the method can be used in an iterative way, enabling the use of multiple covariates successively. Our methods substantially improve the power and applicability of cFDR analysis.
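For orientation, the Benjamini-Hochberg procedure that the abstract relates cFDR to can be sketched in a few lines. This is the standard unconditional step-up rule only, not the authors' cFDR method or their error-control scheme; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule (unconditional FDR control).

    Returns a list of booleans, True where the hypothesis is rejected.
    Illustrative sketch only; it does not use any covariate.
    """
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen to show the step-up behaviour:
# 0.03 alone exceeds its threshold, but is still rejected because
# a larger p-value further down the ranking meets its own threshold.
example = [0.01, 0.04, 0.03, 0.035]
print(benjamini_hochberg(example, alpha=0.05))
```

The cFDR approach differs in that the rejection region depends on a second set of p-values, which is why the simple threshold argument above no longer yields type-1 error control directly.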

2014 ◽  
Vol 56 (4) ◽  
pp. 614-630 ◽  
Author(s):  
Alexandra C. Graf ◽  
Peter Bauer ◽  
Ekkehard Glimm ◽  
Franz Koenig

2016 ◽  
Vol 148 (8) ◽  
pp. 24-31
Author(s):  
Kayode Ayinde ◽  
John Olatunde ◽  
Gbenga Sunday

2008 ◽  
Vol 27 (3) ◽  
pp. 371-381 ◽  
Author(s):  
Steven Snapinn ◽  
Qi Jiang

1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
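The relationship between the four quantities named in this abstract can be illustrated with the usual normal approximation for comparing two group means. This is a sketch under stated assumptions (two equal-sized groups, a two-sided test, a standardized effect size, large-sample normal approximation) and is not a reproduction of the paper's tables; the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2, where d is the
    standardized effect size. Normal approximation; assumed, not from the paper.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def power_for_n(n, effect_size, alpha=0.05):
    """Approximate power (1 - beta) given n per group, same assumptions as above."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * (n / 2) ** 0.5 - z_alpha)


print(n_per_group(0.5))        # sample size per group for a medium effect
print(power_for_n(63, 0.5))    # power achieved with 63 per group
```

With these defaults, a standardized effect of 0.5 requires 63 participants per group for a type 2 error rate of 0.20; halving the effect size roughly quadruples the required sample size.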


2018 ◽  
Vol 28 (6) ◽  
pp. 1879-1892 ◽  
Author(s):  
Alexandra Christine Graf ◽  
Gernot Wassmer ◽  
Tim Friede ◽  
Roland Gerard Gera ◽  
Martin Posch

With the advent of personalized medicine, clinical trials studying treatment effects in subpopulations are receiving increasing attention. The objectives of such studies are, besides demonstrating a treatment effect in the overall population, to identify subpopulations, based on biomarkers, where the treatment has a beneficial effect. Continuous biomarkers are often dichotomized using a threshold to define two subpopulations with low and high biomarker levels. If there is insufficient information on the dependence structure of the outcome on the biomarker, several thresholds may be investigated. The nested structure of such subpopulations is similar to the structure in group sequential trials. Therefore, it has been proposed to use the corresponding critical boundaries to test such nested subpopulations. We show that for biomarkers with a prognostic effect that is not adjusted for in the statistical model, the variability of the outcome may vary across subpopulations, which may lead to an inflation of the family-wise type 1 error rate. Using simulations, we quantify the potential inflation of testing procedures based on group sequential designs. Furthermore, alternative hypothesis tests that control the family-wise type 1 error rate under minimal assumptions are proposed. The methodological approaches are illustrated by a trial in depression.


2020 ◽  
Author(s):  
Janet Aisbett ◽  
Daniel Lakens ◽  
Kristin Sainani

Magnitude based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and formally describe MBI as a multiple decision procedure. MBI terminology operates as if tests are conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers what tests to use and how to design and report their statistical analyses to accord with standard frequentist practice.
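The recommended replacement, a pair of one-sided tests against smallest meaningful effects as in equivalence testing, can be sketched as follows. This is a minimal illustration assuming a normally distributed effect estimate with known standard error (a z-test version of the two one-sided tests procedure); the function name `tost_z` and the numbers in the usage lines are hypothetical and do not come from the paper.

```python
from statistics import NormalDist


def tost_z(estimate, std_error, lower, upper, alpha=0.05):
    """Two one-sided z-tests against equivalence bounds (lower, upper).

    Each test is run at a single pre-specified alpha level.
    Returns (p_lower, p_upper, equivalent).
    Normal approximation; illustrative sketch only.
    """
    z = NormalDist()
    # H0: true effect <= lower bound; small p supports effect > lower.
    p_lower = 1 - z.cdf((estimate - lower) / std_error)
    # H0: true effect >= upper bound; small p supports effect < upper.
    p_upper = z.cdf((estimate - upper) / std_error)
    # Equivalence is declared only if both one-sided tests reject.
    equivalent = max(p_lower, p_upper) <= alpha
    return p_lower, p_upper, equivalent


# Hypothetical example: smallest meaningful effect of +/-0.3,
# observed effect 0.0 with standard error 0.1.
print(tost_z(0.0, 0.1, lower=-0.3, upper=0.3))
# Same bounds, but an observed effect close to the upper bound.
print(tost_z(0.25, 0.1, lower=-0.3, upper=0.3))
```

Because both one-sided tests use the same pre-specified alpha, the error rate of the overall decision is transparent, which is the property the abstract contrasts with MBI's implicit use of multiple alpha levels.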


2021 ◽  
Author(s):  
Shing Wan Choi ◽  
Timothy Shin Heng Mak ◽  
Clive J. Hoggart ◽  
Paul F. O'Reilly

Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid in our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, the risk of inter-cohort sample overlap and close relatedness increases. Ideally sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can lead to inflation of type 1 error and, thus, erroneous conclusions in published work. Results: Here, for the first time, we report the scale of the sample overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data to mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation in the observed PRS-trait association, coefficient of determination (R²) and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small if this makes up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), a software tool for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples.
A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, making it powered in settings where both have sample sizes over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated using EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios and are even robust to high levels of residual genetic and environmental stratification. Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be considered with caution given high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all future such PRS studies to mitigate the potential effects of inter-cohort overlap and close relatedness.

