Improved consistency in estimates of conditional false discovery rates increases power relative to both existing methods and parametric estimators

2018 ◽  
Author(s):  
James Liley ◽  
Chris Wallace

A common aim in high-dimensional association studies is the identification of the subset of investigated variables associated with a trait of interest. Using association statistics on the same variables for a second related trait can improve power. An important quantity in such analyses is the conditional false-discovery rate (cFDR), the probability of non-association with the trait of interest given p-value thresholds for both traits. The cFDR can be used for hypothesis testing and as a posterior probability in its own right. In this paper, we propose new estimators for the cFDR based on kernel density estimates and mixture-Gaussian models of effect sizes, the latter also allowing estimation of a ‘local’ form of cFDR (cfdr). We also propose a general non-parametric improvement to existing estimators based on estimating a posterior probability previously estimated at 1. We find that new estimators have the desirable property of smooth rejection regions, but, unexpectedly, do not improve the power of the method, even when distributional assumptions are true. Furthermore, we find that although the local cfdr represents a theoretically optimal decision boundary, noisiness in its estimation means it is less powerful than corresponding cFDR estimates. We find, however, that the non-parametric adjustment increases power for every estimator. We demonstrate the best method on transcriptome-wide association study datasets for breast and ovarian cancers. The findings from this analysis are of both theoretical and pragmatic interest, giving insight into the nature of cFDR and the behaviour of false-discovery rates in a two-dimensional setting. Our methods allow improved control over the behaviour of the cFDR estimator and improved power in high-dimensional hypothesis testing.
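
To make the definition of the cFDR concrete, the following is a minimal sketch of a simple empirical estimator, not the kernel-density or mixture-Gaussian estimators the abstract proposes. It assumes that p-values for the trait of interest are uniform under the null and independent of the second trait's p-values under the null, and it takes the posterior probability Pr(H0 | Q <= q) to be 1, which appears to be the conservative choice the abstract's non-parametric adjustment sets out to improve. The function name `empirical_cfdr` and the toy data are illustrative assumptions.

```python
from typing import Sequence


def empirical_cfdr(p: float, q: float,
                   p_values: Sequence[float],
                   q_values: Sequence[float]) -> float:
    """Rough empirical estimate of cFDR(p, q) = Pr(H0 | P <= p, Q <= q).

    Assumption (not taken from the abstract): under H0 the principal-trait
    p-value is uniform and independent of Q, and Pr(H0 | Q <= q) is bounded
    by 1, giving
        cFDR(p, q) <= p * #{i: q_i <= q} / #{i: p_i <= p and q_i <= q}.
    """
    if len(p_values) != len(q_values):
        raise ValueError("p_values and q_values must have the same length")

    # Number of variables passing the threshold on the conditional trait.
    n_q = sum(1 for qi in q_values if qi <= q)
    # Number of variables passing both thresholds.
    n_pq = sum(1 for pi, qi in zip(p_values, q_values)
               if pi <= p and qi <= q)

    if n_pq == 0:
        # No observations in the rejection region: return the trivial bound.
        return 1.0
    return min(1.0, p * n_q / n_pq)


# Toy data: four variables with p-values for two traits (hypothetical values).
toy_p = [0.001, 0.2, 0.5, 0.8]
toy_q = [0.01, 0.02, 0.6, 0.9]
estimate = empirical_cfdr(0.01, 0.05, toy_p, toy_q)
```

Because the estimate is a ratio of counts, it changes in steps as the thresholds move, which is one reason smoother estimators might be of interest.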

Biometrika ◽  
2011 ◽  
Vol 98 (2) ◽  
pp. 251-271 ◽  
Author(s):  
Bradley Efron ◽  
Nancy R. Zhang

Scientifica ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Emily Hansen ◽  
Kathleen F. Kerr

The goal of many microarray studies is to identify genes that are differentially expressed between two classes or populations. Many data analysts choose to estimate the false discovery rate (FDR) associated with the list of genes declared differentially expressed. Estimating an FDR largely reduces to estimating π1, the proportion of differentially expressed genes among all analyzed genes. Estimating π1 is usually done through P-values, but computing P-values can be viewed as a nuisance and potentially problematic step. We evaluated methods for estimating π1 directly from test statistics, circumventing the need to compute P-values. We adapted existing methodology for estimating π1 from t- and z-statistics so that π1 could be estimated from other statistics. We compared the quality of these estimates to estimates generated by two established methods for estimating π1 from P-values. Overall, methods varied widely in bias and variability. The least biased and least variable estimates of π1, the proportion of differentially expressed genes, were produced by applying the “convest” mixture model method to P-values computed from a pooled permutation null distribution. Estimates computed directly from test statistics rather than P-values did not reliably perform well.
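
As an illustration of why an FDR estimate hinges on π1, here is a minimal sketch using a Storey-type estimator of the null proportion π0 = 1 - π1 from P-values. This is a swapped-in, widely used technique chosen for brevity; it is not the "convest" method, and the abstract does not say whether it is one of the methods compared. The function names, the tuning value `lam = 0.5`, and the toy P-values are all assumptions.

```python
from typing import Sequence


def estimate_pi1(p_values: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of pi1, the proportion of non-null tests.

    Assumption: null P-values are uniform on [0, 1], so a fraction
    (1 - lam) of them should exceed lam, while non-null P-values
    rarely do. Hence pi0 ~= #{p_i > lam} / (m * (1 - lam)).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


def estimate_fdr(p_values: Sequence[float], threshold: float,
                 pi1: float) -> float:
    """Estimated FDR when every P-value <= threshold is declared significant.

    The expected number of false discoveries is (1 - pi1) * m * threshold;
    dividing by the number of discoveries gives the FDR estimate.
    """
    m = len(p_values)
    n_rejected = sum(1 for p in p_values if p <= threshold)
    if n_rejected == 0:
        return 0.0
    return min(1.0, (1.0 - pi1) * m * threshold / n_rejected)


# Toy P-values (hypothetical): six small values and four large ones.
toy = [0.001, 0.002, 0.003, 0.004, 0.01, 0.02, 0.6, 0.7, 0.8, 0.9]
pi1_hat = estimate_pi1(toy)               # four of ten exceed 0.5
fdr_hat = estimate_fdr(toy, 0.05, pi1_hat)
```

A larger estimate of π1 lowers the estimated FDR for the same gene list, so bias in the π1 estimate carries straight through to the reported FDR.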


Author(s):  
Balthasar Bickel

Large-scale areal patterns point to ancient population history and form a well-known confound for language universals. Despite their importance, demonstrating such patterns remains a challenge. This chapter argues that large-scale area hypotheses are better tested by modeling diachronic family biases than by controlling for genealogical relations in regression models. A case study of the Trans-Pacific area reveals that diachronic bias estimates do not depend much on the amount of phylogenetic information that is used when inferring them. After controlling for false discovery rates, about 39 variables in WALS and AUTOTYP show diachronic biases that differ significantly inside vs. outside the Trans-Pacific area. Nearly three times as many biases hold outside the Trans-Pacific area as inside it, indicating that the Trans-Pacific area is characterized less by the spread of biases than by the retention of earlier diversity, in line with earlier suggestions in the literature.


PROTEOMICS ◽  
2009 ◽  
Vol 9 (5) ◽  
pp. 1220-1229 ◽  
Author(s):  
Andrew R. Jones ◽  
Jennifer A. Siepen ◽  
Simon J. Hubbard ◽  
Norman W. Paton
