Improved consistency in estimates of conditional false discovery rates increases power relative to both existing methods and parametric estimators
Abstract
A common aim in high-dimensional association studies is to identify the subset of investigated variables that are associated with a trait of interest. Using association statistics for the same variables with a second, related trait can improve power. An important quantity in such analyses is the conditional false discovery rate (cFDR): the probability of non-association with the trait of interest given p-value thresholds for both traits. The cFDR can be used for hypothesis testing and can also be interpreted as a posterior probability in its own right. In this paper, we propose new estimators of the cFDR based on kernel density estimates and on mixture-Gaussian models of effect sizes. The mixture-Gaussian models also allow estimation of a ‘local’ form of the cFDR (cfdr). We also propose a general non-parametric improvement to existing estimators, obtained by estimating a posterior probability that was previously approximated as 1. The new estimators have the desirable property of smooth rejection regions but, unexpectedly, do not improve power, even when their distributional assumptions hold. Furthermore, although the local cfdr represents a theoretically optimal decision boundary, noise in its estimation makes it less powerful than the corresponding cFDR estimates. The non-parametric adjustment, however, increases power for every estimator. We demonstrate the best-performing method on transcriptome-wide association study (TWAS) datasets for breast and ovarian cancers. These findings are of both theoretical and practical interest, giving insight into the nature of the cFDR and the behaviour of false discovery rates in a two-dimensional setting. Our methods allow improved control over the behaviour of the cFDR estimator and improved power in high-dimensional hypothesis testing.
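To make the cFDR definition concrete, the following is a minimal sketch of a simple empirical cFDR estimator. It is not one of the new estimators proposed in this paper. The sketch assumes that, under the null hypothesis for the trait of interest, the principal p-value P is independent of the conditional p-value Q. Under that assumption, cFDR(p, q) = Pr(H0 | P ≤ p, Q ≤ q) ≈ p · Pr(H0 | Q ≤ q) / Pr(P ≤ p | Q ≤ q), and the denominator is estimated by counting observed pairs. The sketch also assumes that the posterior probability "previously approximated as 1" is Pr(H0 | Q ≤ q). The function name `empirical_cfdr` and the parameter `pr_h0_given_q` are hypothetical. The default `pr_h0_given_q=1.0` reproduces the conservative approximation, and a smaller value stands in for an estimate of that posterior.

```python
import numpy as np


def empirical_cfdr(p, q, pr_h0_given_q=1.0):
    """Hypothetical empirical cFDR at each observed threshold pair (p_i, q_i).

    Assumes P and Q are independent under H0 for the trait of interest, so
        cFDR(p_i, q_i) ~= pr_h0_given_q * p_i * #{j: q_j <= q_i}
                          / #{j: p_j <= p_i and q_j <= q_i}.
    pr_h0_given_q = 1.0 is the conservative approximation Pr(H0 | Q <= q) = 1;
    a scalar or per-variable array below 1 plays the role of an estimate of it.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Pairwise indicators: le_q[i, j] is True when q_j <= q_i (likewise le_p).
    le_q = q[None, :] <= q[:, None]
    le_p = p[None, :] <= p[:, None]
    # Counts include variable i itself, so n_pq >= 1 and no division by zero.
    n_q = le_q.sum(axis=1)            # estimates N * Pr(Q <= q_i)
    n_pq = (le_q & le_p).sum(axis=1)  # estimates N * Pr(P <= p_i, Q <= q_i)
    cfdr = pr_h0_given_q * p * n_q / n_pq
    # A probability cannot exceed 1, so cap the estimate.
    return np.minimum(cfdr, 1.0)


if __name__ == "__main__":
    p = [0.01, 0.02, 0.5, 0.8]  # p-values for the trait of interest
    q = [0.9, 0.01, 0.02, 0.03]  # p-values for the related trait
    print(empirical_cfdr(p, q))  # approximately [0.04, 0.02, 0.5, 0.8]
```

In this toy example, the first variable has the smallest principal p-value but a weak conditional p-value, so its cFDR (0.04) exceeds that of the second variable (0.02). This is the reordering that conditioning on a related trait is meant to produce. Passing `pr_h0_given_q=0.5` halves every estimate, which illustrates why replacing the approximation of 1 with an estimate can only make the estimator less conservative.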