An Efficient Approach to Screening Epigenome-Wide Data

4497 Accessible False Discovery Rate Computation

Journal of Clinical and Translational Science ◽

10.1017/cts.2020.164 ◽

2020 ◽

Vol 4 (s1) ◽

pp. 44-44

Author(s):

Megan C Hollister ◽

Jeffrey D. Blume

Keyword(s):

Multiple Testing ◽

Empirical Bayes ◽

Hypothesis Test ◽

Estimation Methods ◽

P Value ◽

Empirical Distributions ◽

False Discovery ◽

Research Findings ◽

Unknown Mixture ◽

User Friendly

OBJECTIVES/GOALS: To improve the implementation of FDRs in translation research. Current statistical packages are hard to use and fail to adequately convey strong assumptions. We developed a software package that allows the user to decide on assumptions and choose the hey desire. We encourage wider reporting of FDRs for observed findings. METHODS/STUDY POPULATION: We developed a user-friendly R function for computing FDRs from observed p-values. A variety of methods for FDR estimation and for FDR control are included so the user can select the approach most appropriate for their setting. Options include Efron’s Empirical Bayes FDR, Benjamini-Hochberg FDR control for multiple testing, Lindsey’s method for smoothing empirical distributions, estimation of the mixing proportion, and central matching. We illustrate the important difference between estimating the FDR for a particular finding and adjusting a hypothesis test to control the false discovery propensity. RESULTS/ANTICIPATED RESULTS: We performed a comparison of the capabilities of our new p.fdr function to the popular p.adjust function from the base stats-package. Specifically, we examined multiple examples of data coming from different unknown mixture distributions to highlight the null estimation methods p.fdr includes. The base package does not provide the optimal FDR usage nor sufficient estimation options. We also compared the step-up/step-down procedure used in adjusted p-value hypothesis test and discuss when this is inappropriate. The p.adjust function is not able to report raw-adjusted values and this will be shown in the graphical results. DISCUSSION/SIGNIFICANCE OF IMPACT: FDRs reveal the propensity for an observed result to be incorrect. FDRs should accompany observed results to help contextualize the relevance and potential impact of research findings. Our results show that previous methods are not sufficient rich or precise in their calculations. Our new package allows the user to be in control of the null estimation and step-up implementation when reporting FDRs.

Download Full-text

Estimation of False Discovery Rates in Multiple Testing: Application to Gene Microarray Data

Biometrics ◽

10.1111/j.0006-341x.2003.00123.x ◽

2003 ◽

Vol 59 (4) ◽

pp. 1071-1081 ◽

Cited By ~ 102

Author(s):

Chen‐An Tsai ◽

Huey‐miin Hsueh ◽

James J. Chen

Keyword(s):

Microarray Data ◽

Multiple Testing ◽

Gene Microarray ◽

False Discovery Rates ◽

False Discovery ◽

Discovery Rates

Download Full-text

benchmarkR: an R package for benchmarking genome-scale methods

10.1101/018200 ◽

2015 ◽

Author(s):

Xiaobei Zhou ◽

Charity W Law ◽

Mark D Robinson

Keyword(s):

Receiver Operating Characteristic ◽

Statistical Methods ◽

Large Scale ◽

Operating Characteristic ◽

Scale Validation ◽

Roc Curves ◽

R Package ◽

False Discovery Rates ◽

False Discovery ◽

Receiver Operating

benchmarkR is an R package designed to assess and visualize the performance of statistical methods for datasets that have an independent truth (e.g., simulations or datasets with large-scale validation), in particular for methods that claim to control false discovery rates (FDR). We augment some of the standard performance plots (e.g., receiver operating characteristic, or ROC, curves) with information about how well the methods are calibrated (i.e., whether they achieve their expected FDR control). For example, performance plots are extended with a point to highlight the power or FDR at a user-set threshold (e.g., at a method's estimated 5% FDR). The package contains general containers to store simulation results (SimResults) and methods to create graphical summaries, such as receiver operating characteristic curves (rocX), false discovery plots (fdX) and power-to-achieved FDR plots (powerFDR); each plot is augmented with some form of calibration information. We find these plots to be an improved way to interpret relative performance of statistical methods for genomic datasets where many hypothesis tests are performed. The strategies, however, are general and will find applications in other domains.

Download Full-text

False discovery rates and multiple testing

Resonance ◽

10.1007/s12045-013-0137-9 ◽

2013 ◽

Vol 18 (12) ◽

pp. 1095-1109

Author(s):

Soumen Dey ◽

Mohan Delampady

Keyword(s):

Multiple Testing ◽

False Discovery Rates ◽

False Discovery ◽

Discovery Rates

Download Full-text

FDRestimation: Flexible False Discovery Rate Computation in R

F1000Research ◽

10.12688/f1000research.52999.2 ◽

2021 ◽

Vol 10 ◽

pp. 441

Author(s):

Megan H. Murray ◽

Jeffrey D. Blume

Keyword(s):

False Discovery Rate ◽

Estimation Procedure ◽

False Discovery Rates ◽

P Values ◽

False Discovery ◽

Software Packages ◽

Broad Array ◽

Potential Impact ◽

User Friendly ◽

Discovery Rates

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R pack-age for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings - an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.

Download Full-text

FDRestimation: Flexible False Discovery Rate Computation in R

F1000Research ◽

10.12688/f1000research.52999.1 ◽

2021 ◽

Vol 10 ◽

pp. 441

Author(s):

Megan H. Murray ◽

Jeffrey D. Blume

Keyword(s):

False Discovery Rate ◽

Estimation Procedure ◽

False Discovery Rates ◽

P Values ◽

False Discovery ◽

Software Packages ◽

Broad Array ◽

Potential Impact ◽

User Friendly ◽

Discovery Rates

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R pack-age for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings - an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.

Download Full-text

Controlling two-dimensional false discovery rates by combining two univariate multiple testing results with an application to mass spectral data

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2018.09.006 ◽

2018 ◽

Vol 182 ◽

pp. 149-157 ◽

Cited By ~ 1

Author(s):

Youngrae Kim ◽

Johan Lim ◽

Jong Soo Lee ◽

Jaesik Jeong

Keyword(s):

Spectral Data ◽

Multiple Testing ◽

Two Dimensional ◽

Mass Spectral Data ◽

False Discovery Rates ◽

Mass Spectral ◽

False Discovery ◽

Discovery Rates

Download Full-text

False Discovery Rates: A New Deal

10.1101/038216 ◽

2016 ◽

Cited By ~ 4

Author(s):

Matthew Stephens

Keyword(s):

Empirical Bayes ◽

Large Scale ◽

R Package ◽

False Discovery Rates ◽

Interval Estimates ◽

False Discovery ◽

Unobserved Effects ◽

The One ◽

Significance Measures ◽

Discovery Rates

AbstractWe introduce a new Empirical Bayes approach for large-scale hypothesis testing, including estimating False Discovery Rates (FDRs), and effect sizes. This approach has two key differences from existing approaches to FDR analysis. First, it assumes that the distribution of the actual (unobserved) effects is unimodal, with a mode at 0. This “unimodal assumption” (UA), although natural in many contexts, is not usually incorporated into standard FDR analysis, and we demonstrate how incorporating it brings many benefits. Specifically, the UA facilitates efficient and robust computation – estimating the unimodal distribution involves solving a simple convex optimization problem – and enables more accurate inferences provided that it holds. Second, the method takes as its input two numbers for each test (an effect size estimate, and corresponding standard error), rather than the one number usually used (p value, or z score). When available, using two numbers instead of one helps account for variation in measurement precision across tests. It also facilitates estimation of effects, and unlike standard FDR methods our approach provides interval estimates (credible regions) for each effect in addition to measures of significance. To provide a bridge between interval estimates and significance measures we introduce the term “local false sign rate” to refer to the probability of getting the sign of an effect wrong, and argue that it is a superior measure of significance than the local FDR because it is both more generally applicable, and can be more robustly estimated. Our methods are implemented in an R package ashr available from http://github.com/stephens999/ashr.

Download Full-text

Error-Rate and Decision-Theoretic Methods of Multiple Testing: Which Genes Have High Objective Probabilities of Differential Expression?

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1043 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-20 ◽

Cited By ~ 8

Author(s):

David R. Bickel

Keyword(s):

Decision Theory ◽

Multiple Testing ◽

Conditional Probabilities ◽

False Discovery Rates ◽

Gene Expression Microarrays ◽

False Discovery ◽

Probability Densities ◽

Expression Microarrays ◽

Discovery Rates ◽

Selection Of

Given a multiple testing situation, the null hypotheses that appear to have sufficiently low probabilities of truth may be rejected using a simple, nonparametric method based on decision theory. This applies not only to posterior levels of belief, but also to conditional probabilities in the sense of relative frequencies, as seen from their equality to local false discovery rates (dFDRs). This approach neither requires the estimation of probability densities, nor of their ratios. Decision theory can also inform the selection of false discovery rate weights. An application to gene expression microarrays is presented with a discussion of the applicability of the assumption of "clumpy dependence."

Download Full-text

methyvim: Targeted, robust, and model-free differential methylation analysis in R

F1000Research ◽

10.12688/f1000research.16047.1 ◽

2018 ◽

Vol 7 ◽

pp. 1424

Author(s):

Nima S. Hejazi ◽

Rachael V. Phillips ◽

Alan E. Hubbard ◽

Mark J. van der Laan

Keyword(s):

Statistical Inference ◽

Multiple Testing ◽

R Package ◽

Differential Methylation ◽

Computationally Efficient ◽

Multiple Testing Correction ◽

Statistical Parameters ◽

Cpg Sites ◽

Model Free ◽

Data Adaptive

We present methyvim, an R package implementing an algorithm for the nonparametric estimation of the effects of exposures on DNA methylation at CpG sites throughout the genome, complete with straightforward statistical inference for such estimates. The approach leverages variable importance measures derived from statistical parameters arising in causal inference, defined in such a manner that they may be used to obtain targeted estimates of the relative importance of individual CpG sites with respect to a binary treatment assigned at the phenotype level, thereby providing a new approach to identifying differentially methylated positions. The procedure implemented is computationally efficient, incorporating a preliminary screening step to isolate a subset of sites for which there is cursory evidence of differential methylation as well as a unique multiple testing correction to control the False Discovery Rate with the same rigor as would be available if all sites were subjected to testing. This novel technique for analysis of differentially methylated positions provides an avenue for incorporating flexible state-of-the-art data-adaptive regression procedures (i.e., machine learning) into the estimation of differential methylation effects without the loss of interpretable statistical inference for the estimated quantity.

Download Full-text