An Efficient Approach to Screening Epigenome-Wide Data

2016 · Vol 2016 · pp. 1-16
Author(s): Meredith A. Ray, Xin Tong, Gabrielle A. Lockett, Hongmei Zhang, Wilfried J. J. Karmaus

Due to the high dimensionality of epigenome-wide data, it is desirable to screen cytosine-phosphate-guanine dinucleotide (CpG) DNA methylation sites for association with covariate(s) of interest. We incorporate surrogate variable analyses (SVAs) into (ordinary or robust) linear regressions and utilize training and testing samples for nested validation to screen CpG sites. SVA accounts for variation in methylation not explained by the specified covariate(s) and adjusts for confounding effects. To make the method easy to use, it is built into a user-friendly R package, ttScreening, with efficient algorithms implemented. Various simulations were conducted to examine the robustness and sensitivity of the method compared to the classical approaches controlling for multiple testing: the false discovery rate-based (FDR-based) and Bonferroni-based methods. The proposed approach generally performs better and has the potential to control both type I and type II errors. We applied ttScreening to 383,998 CpG sites in association with maternal smoking, one of the leading factors for cancer risk.
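
The nested training/testing idea can be sketched in a few lines of base R plus the sva package. This is an illustrative reading of the procedure, not ttScreening's actual interface: the function name, arguments, and the majority-of-splits retention rule below are all assumptions.

library(sva)

## Hypothetical sketch: meth is a sites-by-samples matrix, pheno a covariate vector.
screen_cpgs <- function(meth, pheno, n.splits = 50, train.frac = 2/3, alpha = 0.05) {
  mod  <- model.matrix(~ pheno)                        # model with the covariate of interest
  mod0 <- model.matrix(~ 1, data = data.frame(pheno))  # null model
  sv   <- sva(meth, mod, mod0)$sv                      # surrogate variables for unmodeled variation

  hits <- matrix(0L, nrow = nrow(meth), ncol = n.splits)
  for (s in seq_len(n.splits)) {
    train <- sample(ncol(meth), floor(train.frac * ncol(meth)))
    test  <- setdiff(seq_len(ncol(meth)), train)
    for (j in seq_len(nrow(meth))) {
      ## fit on the training samples, then confirm on the testing samples
      p.tr <- summary(lm(meth[j, train] ~ pheno[train] + sv[train, ]))$coefficients[2, 4]
      p.te <- summary(lm(meth[j, test] ~ pheno[test] + sv[test, ]))$coefficients[2, 4]
      hits[j, s] <- as.integer(p.tr < alpha && p.te < alpha)
    }
  }
  ## retain sites that survive both halves in a majority of random splits
  which(rowMeans(hits) > 0.5)
}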

2020 · Vol 4 (s1) · pp. 44-44
Author(s): Megan C. Hollister, Jeffrey D. Blume

OBJECTIVES/GOALS: To improve the implementation of FDRs in translational research. Current statistical packages are hard to use and fail to adequately convey strong assumptions. We developed a software package that allows the user to decide on assumptions and choose the method they desire. We encourage wider reporting of FDRs for observed findings. METHODS/STUDY POPULATION: We developed a user-friendly R function for computing FDRs from observed p-values. A variety of methods for FDR estimation and for FDR control are included so the user can select the approach most appropriate for their setting. Options include Efron's Empirical Bayes FDR, Benjamini-Hochberg FDR control for multiple testing, Lindsey's method for smoothing empirical distributions, estimation of the mixing proportion, and central matching. We illustrate the important difference between estimating the FDR for a particular finding and adjusting a hypothesis test to control the false discovery propensity. RESULTS/ANTICIPATED RESULTS: We compared the capabilities of our new p.fdr function with those of the popular p.adjust function from the base stats package. Specifically, we examined multiple examples of data coming from different unknown mixture distributions to highlight the null estimation methods p.fdr includes. The base package provides neither optimal FDR usage nor sufficient estimation options. We also compared the step-up/step-down procedures used in adjusted p-value hypothesis testing and discuss when they are inappropriate. The p.adjust function is not able to report raw adjusted values, as shown in the graphical results. DISCUSSION/SIGNIFICANCE OF IMPACT: FDRs reveal the propensity for an observed result to be incorrect. FDRs should accompany observed results to help contextualize the relevance and potential impact of research findings. Our results show that previous methods are not sufficiently rich or precise in their calculations. Our new package allows the user to control null estimation and the step-up implementation when reporting FDRs.
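
For a concrete sense of the distinction drawn above, the sketch below contrasts FDR control via base R's p.adjust with FDR estimation via p.fdr. The p.fdr call is commented out because its exact signature is assumed from the paper's description rather than verified; only the base-R lines are guaranteed to run as written.

set.seed(1)
p <- c(runif(900), rbeta(100, 1, 50))  # mostly null p-values plus some true signal

## Adjusted p-values for FDR *control* (Benjamini-Hochberg step-up, base R)
p.bh <- p.adjust(p, method = "BH")
sum(p.bh < 0.05)  # number of findings retained when controlling FDR at 5%

## Estimated FDRs for the observed findings (interface assumed, not verified)
# library(FDRestimation)
# fdrs <- p.fdr(pvalues = p)$fdrs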


2015
Author(s): Xiaobei Zhou, Charity W. Law, Mark D. Robinson

benchmarkR is an R package designed to assess and visualize the performance of statistical methods for datasets that have an independent truth (e.g., simulations or datasets with large-scale validation), in particular for methods that claim to control false discovery rates (FDR). We augment some of the standard performance plots (e.g., receiver operating characteristic, or ROC, curves) with information about how well the methods are calibrated (i.e., whether they achieve their expected FDR control). For example, performance plots are extended with a point to highlight the power or FDR at a user-set threshold (e.g., at a method's estimated 5% FDR). The package contains general containers to store simulation results (SimResults) and methods to create graphical summaries, such as receiver operating characteristic curves (rocX), false discovery plots (fdX) and power-to-achieved FDR plots (powerFDR); each plot is augmented with some form of calibration information. We find these plots to be an improved way to interpret relative performance of statistical methods for genomic datasets where many hypothesis tests are performed. The strategies, however, are general and will find applications in other domains.
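
A hypothetical usage sketch follows. Only the container and plotting names (SimResults, rocX, fdX, powerFDR) come from the abstract; the argument names are guesses, so the whole block is left as comments rather than presented as the package's documented API.

# library(benchmarkR)
# res <- SimResults(pval = pval.matrix,   # per-method p-values
#                   padj = padj.matrix,   # per-method adjusted p-values
#                   labels = truth)       # 0/1 vector of true differential status
# rocX(res)       # ROC curves, marked at each method's estimated 5% FDR call
# fdX(res)        # false discovery plots with calibration information
# powerFDR(res)   # power against achieved FDR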


Resonance · 2013 · Vol 18 (12) · pp. 1095-1109
Author(s): Soumen Dey, Mohan Delampady

F1000Research · 2021 · Vol 10 · pp. 441
Author(s): Megan H. Murray, Jeffrey D. Blume

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice, and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings, an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.
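
To illustrate why the null proportion matters, here is a minimal base-R sketch of one standard estimator (Storey's, at a fixed lambda) plugged into a BH-style FDR estimate. The package's own augmented estimators are more refined than this; the function names here are illustrative only.

## Fraction of p-values above lambda, rescaled to estimate the null share pi0
pi0_storey <- function(p, lambda = 0.5) {
  min(1, mean(p > lambda) / (1 - lambda))
}

## Estimated FDR for each finding, plugging pi0 into the BH-style formula
fdr_estimate <- function(p) {
  m   <- length(p)
  pi0 <- pi0_storey(p)
  o   <- order(p)
  fdr <- pi0 * m * p[o] / seq_len(m)
  fdr <- rev(cummin(rev(fdr)))   # enforce monotonicity in the sorted p-values
  pmin(1, fdr)[order(o)]         # return in the original order
}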


2016
Author(s): Matthew Stephens

We introduce a new Empirical Bayes approach for large-scale hypothesis testing, including estimation of False Discovery Rates (FDRs) and effect sizes. This approach has two key differences from existing approaches to FDR analysis. First, it assumes that the distribution of the actual (unobserved) effects is unimodal, with a mode at 0. This "unimodal assumption" (UA), although natural in many contexts, is not usually incorporated into standard FDR analysis, and we demonstrate how incorporating it brings many benefits. Specifically, the UA facilitates efficient and robust computation (estimating the unimodal distribution involves solving a simple convex optimization problem) and enables more accurate inferences provided that it holds. Second, the method takes as its input two numbers for each test (an effect size estimate and corresponding standard error), rather than the one number usually used (p value or z score). When available, using two numbers instead of one helps account for variation in measurement precision across tests. It also facilitates estimation of effects, and unlike standard FDR methods, our approach provides interval estimates (credible regions) for each effect in addition to measures of significance. To provide a bridge between interval estimates and significance measures we introduce the term "local false sign rate" to refer to the probability of getting the sign of an effect wrong, and argue that it is a superior measure of significance to the local FDR because it is both more generally applicable and more robustly estimated. Our methods are implemented in an R package, ashr, available from http://github.com/stephens999/ashr.
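
A minimal usage sketch of the two-number interface described above, using the package's main entry point ash() and its accessor functions; the simulated data are illustrative, constructed so that the true effects are unimodal at zero.

library(ashr)  # http://github.com/stephens999/ashr
set.seed(1)
beta    <- c(rnorm(200, 0, 2), rep(0, 800))  # true effects: unimodal with mode at 0
se      <- sqrt(rchisq(1000, 5) / 5)         # varying measurement precision across tests
betahat <- beta + rnorm(1000, 0, se)         # observed effect size estimates

fit <- ash(betahat, se)   # empirical Bayes fit under the unimodal assumption
head(get_lfsr(fit))       # local false sign rate for each effect
head(get_pm(fit))         # posterior mean effect estimates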


2004 · Vol 3 (1) · pp. 1-20
Author(s): David R. Bickel

Given a multiple testing situation, the null hypotheses that appear to have sufficiently low probabilities of truth may be rejected using a simple, nonparametric method based on decision theory. This applies not only to posterior levels of belief, but also to conditional probabilities in the sense of relative frequencies, as seen from their equality to decisive false discovery rates (dFDRs). This approach requires the estimation neither of probability densities nor of their ratios. Decision theory can also inform the selection of false discovery rate weights. An application to gene expression microarrays is presented with a discussion of the applicability of the assumption of "clumpy dependence."
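
The decision rule itself is simple enough to state in a few lines. The sketch below is a generic illustration of the expected-loss comparison, not Bickel's code; in practice the probabilities of truth would be the dFDR-type quantities described above.

## Reject a null when its estimated probability of being true falls below
## the threshold implied by the two misclassification costs.
decide <- function(prob.null, cost.fd = 1, cost.md = 1) {
  # Expected loss of rejecting:  cost.fd * prob.null
  # Expected loss of retaining:  cost.md * (1 - prob.null)
  # Reject whenever rejecting has the smaller expected loss:
  prob.null < cost.md / (cost.fd + cost.md)
}

## Example: false discoveries four times as costly as missed discoveries,
## so the rejection threshold is 1/5.
decide(c(0.02, 0.30, 0.70), cost.fd = 4, cost.md = 1)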


F1000Research · 2018 · Vol 7 · pp. 1424
Author(s): Nima S. Hejazi, Rachael V. Phillips, Alan E. Hubbard, Mark J. van der Laan

We present methyvim, an R package implementing an algorithm for the nonparametric estimation of the effects of exposures on DNA methylation at CpG sites throughout the genome, complete with straightforward statistical inference for such estimates. The approach leverages variable importance measures derived from statistical parameters arising in causal inference, defined in such a manner that they may be used to obtain targeted estimates of the relative importance of individual CpG sites with respect to a binary treatment assigned at the phenotype level, thereby providing a new approach to identifying differentially methylated positions. The procedure implemented is computationally efficient, incorporating a preliminary screening step to isolate a subset of sites for which there is cursory evidence of differential methylation as well as a unique multiple testing correction to control the False Discovery Rate with the same rigor as would be available if all sites were subjected to testing. This novel technique for analysis of differentially methylated positions provides an avenue for incorporating flexible state-of-the-art data-adaptive regression procedures (i.e., machine learning) into the estimation of differential methylation effects without the loss of interpretable statistical inference for the estimated quantity.
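
One plausible reading of the screening-plus-correction step (an assumption on our part, not methyvim's documented API) is a BH step-up applied to the screened subset that keeps the full number of sites in the denominator, thereby preserving the nominal FDR guarantee of testing every site. A base-R sketch:

## p.screened: p-values for the sites that passed screening
## m.total:    total number of sites genome-wide, screened or not
bh_after_screening <- function(p.screened, m.total, q = 0.05) {
  k <- length(p.screened)
  o <- order(p.screened)
  ## step-up comparison uses m.total, not k, preserving the original rigor
  ok    <- p.screened[o] <= seq_len(k) * q / m.total
  n.rej <- if (any(ok)) max(which(ok)) else 0
  sort(o[seq_len(n.rej)])  # indices (within the screened set) rejected
}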

