4497 Accessible False Discovery Rate Computation

Megan C Hollister; Jeffrey D. Blume

doi:10.1017/cts.2020.164


Journal of Clinical and Translational Science ◽

10.1017/cts.2020.164 ◽

2020 ◽

Vol 4 (s1) ◽

pp. 44-44

Author(s):

Megan C Hollister ◽

Jeffrey D. Blume

Keyword(s):

Multiple Testing ◽

Empirical Bayes ◽

Hypothesis Test ◽

Estimation Methods ◽

P Value ◽

Empirical Distributions ◽

False Discovery ◽

Research Findings ◽

Unknown Mixture ◽

User Friendly

OBJECTIVES/GOALS: To improve the implementation of FDRs in translational research. Current statistical packages are hard to use and fail to adequately convey their strong assumptions. We developed a software package that allows the user to decide on assumptions and choose the approach they desire. We encourage wider reporting of FDRs for observed findings. METHODS/STUDY POPULATION: We developed a user-friendly R function for computing FDRs from observed p-values. A variety of methods for FDR estimation and for FDR control are included so the user can select the approach most appropriate for their setting. Options include Efron’s Empirical Bayes FDR, Benjamini-Hochberg FDR control for multiple testing, Lindsey’s method for smoothing empirical distributions, estimation of the mixing proportion, and central matching. We illustrate the important difference between estimating the FDR for a particular finding and adjusting a hypothesis test to control the false discovery propensity. RESULTS/ANTICIPATED RESULTS: We compared the capabilities of our new p.fdr function with those of the popular p.adjust function from the base stats package. Specifically, we examined multiple examples of data coming from different unknown mixture distributions to highlight the null estimation methods that p.fdr includes. The base package provides neither optimal FDR usage nor sufficient estimation options. We also compared the step-up/step-down procedure used in adjusted p-value hypothesis tests and discuss when this is inappropriate. The p.adjust function is not able to report raw-adjusted values, and this will be shown in the graphical results. DISCUSSION/SIGNIFICANCE OF IMPACT: FDRs reveal the propensity for an observed result to be incorrect. FDRs should accompany observed results to help contextualize the relevance and potential impact of research findings. Our results show that previous methods are not sufficiently rich or precise in their calculations. Our new package allows the user to be in control of the null estimation and step-up implementation when reporting FDRs.
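The distinction the abstract draws between raw-adjusted values and step-up adjusted p-values can be illustrated with a minimal Python sketch. This is not the p.fdr implementation; the function name `bh_raw_and_adjusted` is hypothetical, and "raw" is assumed here to mean the Benjamini-Hochberg quantity p·m/rank before the monotonicity step is applied.

```python
def bh_raw_and_adjusted(pvalues):
    """Return (raw, adjusted) Benjamini-Hochberg values in input order.

    raw[i]      = p_i * m / rank_i, with no monotonicity step (assumed
                  meaning of "raw-adjusted" in this sketch)
    adjusted[i] = smallest raw value among p-values ranked at or above p_i,
                  capped at 1 (the usual step-up adjusted p-value)
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    raw = [0.0] * m
    for rank, i in enumerate(order, start=1):
        raw[i] = pvalues[i] * m / rank

    # Step-up: walk from the largest p-value down, keeping a running minimum.
    adjusted = [0.0] * m
    running_min = 1.0
    for i in reversed(order):
        running_min = min(running_min, raw[i])
        adjusted[i] = running_min
    return raw, adjusted
```

For the p-values `[0.01, 0.04, 0.03, 0.20]`, the raw value for 0.03 is 0.06, while its step-up adjusted value is pulled down to about 0.053 by the neighbouring p-value 0.04. The monotonicity step can therefore hide the raw quantity attached to an individual finding.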


Resampling-Based Empirical Bayes Multiple Testing Procedures for Controlling Generalized Tail Probability and Expected Value Error Rates: Focus on the False Discovery Rate and Simulation Study

Biometrical Journal ◽

10.1002/bimj.200710473 ◽

2008 ◽

Vol 50 (5) ◽

pp. 716-744 ◽

Cited By ~ 12

Author(s):

Sandrine Dudoit ◽

Houston N. Gilbert ◽

Mark J. van der Laan

Keyword(s):

False Discovery Rate ◽

Simulation Study ◽

Multiple Testing ◽

Empirical Bayes ◽

Tail Probability ◽

Error Rates ◽

Expected Value ◽

Testing Procedures ◽

False Discovery ◽

Multiple Testing Procedures


Influence of multiple hypothesis testing on reproducibility in neuroimaging research

10.1101/488353 ◽

2018 ◽

Author(s):

Tuomas Puoliväli ◽

Satu Palva ◽

J. Matias Palva

Keyword(s):

Hypothesis Testing ◽

Multiple Testing ◽

Multiple Hypothesis Testing ◽

Permutation Testing ◽

Random Field Theory ◽

False Discovery ◽

Multiple Hypothesis ◽

Simultaneous Testing ◽

Research Findings

Background: Reproducibility of research findings has been recently questioned in many fields of science, including psychology and neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which increases the number of false positive findings unless the p-values are carefully corrected. While this multiple testing problem is well known and has been studied for decades, it continues to be both a theoretical and practical problem. New Method: Here we assess the reproducibility of research involving multiple-testing corrected for family-wise error rate (FWER) or false discovery rate (FDR) by techniques based on random field theory (RFT), cluster-mass based permutation testing, adaptive FDR, and several classical methods. We also investigate the performance of these methods under two different models. Results: We found that permutation testing is the most powerful method among the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced most reproducible outcomes. Comparison with Existing Method(s): We have extended the use of two-group and separate-classes models for analyzing reproducibility and provide a new open-source software “MultiPy” for multiple hypothesis testing. Conclusions: Our results suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit “MultiPy” and we aim this study to help in improving statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.


An Empirical Bayes Optimal Discovery Procedure Based on Semiparametric Hierarchical Mixture Models

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/568480 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9

Author(s):

Hisashi Noma ◽

Shigeyuki Matsui

Keyword(s):

Mixture Model ◽

Multiple Testing ◽

Empirical Bayes ◽

Gene Selection ◽

Fixed Number ◽

Test Statistic ◽

Microarray Experiments ◽

False Discovery ◽

Genome Wide ◽

Optimal Discovery Procedure

Multiple testing has been widely adopted for genome-wide studies such as microarray experiments. For effective gene selection in these genome-wide studies, the optimal discovery procedure (ODP), which maximizes the number of expected true positives for each fixed number of expected false positives, was developed as a multiple testing extension of the most powerful test for a single hypothesis by Storey (Journal of the Royal Statistical Society, Series B, vol. 69, no. 3, pp. 347–368, 2007). In this paper, we develop an empirical Bayes method for implementing the ODP based on a semiparametric hierarchical mixture model using the “smoothing-by-roughening” approach. Under the semiparametric hierarchical mixture model, (i) the prior distribution can be modeled flexibly, (ii) the ODP test statistic and the posterior distribution are analytically tractable, and (iii) computations are easy to implement. In addition, we provide a significance rule based on the false discovery rate (FDR) in the empirical Bayes framework. Applications to two clinical studies are presented.


Significance estimation for large scale untargeted metabolomics annotations

10.1101/109389 ◽

2017 ◽

Cited By ~ 4

Author(s):

Kerstin Scheubert ◽

Franziska Hufsky ◽

Daniel Petras ◽

Mingxun Wang ◽

Louis-Félix Nothias ◽

...

Keyword(s):

Small Molecules ◽

Empirical Bayes ◽

Large Scale ◽

Estimation Methods ◽

Scale Analysis ◽

Reference Library ◽

False Discovery Rates ◽

False Discovery ◽

Large Scale Analysis ◽

Discovery Rates

The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the false discovery rate. Relying on estimations of false discovery rates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by 139% (ranging from −92% up to +5705%) when compared to a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to define the scoring criteria for large scale analysis of untargeted small molecule data, a capability that has been essential in the advancement of large scale proteomics, transcriptomics, and genomics science.
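The target-decoy idea mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration and not the authors' method: it assumes spectra are matched separately against a target library and an equally sized decoy library, that higher scores mean better matches, and the function name `target_decoy_fdr` is hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Assumption of this sketch: decoy matches above the threshold are about
    as numerous as false target matches above it, so their ratio to the
    accepted target matches estimates the FDR.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, n_decoy / n_target)
```

Scanning `threshold` over a grid and keeping the lowest value whose estimated FDR stays under a chosen level is one way a user could define project-specific scoring criteria.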


Differentially private false discovery rate control

Journal of Privacy and Confidentiality ◽

10.29012/jpc.755 ◽

2021 ◽

Vol 11 (2) ◽

Author(s):

Cynthia Dwork ◽

Weijie Su ◽

Li Zhang

Keyword(s):

False Discovery Rate ◽

Multiple Testing ◽

Differential Privacy ◽

Broad Class ◽

Multiple Hypothesis Testing ◽

P Value ◽

P Values ◽

Testing Procedures ◽

False Discovery ◽

Rigorous Framework

Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach is to first repeatedly add noise to the logarithms of the p-values to ensure differential privacy and to select an approximately smallest p-value serving as a promising candidate at each iteration; the selected p-values are further supplied to the BHq and our private procedure releases only the rejected ones. Moreover, we develop a new technique that is based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure, and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.
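For readers unfamiliar with the step-up and step-down variants named above, the following Python sketch shows the non-private versions only. It assumes both variants use the same critical values k·q/m, and the function name `bh_rejections` is hypothetical; the noise-adding private procedure from the paper is not reproduced here.

```python
def bh_rejections(pvalues, q, step_up=True):
    """Return sorted indices of hypotheses rejected at level q.

    Step-up:   reject the k smallest p-values, where k is the LARGEST rank
               with p_(k) <= k * q / m.
    Step-down: reject the k smallest p-values, where k counts the ranks
               passed BEFORE the first p_(k) > k * q / m.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    passes = [pvalues[i] <= rank * q / m
              for rank, i in enumerate(order, start=1)]

    if step_up:
        # Largest rank whose p-value clears its critical value (0 if none).
        k = max((rank for rank, ok in enumerate(passes, start=1) if ok),
                default=0)
    else:
        # Stop at the first rank whose p-value fails its critical value.
        k = 0
        for ok in passes:
            if not ok:
                break
            k += 1
    return sorted(order[:k])
```

With p-values `[0.03, 0.9, 0.02, 0.024]` and `q = 0.05`, the step-up variant rejects three hypotheses while the step-down variant rejects none, because the smallest p-value already exceeds its critical value of 0.0125.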


An Efficient Approach to Screening Epigenome-Wide Data

BioMed Research International ◽

10.1155/2016/2615348 ◽

2016 ◽

Vol 2016 ◽

pp. 1-16 ◽

Cited By ~ 9

Author(s):

Meredith A. Ray ◽

Xin Tong ◽

Gabrielle A. Lockett ◽

Hongmei Zhang ◽

Wilfried J. J. Karmaus

Keyword(s):

Multiple Testing ◽

Screening Method ◽

R Package ◽

False Discovery Rates ◽

Cpg Dna ◽

Cpg Sites ◽

Surrogate Variable ◽

False Discovery ◽

User Friendly ◽

Linear Regressions

Screening cytosine-phosphate-guanine dinucleotide (CpG) DNA methylation sites in association with some covariate(s) is desired due to high dimensionality. We incorporate surrogate variable analyses (SVAs) into (ordinary or robust) linear regressions and utilize training and testing samples for nested validation to screen CpG sites. SVA accounts for variations in the methylation not explained by the specified covariate(s) and adjusts for confounding effects. To make it easier for users, this screening method is built into a user-friendly R package, ttScreening, with efficient algorithms implemented. Various simulations were implemented to examine the robustness and sensitivity of the method compared to the classical approaches controlling for multiple testing: the false discovery rate-based (FDR-based) and the Bonferroni-based methods. The proposed approach in general performs better and has the potential to control both type I and type II errors. We applied ttScreening to 383,998 CpG sites in association with maternal smoking, one of the leading factors for cancer risk.


A Causal Web between Chronotype and Metabolic Health Traits

Genes ◽

10.3390/genes12071029 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1029

Author(s):

John A. Williams ◽

Dominic Russ ◽

Laura Bravo-Merodio ◽

Victor Roth Cardoso ◽

Samantha C. Pendleton ◽

...

Keyword(s):

Multiple Testing ◽

Alcohol Intake ◽

Mendelian Randomization ◽

P Value ◽

Multiple Testing Correction ◽

New Associations ◽

False Discovery ◽

Confounding Variables ◽

Intermediate Variables ◽

Trait Associations

Observational and experimental evidence has linked chronotype to both psychological and cardiometabolic traits. Recent Mendelian randomization (MR) studies have investigated direct links between chronotype and several of these traits, often in isolation from other potential mediating or moderating traits. We mined the EpiGraphDB MR database for calculated chronotype–trait associations (p-value < 5 × 10⁻⁸). We then re-analyzed those relevant to metabolic or mental health and investigated for statistical evidence of horizontal pleiotropy. Analyses passing multiple testing correction were then investigated for confounders, colliders, intermediates, and reverse intermediates using the EpiGraphDB database, creating multiple chronotype–trait interactions among each of the traits studied. We revealed 10 significant chronotype–exposure associations (false discovery rate < 0.05) exposed to 111 potential previously known confounders, 52 intermediates, 18 reverse intermediates, and 31 colliders. Chronotype–lipid causal associations collided with treatment and diabetes effects; chronotype–bipolar associations were mediated by breast cancer; and chronotype–alcohol intake associations were impacted by confounders and intermediate variables including known zeitgebers and molecular traits. We have reported the influence of chronotype on several cardiometabolic and behavioural traits, and identified potential confounding variables not reported on in studies while discovering new associations to drugs and disease.


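Sebaran Peluang Acak Kontinu, Distribusi Normal, Distribusi Normal Baku, Distribusi T, Distribusi Chi Square, dan Distribusi F [Continuous Random Probability Distributions, the Normal Distribution, the Standard Normal Distribution, the t Distribution, the Chi-Square Distribution, and the F Distribution]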

10.31219/osf.io/grdnm ◽

2020 ◽

Author(s):

Ahmad Sudi Pratikno

Keyword(s):

Hypothesis Test ◽

Standard Normal Distribution ◽

Process Data ◽

Nominal Data ◽

Chi Square ◽

Stable Curve ◽

Standard Normal ◽

Research Findings ◽

F Distribution ◽

T Distribution

In statistics, various terms may feel unfamiliar to researchers who are not accustomed to discussing them. Nevertheless, statistics offers researchers many functions and benefits for processing data, which are then interpreted into conclusions so that researchers can digest and understand the research findings. Continuous random probability distributions describe the probabilities of measurements such as time, weather, and other data obtained from the field. The standard normal distribution is represented by a stable curve with mean zero and standard deviation 1, while the t distribution is used as a statistical test in hypothesis testing. The chi-square test compares two variables measured on a nominal scale, while the F distribution is often used in ANOVA and regression analysis.


2dFDR: a new approach to confounder adjustment substantially increases detection power in omics association studies

Genome Biology ◽

10.1186/s13059-021-02418-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Sangyoon Yi ◽

Xianyang Zhang ◽

Lu Yang ◽

Jinyan Huang ◽

Yuanhang Liu ◽

...

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Association Studies ◽

Control Procedure ◽

Multiple Testing Correction ◽

New Approach ◽

False Discovery ◽

Traditional Procedure ◽

Extensive Evaluation ◽

Confounder Adjustment

One challenge facing omics association studies is the loss of statistical power when adjusting for confounders and multiple testing. The traditional statistical procedure involves fitting a confounder-adjusted regression model for each omics feature, followed by multiple testing correction. Here we show that the traditional procedure is not optimal and present a new approach, 2dFDR, a two-dimensional false discovery rate control procedure, for powerful confounder adjustment in multiple testing. Through extensive evaluation, we demonstrate that 2dFDR is more powerful than the traditional procedure, and in the presence of strong confounding and weak signals, the power improvement could be more than 100%.


An empirical Bayes mixture method for effect size and false discovery rate estimation

The Annals of Applied Statistics ◽

10.1214/09-aoas276 ◽

2010 ◽

Vol 4 (1) ◽

pp. 422-438 ◽

Cited By ~ 42

Author(s):

Omkar Muralidharan

Keyword(s):

False Discovery Rate ◽

Effect Size ◽

Empirical Bayes ◽

Rate Estimation ◽

False Discovery ◽

False Discovery Rate Estimation
