A direct approach to estimating false discovery rates conditional on covariates

2015 ◽  
Author(s):  
Simina M. Boca ◽  
Jeffrey T. Leek

Abstract Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate is one of the most commonly used error rates for measuring and controlling rates of false discoveries when performing multiple tests. Adaptive false discovery rates rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini-Hochberg adjusted p-values, leading to a plug-in false discovery rate estimator. Our case study concerns a genome-wide association meta-analysis which considers associations with body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios.

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e6035 ◽  
Author(s):  
Simina M. Boca ◽  
Jeffrey T. Leek

Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini–Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
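The plug-in construction described above can be sketched in a few lines: take an estimate of the null proportion (here treated as a given input, e.g. produced by a regression on covariates) and multiply it into the Benjamini–Hochberg adjusted p-values. A minimal Python sketch, not the swfdr implementation; the pi0 values are assumed inputs:

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (step-up, with monotonicity)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downward
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

def plugin_fdr(p, pi0_hat):
    """Plug-in FDR: null-proportion estimates (a scalar, or a per-test
    array from a covariate regression) times the BH adjusted p-values."""
    return np.minimum(np.asarray(pi0_hat) * bh_adjust(p), 1.0)
```

With `pi0_hat` identically 1 this reduces to ordinary BH adjustment; per-test estimates below 1 make the estimator less conservative where the covariates suggest fewer true nulls.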


2017 ◽  
Author(s):  
Xiongzhi Chen ◽  
David G. Robinson ◽  
John D. Storey

Abstract The false discovery rate measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the false discovery rate. We develop a new framework for formulating and estimating false discovery rates and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability a null hypothesis is true or the power of that particular test. The false discovery rate is then treated as a function of this informative variable. We consider two applications in genomics. Our first is a genetics of gene expression (eQTL) experiment in yeast where every genetic marker and gene expression trait pair is tested for association. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.
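One crude way to realize "FDR as a function of an informative variable" is to bin tests on that variable and apply a Storey-type null-proportion estimate within each bin. The Python sketch below is illustrative only: the quantile binning scheme and the choice lambda = 0.5 are assumptions, not the authors' method.

```python
import numpy as np

def pi0_storey(p, lam=0.5):
    """Storey-style null-proportion estimate: the fraction of p-values
    above lambda, rescaled by 1/(1 - lambda) and capped at 1."""
    p = np.asarray(p, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))

def pi0_by_bin(p, z, n_bins=3):
    """Estimate pi0 within quantile bins of an informative variable z,
    a simple binned stand-in for treating FDR as a function of z."""
    p, z = np.asarray(p, dtype=float), np.asarray(z, dtype=float)
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
    return np.array([pi0_storey(p[idx == b]) for b in range(n_bins)]), idx
```

Bins where small p-values concentrate (e.g. markers close to the gene in the eQTL example) get a lower estimated null proportion, and hence lower estimated FDRs for the same p-value.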


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 1238-1238
Author(s):  
Anita D'Souza ◽  
Sebastian M. Armasu ◽  
Mariza de Andrade ◽  
John A. Heit

Abstract 1238 Background: SNPs within genes encoding factor XI (F11), fibrinogen genes (FGA, FGG) and other candidate genes within the procoagulant, anticoagulant, fibrinolytic, innate immunity and endocrine pathways have been reported as associated with VTE. However, the independent risk of VTE associated with many of these SNPs after controlling for factor V Leiden, prothrombin G20210A and ABO blood group non-O carrier status is uncertain. Objective: To replicate candidate gene SNPs previously reported as associated with VTE. Methods: As part of a large replication study, we included 17 SNPs previously reported as associated with VTE in a custom Illumina GoldenGate genotyping array (total n=1093 SNPs). We genotyped 1270 non-Hispanic adults of European ancestry with objectively-diagnosed VTE (cases; no cancer, venous catheter or antiphospholipid antibodies) and 1302 controls (frequency-matched on case age, gender, race, and MI/stroke status). Genotyping results from high-quality control DNA (SNP call rate ≥ 95%) were used to generate a cluster algorithm. The primary outcome was VTE status, a binary measure. The covariates were age at interview or blood sample collection, sex, stroke and/or MI status, and state of residence. To adjust for population stratification, we used the multidimensional scaling (MDS) analysis option in PLINK v 1.07 to identify outliers in our population using the ancestry informative markers. We tested for an association between each SNP and VTE using unconditional logistic regression, adjusting for age, sex, stroke/MI status, state of residence and ABO rs514659 (in high linkage disequilibrium with non-O blood type). The analyses were corrected for multiple comparisons using an extension of false discovery rates.
The false discovery rate (reported as a Q-value) is an analogue of the p-value that takes into account the number of statistical tests performed; it estimates the expected proportion of false positive tests incurred when a particular SNP is called significant. All analyses were performed using PLINK v 1.07. Results: MDS gave no evidence of population stratification. Genotyping was unsuccessful for two of the 17 SNPs. We found significant associations between VTE and SNPs in F11, FGG, TC2D and FGA (Table). However, the false discovery rates for all significant SNPs except F11 rs3756008 were >0.05, suggesting that the observed associations were likely false positives due to multiple comparisons. Even at a false discovery rate of Q-value=0.0099, one would expect ∼13 SNPs (0.0099 × 1302 SNPs) to be falsely associated with VTE due to multiple comparisons. Consequently, even our observed association between F11 rs3756008 and VTE remains tentative. Conclusions: We were unable to replicate reported associations between 15 SNPs and VTE. Our results emphasize the necessity of replication studies in different populations to confirm reported associations of SNPs with VTE. Disclosures: Heit: Daiichi Sankyo: Consultancy, Honoraria.
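The expected-count figure quoted above can be checked directly; the snippet below simply mirrors the abstract's own back-of-envelope arithmetic.

```python
# Mirror of the abstract's back-of-envelope calculation: at a
# Q-value of 0.0099, roughly Q x (number of tests) associations
# would be expected to be false (using the abstract's 1302 figure).
q_value = 0.0099
n_tests = 1302
expected_false = q_value * n_tests  # ~12.9, i.e. about 13 SNPs
```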


2018 ◽  
Author(s):  
LM Hall ◽  
AE Hendricks

Abstract Background: Recently, there has been increasing concern about the replicability, or lack thereof, of published research. An especially high rate of false discoveries has been reported in some areas, motivating the creation of resource-intensive collaborations to estimate the replication rate of published research by repeating a large number of studies. The substantial amount of resources required by these replication projects limits the number of studies that can be repeated and, consequently, the generalizability of the findings. Methods and findings: In 2013, Jager and Leek developed a method to estimate the empirical false discovery rate from journal abstracts and applied their method to five high-profile journals. Here, we use the relative efficiency of Jager and Leek's method to gather p-values from over 30,000 abstracts and to subsequently estimate the false discovery rate for 94 journals over a five-year time span. We model the empirical false discovery rate by journal subject area (cancer or general medicine), impact factor, and Open Access status. We find that the empirical false discovery rate is higher for cancer vs. general medicine journals (p = 5.14E-6). Within cancer journals, we find that this relationship is further modified by journal impact factor, where a lower journal impact factor is associated with a higher empirical false discovery rate (p = 0.012, 95% CI: -0.010, -0.001). We find no significant differences, on average, in the false discovery rate for Open Access vs. closed access journals (p = 0.256, 95% CI: -0.014, 0.051). Conclusions: We find evidence of a higher false discovery rate in cancer journals compared to general medicine journals, especially those with a lower journal impact factor. For cancer journals, a one-point decrease in journal impact factor is associated with a 0.006 increase in the empirical false discovery rate, on average. For a false discovery rate of 0.05, this would be over a 10% increase, to 0.056. Conversely, we find no significant evidence of a higher false discovery rate, on average, for Open Access vs. closed access journals from InCites. Our results identify areas of research that may need additional scrutiny and support to facilitate replicable science. Given our publicly available R code and data, others can complete a broad assessment of the empirical false discovery rate across other subject areas and characteristics of published research.


2021 ◽  
Vol 2 (2) ◽  
pp. p1
Author(s):  
Kirk Davis ◽  
Rodney Maiden

Although the limitations of null hypothesis significance testing (NHST) are well documented in the psychology literature, the accuracy paradox, which concisely states an important limitation of published research, is never mentioned. The accuracy paradox arises when a test with higher accuracy does a poorer job of correctly classifying a particular outcome than a test with lower accuracy, which suggests that accuracy is not always the best metric of a test's usefulness. Since accuracy is a function of type I and II error rates, it can be misleading to interpret a study's results as accurate simply because these errors are minimized. Once a decision has been made regarding statistical significance, type I and II error rates are not directly informative to the reader. Instead, false discovery and false omission rates are more informative when evaluating the results of a study. Given the prevalence of publication bias and small effect sizes in the literature, the possibility of a false discovery is especially important to consider. When false discovery rates are estimated, it is easy to understand why many studies in psychology cannot be replicated.
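A small worked example makes the accuracy paradox concrete. All numbers below are assumed for illustration: a field publishing 1000 studies of which only 100 test true effects, with 80% power and a 5% type I error rate.

```python
# Hypothetical counts (assumed for illustration):
tp = 80    # true effects detected: 100 true effects x 0.80 power
fn = 20    # true effects missed
fp = 45    # false positives: 900 null studies x 0.05 alpha
tn = 855   # nulls correctly declared non-significant

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.935: looks excellent
fdr = fp / (fp + tp)                        # 0.36: over a third of "discoveries" are false
fomr = fn / (fn + tn)                       # false omission rate, about 0.023
```

Despite 93.5% accuracy, 36% of the significant results are false discoveries, which is exactly the mismatch the accuracy paradox describes.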


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 441
Author(s):  
Megan H. Murray ◽  
Jeffrey D. Blume

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings - an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.



2019 ◽  
Vol 21 (Supplement_3) ◽  
pp. iii71-iii71
Author(s):  
T Kaisman-Elbaz ◽  
Y Elbaz ◽  
V Merkin ◽  
L Dym ◽  
A Noy ◽  
...  

Abstract BACKGROUND Glioblastoma is known for its dismal prognosis, yet the dependence of survival on readily available red blood cell (RBC) parameters that define a patient's anemic status, such as hemoglobin level and red blood cell distribution width (RDW), is not fully established. Several studies have demonstrated an association between low hemoglobin level or high RDW values and overall survival of glioblastoma patients, but others found no clear association. This study addresses this unclarity. MATERIAL AND METHODS In this work, 170 glioblastoma patients diagnosed and treated in Soroka University Medical Center (SUMC) in the last 12 years were retrospectively inspected for the dependence of their survival on pre-operative RBC parameters using multivariate analysis followed by a false discovery rate procedure to account for the multiple hypothesis testing. A survival stratification tree and Kaplan-Meier survival curves that indicate the patients' prognosis according to these parameters were prepared. RESULTS Besides KPS>70 and tumor resection supplemented by oncological treatment, age<70 (HR=0.4, 95% CI 0.24–0.65), low hemoglobin level (HR=1.79, 95% CI 1.06–2.99) and RDW<14% (HR=0.57, 95% CI 0.37–0.88) were found to be prognostic for patients' overall survival in multivariate analysis, at a false discovery rate of less than 5%. CONCLUSION The survival stratification highlighted a non-anemic subgroup of nearly 30% of the cohort's patients whose median overall survival was 21.1 months (95% CI 16.2–27.2), higher than the roughly 15-month median overall survival reported under the Stupp protocol. A discussion of the beneficial or detrimental effect of RBC parameters on glioblastoma prognosis and its possible causes is given.


Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 905-914 ◽  
Author(s):  
Hakkyo Lee ◽  
Jack C M Dekkers ◽  
M Soller ◽  
Massoud Malek ◽  
Rohan L Fernando ◽  
...  

Abstract Controlling the false discovery rate (FDR) has been proposed as an alternative to controlling the genomewise error rate (GWER) for detecting quantitative trait loci (QTL) in genome scans. The objective here was to implement FDR in the context of regression interval mapping for multiple traits. Data on five traits from an F2 swine breed cross were used. FDR was implemented using tests at every 1 cM (FDR1) and using tests with the highest test statistic for each marker interval (FDRm). For the latter, a method was developed to predict comparison-wise error rates. At low error rates, FDR1 behaved erratically; FDRm was more stable but gave similar significance thresholds and number of QTL detected. At the same error rate, methods to control FDR gave less stringent significance thresholds and more QTL detected than methods to control GWER. Although testing across traits had limited impact on FDR, single-trait testing was recommended because there is no theoretical reason to pool tests across traits for FDR. FDR based on FDRm was recommended for QTL detection in interval mapping because it provides significance tests that are meaningful, yet not overly stringent, such that a more complete picture of QTL is revealed.
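The contrast between controlling the genomewise error rate and controlling the FDR can be sketched with the generic Bonferroni and Benjamini-Hochberg cutoffs. This is a generic illustration, not the paper's interval-mapping procedure:

```python
import numpy as np

def bonferroni_threshold(alpha, m):
    """Per-test p-value cutoff controlling the genomewise error rate at alpha."""
    return alpha / m

def bh_threshold(p, alpha):
    """Largest p-value declared significant by Benjamini-Hochberg at level
    alpha (0.0 if nothing is significant)."""
    p = np.sort(np.asarray(p, dtype=float))
    m = len(p)
    ok = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
    return float(p[ok[-1]]) if ok.size else 0.0
```

For the same nominal alpha, the BH cutoff is data-dependent and typically far less stringent than alpha/m, which matches the pattern the abstract reports: FDR control yields less stringent significance thresholds and more QTL detected than GWER control.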

