A practical guide to methods controlling false discoveries in computational biology

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Keegan Korthauer ◽  
Patrick K Kimes ◽  
Claire Duvallet ◽  
Alejandro Reyes ◽  
Ayshwarya Subramanian ◽  
...  

Abstract

Background: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p-values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as “informative covariates” to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigated the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.

Results: Methods that incorporate informative covariates were modestly more powerful than classic approaches and did not underperform classic approaches even when the covariate was completely uninformative. The majority of methods were successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we found that the improvement of the modern FDR methods over the classic methods increased with the informativeness of the covariate, the total number of hypothesis tests, and the proportion of truly non-null hypotheses.

Conclusions: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and the informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
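To make the distinction concrete, here is a minimal sketch contrasting classic Benjamini-Hochberg (BH) with a covariate-weighted BH in the spirit of the modern methods. It is not any of the benchmarked implementations: the weights are a fixed, hand-chosen function of the covariate (real methods such as IHW learn them from the data), and the simulation is invented purely for illustration.

```python
# Minimal sketch: classic Benjamini-Hochberg (BH) vs. a covariate-weighted
# BH (Genovese et al. style). NOT any of the benchmarked tools; modern
# methods such as IHW learn the weights from the data, whereas here they
# are a fixed, hand-chosen function of the covariate.
import numpy as np
from scipy.stats import norm

def bh_reject(pvals, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= (k / m) * alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        reject[order[: below[-1] + 1]] = True
    return reject

def weighted_bh_reject(pvals, weights, alpha=0.05):
    """Weighted BH: run BH on p_i / w_i, with weights normalized to mean 1
    so that FDR control is preserved (under independence)."""
    w = weights / weights.mean()
    return bh_reject(pvals / w, alpha)

# Toy simulation: a covariate that is informative about signal density.
rng = np.random.default_rng(0)
m = 20_000
covariate = rng.uniform(size=m)
is_nonnull = rng.uniform(size=m) < 0.3 * covariate  # more signal at high covariate
z = rng.normal(loc=np.where(is_nonnull, 2.5, 0.0))
pvals = norm.sf(z)

plain = bh_reject(pvals)
weighted = weighted_bh_reject(pvals, weights=0.5 + covariate)  # hand-chosen weights
print(f"BH rejections: {plain.sum()}, weighted BH rejections: {weighted.sum()}")
```

In this toy setting the weighted variant typically rejects more hypotheses: mean-one weighting shifts the per-test thresholds toward the covariate region where signal is denser, which is the mechanism behind the power gains the abstract describes.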


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 230
Author(s):  
Fang Xie ◽  
Johannes Lederer

Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, the gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both in theory and simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.
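For readers unfamiliar with the machinery, the knockoff selection step and the idea of aggregating over knockoff draws can be sketched as follows. This is a simplified illustration, not the paper's procedure: the statistics W are simulated directly rather than computed from a real knockoff matrix (e.g., lasso coefficient differences), and the majority vote is a stand-in for the aggregation rule whose false discovery guarantees the paper establishes.

```python
# Minimal sketch of knockoff(+) selection and a simple aggregation across
# knockoff draws. The statistics W are simulated directly here; in practice
# they would come from, e.g., lasso coefficient differences between each
# feature and its knockoff copy. Majority voting is an illustrative
# stand-in for the paper's aggregation rule.
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
    offset=1 gives the knockoff+ variant of Barber and Candes."""
    for t in np.sort(np.abs(W[W != 0])):
        if (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf

def knockoff_select(W, q=0.1):
    return np.nonzero(W >= knockoff_threshold(W, q))[0]

rng = np.random.default_rng(1)
p, n_signals, n_draws = 200, 20, 25
votes = np.zeros(p)
for _ in range(n_draws):               # independent knockoff draws
    W = rng.normal(size=p)             # null statistics: symmetric about 0
    W[:n_signals] += 2.5               # true signals: shifted upward
    votes[knockoff_select(W, q=0.1)] += 1

aggregated = np.nonzero(votes >= n_draws / 2)[0]  # kept in a majority of draws
print(f"{len(aggregated)} features selected after aggregation")
```

The intuition matches the abstract: a single knockoff draw is noisy, so pooling selections across draws stabilizes which features survive while the per-draw thresholds keep the false discovery proportion in check.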


2018 ◽  
Author(s):  
Qike Li ◽  
Samir Rachid Zaim ◽  
Dillon Aberasturi ◽  
Joanne Berghout ◽  
Haiquan Li ◽  
...  

Abstract

Calculating differentially expressed genes (DEGs) from RNA-sequencing requires replicates to estimate gene-wise variability, which is infeasible in the clinic. Comparing two conditions without replicates (TCWR) has been proposed but not evaluated; conventional methods (edgeR, NOISeq-sim, DESeq, DEGseq) can only address it by imposing restrictive transcriptome-wide assumptions that limit their inferential opportunities. Under TCWR conditions (e.g., unaffected tissue vs. tumor), the proposed individualized DEG (iDEG) method models differences of transformed expression as following a distribution calculated across a local partition of transcripts with related baseline expression; the probability that each gene is differentially expressed is then estimated by empirical Bayes with local false discovery rate control, using a two-group mixture model. In extensive simulation studies of TCWR methods, iDEG and NOISeq are the most accurate at 5% < DEGs < 20% (precision > 90%, recall > 75%, false positive rate < 1%) and at 30% < DEGs < 40% (precision = recall ≈ 90%), respectively. The proposed iDEG method borrows localized distribution information from the same individual, a strategy that improves accuracy for comparing transcriptomes in the absence of replicates when the proportion of DEGs is low. http://www.lussiergroup.org/publications/iDEG
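The empirical Bayes step can be illustrated generically. Under the two-group model the local false discovery rate is lfdr(z) = π0·f0(z)/f(z); the sketch below is not the iDEG implementation (it omits the local partitioning by baseline expression and uses a crude central-matching estimate of π0), but it shows the mixture-model mechanics on simulated z-scores.

```python
# Generic Efron-style local fdr under a two-group mixture; NOT the iDEG
# implementation (no local partitioning by baseline expression here).
import numpy as np
from scipy.stats import norm, gaussian_kde

def local_fdr(z):
    """lfdr(z) = pi0 * f0(z) / f(z), with f0 = N(0, 1) as the theoretical
    null, f a kernel density estimate of the mixture, and pi0 from a crude
    central-matching estimate f(0) / f0(0), capped at 1."""
    kde = gaussian_kde(z)
    f = kde(z)
    f0 = norm.pdf(z)
    pi0 = min(1.0, float(kde(0.0)[0]) / norm.pdf(0.0))
    return np.clip(pi0 * f0 / f, 0.0, 1.0)

# Simulated transformed expression differences: 90% null, 10% shifted.
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(size=9_000), rng.normal(loc=3.0, size=1_000)])
lfdr = local_fdr(z)
print(f"genes called at lfdr < 0.2: {(lfdr < 0.2).sum()}")
```

Calling genes at a local fdr cutoff (here 0.2, an arbitrary illustrative choice) is what replaces the replicate-based variance estimate: the null density is learned from the bulk of unchanged transcripts in the same individual.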


2017 ◽  
Author(s):  
Xiongzhi Chen ◽  
David G. Robinson ◽  
John D. Storey

Abstract

The false discovery rate measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, additional information is available that may be used to estimate the false discovery rate more accurately. We develop a new framework for formulating and estimating false discovery rates and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability that the null hypothesis is true or about the power of that particular test. The false discovery rate is then treated as a function of this informative variable. We consider two applications in genomics. The first is a genetics of gene expression (eQTL) experiment in yeast, where every genetic marker and gene expression trait pair is tested for association. The informative variable in this case is the distance between each genetic marker and gene. The second application is detecting differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general and should be useful in a broad range of scientific applications.
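One simple way to treat the false discovery rate as a function of an informative variable, in the spirit of this framework though not the authors' exact estimator, is to estimate a covariate-dependent null proportion π0(x) by regressing the indicator 1{p > λ} on the variable (as in Boca-Leek-style regression) and plug the test-specific π0(x_i) into an FDR estimate. The choices of λ = 0.5, the logistic fit, and the toy covariate below are illustrative assumptions.

```python
# Sketch of a covariate-dependent null proportion pi0(x) and a plug-in FDR
# estimate, in the spirit of functional FDR / Boca-Leek regression; NOT the
# authors' exact estimator. lambda_ = 0.5 and the logistic fit are
# illustrative choices.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

def pi0_of_x(pvals, x, lambda_=0.5):
    """Among nulls, P(p > lambda) = 1 - lambda, so regressing the
    indicator 1{p > lambda} on x and rescaling by 1 / (1 - lambda)
    estimates pi0 as a function of the informative variable."""
    y = (pvals > lambda_).astype(int)
    fit = LogisticRegression().fit(x.reshape(-1, 1), y)
    return np.clip(fit.predict_proba(x.reshape(-1, 1))[:, 1] / (1 - lambda_), 0, 1)

def plugin_fdr(pvals, pi0x, t):
    """Estimated FDR at cutoff t: expected null p-values below t
    (sum_i pi0(x_i) * t) over observed discoveries."""
    return pi0x.sum() * t / max(1, np.sum(pvals <= t))

# Toy data: tests with small x are mostly null; larger x carry more signal.
rng = np.random.default_rng(3)
m = 10_000
x = rng.uniform(size=m)
is_nonnull = rng.uniform(size=m) < 0.4 * x
pvals = norm.sf(rng.normal(loc=np.where(is_nonnull, 2.5, 0.0)))

pi0x = pi0_of_x(pvals, x)
print(f"estimated FDR at p <= 0.01: {plugin_fdr(pvals, pi0x, 0.01):.3f}")
```

Replacing a single global π0 with π0(x_i) is the key move: tests in covariate regions rich in signal contribute fewer expected false discoveries, so the same p-value cutoff yields a smaller estimated FDR there.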

