Peak p-values and false discovery rate inference in neuroimaging

Abstract

Motivation: Chromatin immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions, so the true significance or reliability of peak calls remains unknown.

Results: Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls.

Availability and implementation: The RECAP software is available through www.perkinslab.ca or on GitHub at https://github.com/theodorejperkins/RECAP.

Supplementary information: Supplementary data are available at Bioinformatics online.
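The recalibration idea in this abstract can be illustrated with a minimal sketch. It assumes that P-values from a peak caller run on resampled (null) data are already available, and it maps each raw P-value to the fraction of null P-values at or below it, which is a monotone transform. The function name `recalibrate` and the example numbers are hypothetical; this is not the RECAP implementation, whose details are not given here.

```python
from bisect import bisect_right


def recalibrate(p_values, null_p_values):
    """Map each raw p-value to an empirical tail fraction under the null.

    Assumption: null_p_values were produced by running the same peak
    caller on resampled data with no true enrichment.
    """
    null_sorted = sorted(null_p_values)
    n = len(null_sorted)
    # Count null p-values <= p; the +1 terms keep results above zero.
    return [(bisect_right(null_sorted, p) + 1) / (n + 1) for p in p_values]


# Hypothetical null p-values that are far too small, as a biased caller
# might report on data with no real enrichment.
null = [1e-6, 1e-4, 1e-3, 0.01, 0.1, 0.5, 0.9]
recalibrated = recalibrate([1e-7, 1e-5, 0.2], null)
```

Because the mapping only counts how many null values fall at or below each raw value, it preserves the ordering of the peaks while replacing the caller's nominal significance with one anchored to the null distribution.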


PEAK DETECTION IN MASS SPECTROMETRY BY GABOR FILTERS AND ENVELOPE ANALYSIS

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s0219720009004229 ◽ 2009 ◽ Vol 07 (03) ◽ pp. 547-569 ◽ Cited By ~ 10

Author(s): NHA NGUYEN ◽ HENG HUANG ◽ SOONTORN ORAINTARA ◽ AN VO

Keyword(s): Mass Spectrometry ◽ False Discovery Rate ◽ Gabor Filter ◽ Signal To Noise Ratio ◽ Peak Detection ◽ Envelope Analysis ◽ Local Maxima ◽ False Discovery ◽ Lower False Discovery Rate ◽ True Position

Mass Spectrometry (MS) is increasingly being used to discover disease-related proteomic patterns. Peak detection is one of the most important steps in the typical analysis of MS data. Recently, many new algorithms have been proposed to increase the true positive rate while keeping the false discovery rate low in peak detection. Most of them follow one of two approaches: denoising or decomposition. In previous studies, the decomposition approach has shown more potential than the denoising approach. In this paper, we propose two novel methods, named GaborLocal and GaborEnvelop, both of which can detect more true peaks with a lower false discovery rate than previous methods. We employ the method of Gaussian local maxima to detect peaks, because it is robust to noise in signals. A new measure, peak rank, is defined for the first time to identify peaks instead of using the signal-to-noise ratio. Meanwhile, the Gabor filter is used to amplify important information and compress noise in the raw MS signal. Moreover, we also propose envelope analysis to improve the quantification of peaks and remove more false peaks. The proposed methods were evaluated on a real SELDI-TOF spectrum with known polypeptide positions. The experimental results demonstrate that our methods outperform other commonly used methods as measured by the Receiver Operating Characteristic (ROC) curve.
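As a rough illustration of detecting peaks as local maxima of a Gaussian-smoothed signal, the sketch below smooths a toy intensity trace and reports the indices of its local maxima. It is a generic sketch, not the GaborLocal or GaborEnvelop method: the Gabor filtering, peak rank, and envelope analysis steps are omitted, and the function names, the `sigma` parameter, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Return normalized Gaussian weights covering about three sigmas."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def local_maxima(signal, sigma=1.0):
    """Indices where the smoothed signal exceeds its left neighbour
    and is at least as large as its right neighbour."""
    s = smooth(signal, sigma)
    return [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] >= s[i + 1]]


# Toy intensity trace with two bumps (hypothetical data).
trace = [0, 0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0, 0]
peaks = local_maxima(trace, sigma=1.0)
```

Smoothing before the maximum search is what gives robustness to noise: isolated single-sample spikes are averaged down, so fewer of them survive as local maxima.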


FDRestimation: Flexible False Discovery Rate Computation in R

F1000Research ◽ 10.12688/f1000research.52999.2 ◽ 2021 ◽ Vol 10 ◽ pp. 441

Author(s): Megan H. Murray ◽ Jeffrey D. Blume

Keyword(s): False Discovery Rate ◽ Estimation Procedure ◽ False Discovery Rates ◽ P Values ◽ False Discovery ◽ Software Packages ◽ Broad Array ◽ Potential Impact ◽ User Friendly ◽ Discovery Rates

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice, and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings, an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.
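To make "adjusted p-values for FDR control" concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment. It is written in Python for illustration only and does not reproduce the FDRestimation package or its R interface; the function name `bh_adjust` and the example p-values are hypothetical. It covers only the adjusted p-value side of the distinction the abstract draws, not the estimation of per-finding FDRs or of the null proportion.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
```

Rejecting every hypothesis whose adjusted value is at or below a chosen level applies the Benjamini-Hochberg procedure at that level. That is a statement about a decision rule, which is why an adjusted p-value should not be read as the estimated FDR of an individual finding.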


FDRestimation: Flexible False Discovery Rate Computation in R

F1000Research ◽ 10.12688/f1000research.52999.1 ◽ 2021 ◽ Vol 10 ◽ pp. 441

Author(s): Megan H. Murray ◽ Jeffrey D. Blume

Keyword(s): False Discovery Rate ◽ Estimation Procedure ◽ False Discovery Rates ◽ P Values ◽ False Discovery ◽ Software Packages ◽ Broad Array ◽ Potential Impact ◽ User Friendly ◽ Discovery Rates

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice, and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings, an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.


Faculty Opinions recommendation of An investigation of the false discovery rate and the misinterpretation of p-values.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.725432010.793514702 ◽ 2016

Author(s): Robert Sterner

Keyword(s): False Discovery Rate ◽ P Values ◽ False Discovery
