An Empirical Bayes Optimal Discovery Procedure Based on Semiparametric Hierarchical Mixture Models

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/568480 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9

Author(s):

Hisashi Noma ◽

Shigeyuki Matsui

Keyword(s):

Mixture Model ◽

Multiple Testing ◽

Empirical Bayes ◽

Gene Selection ◽

Fixed Number ◽

Test Statistic ◽

Microarray Experiments ◽

False Discovery ◽

Genome Wide ◽

Optimal Discovery Procedure

Multiple testing has been widely adopted for genome-wide studies such as microarray experiments. For effective gene selection in these genome-wide studies, the optimal discovery procedure (ODP), which maximizes the number of expected true positives for each fixed number of expected false positives, was developed as a multiple testing extension of the most powerful test for a single hypothesis by Storey (Journal of the Royal Statistical Society, Series B, vol. 69, no. 3, pp. 347–368, 2007). In this paper, we develop an empirical Bayes method for implementing the ODP based on a semiparametric hierarchical mixture model using the “smoothing-by-roughening” approach. Under the semiparametric hierarchical mixture model, (i) the prior distribution can be modeled flexibly, (ii) the ODP test statistic and the posterior distribution are analytically tractable, and (iii) computations are easy to implement. In addition, we provide a significance rule based on the false discovery rate (FDR) in the empirical Bayes framework. Applications to two clinical studies are presented.

An approach to gene-based testing accounting for dependence of tests among nearby genes

10.1101/2021.05.24.445494 ◽

2021 ◽

Author(s):

Ronald J Yurko ◽

Kathryn Roeder ◽

Bernie Devlin ◽

Max G'Sell

Keyword(s):

Multiple Testing ◽

Association Studies ◽

Autism Spectrum ◽

P Value ◽

Genome Wide Association Studies ◽

Strongly Correlated ◽

Test Statistics ◽

Test Statistic ◽

Genome Wide ◽

Insight Into

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.
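
The agglomeration idea described in this abstract (assess a gene on its own when its statistic is independent of its neighbours, merge genes into a locus when neighbouring statistics are strongly correlated) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `agglomerate_genes`, the use of adjacent-pair correlations only, and the cutoff of 0.75 are all assumptions made for illustration.

```python
def agglomerate_genes(genes, adjacent_corr, threshold=0.75):
    """Group position-ordered genes into loci.

    genes          -- gene names ordered along the chromosome
    adjacent_corr  -- adjacent_corr[i] is the correlation between the test
                      statistics of genes[i] and genes[i + 1]
    threshold      -- hypothetical cutoff for "strongly correlated"
    """
    if len(adjacent_corr) != max(len(genes) - 1, 0):
        raise ValueError("need one correlation per adjacent gene pair")

    loci = []
    for i, gene in enumerate(genes):
        # Extend the current locus when this gene is strongly correlated
        # with the previous one; otherwise start a new locus.
        if i > 0 and abs(adjacent_corr[i - 1]) >= threshold:
            loci[-1].append(gene)
        else:
            loci.append([gene])
    return loci


if __name__ == "__main__":
    # Hypothetical gene labels and correlations.
    print(agglomerate_genes(["A", "B", "C", "D"], [0.9, 0.1, 0.8]))
```

Each returned group would then be tested as one unit: a single-gene group corresponds to a gene assessed separately, and a multi-gene group corresponds to a locus whose SNPs are pooled.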

Assessing genome-wide significance for the detection of differentially methylated regions

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2017-0050 ◽

2018 ◽

Vol 17 (5) ◽

Cited By ~ 3

Author(s):

Christian M. Page ◽

Linda Vos ◽

Trine B. Rounge ◽

Hanne F. Harbo ◽

Bettina K. Andreassen

Keyword(s):

Multiple Testing ◽

Alternative Methods ◽

Scan Statistic ◽

Differentially Methylated Regions ◽

False Discovery ◽

Genome Wide ◽

Higher Power ◽

Health And Disease ◽

Rate Controlled ◽

Genome Wide Significance

Abstract DNA methylation plays an important role in human health and disease, and methods for the identification of differentially methylated regions are of increasing interest. There is currently a lack of statistical methods which properly address multiple testing, i.e. control genome-wide significance for differentially methylated regions. We introduce a scan statistic (DMRScan), which overcomes these limitations. We benchmark DMRScan against two well established methods (bumphunter, DMRcate), using a simulation study based on real methylation data. An implementation of DMRScan is available from Bioconductor. Our method has higher power than alternative methods across different simulation scenarios, particularly for small effect sizes. DMRScan exhibits greater flexibility in statistical modeling and can be used with more complex designs than current methods. DMRScan is the first dynamic approach which properly addresses the multiple-testing challenges for the identification of differentially methylated regions. DMRScan outperformed alternative methods in terms of power, while keeping the false discovery rate controlled.

4497 Accessible False Discovery Rate Computation

Journal of Clinical and Translational Science ◽

10.1017/cts.2020.164 ◽

2020 ◽

Vol 4 (s1) ◽

pp. 44-44

Author(s):

Megan C Hollister ◽

Jeffrey D. Blume

Keyword(s):

Multiple Testing ◽

Empirical Bayes ◽

Hypothesis Test ◽

Estimation Methods ◽

P Value ◽

Empirical Distributions ◽

False Discovery ◽

Research Findings ◽

Unknown Mixture ◽

User Friendly

OBJECTIVES/GOALS: To improve the implementation of FDRs in translational research. Current statistical packages are hard to use and fail to adequately convey strong assumptions. We developed a software package that allows the user to decide on assumptions and choose the approach they desire. We encourage wider reporting of FDRs for observed findings. METHODS/STUDY POPULATION: We developed a user-friendly R function for computing FDRs from observed p-values. A variety of methods for FDR estimation and for FDR control are included so the user can select the approach most appropriate for their setting. Options include Efron’s Empirical Bayes FDR, Benjamini-Hochberg FDR control for multiple testing, Lindsey’s method for smoothing empirical distributions, estimation of the mixing proportion, and central matching. We illustrate the important difference between estimating the FDR for a particular finding and adjusting a hypothesis test to control the false discovery propensity. RESULTS/ANTICIPATED RESULTS: We performed a comparison of the capabilities of our new p.fdr function to the popular p.adjust function from the base stats package. Specifically, we examined multiple examples of data coming from different unknown mixture distributions to highlight the null estimation methods p.fdr includes. The base package does not provide the optimal FDR usage nor sufficient estimation options. We also compared the step-up/step-down procedure used in adjusted p-value hypothesis tests and discuss when this is inappropriate. The p.adjust function is not able to report raw-adjusted values and this will be shown in the graphical results. DISCUSSION/SIGNIFICANCE OF IMPACT: FDRs reveal the propensity for an observed result to be incorrect. FDRs should accompany observed results to help contextualize the relevance and potential impact of research findings. Our results show that previous methods are not sufficiently rich or precise in their calculations. Our new package allows the user to be in control of the null estimation and step-up implementation when reporting FDRs.
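
For readers unfamiliar with the Benjamini-Hochberg option mentioned in this abstract, the step-up adjustment can be sketched in a few lines of Python. This is a generic illustration of the standard procedure, not the p.fdr implementation; the function names `bh_adjust` and `bh_reject` and the example p-values are assumptions chosen for this sketch. The adjusted values are the same quantity that R's `p.adjust(p, method = "BH")` reports.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that each adjusted value is
    # the minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_reject(pvalues, alpha=0.05):
    """Flag hypotheses rejected when controlling the FDR at level alpha."""
    return [q <= alpha for q in bh_adjust(pvalues)]


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
    print(bh_adjust(example))
    print(bh_reject(example, alpha=0.05))
```

Rejecting every hypothesis whose adjusted value is at most alpha is equivalent to the usual step-up rule of finding the largest rank k with p(k) <= alpha * k / m and rejecting the k smallest p-values.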

Genetic polymorphisms associated with sleep-related phenotypes; relationships with individual nocturnal symptoms of insomnia in the HUNT study

BMC Medical Genetics ◽

10.1186/s12881-019-0916-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Daniela Bragantini ◽

Børge Sivertsen ◽

Philip Gehrman ◽

Stian Lydersen ◽

Ismail Cüneyt Güzey

Keyword(s):

Multiple Testing ◽

Association Studies ◽

Sleep Onset ◽

Early Morning ◽

Genome Wide Association Studies ◽

False Discovery ◽

Genome Wide ◽

Hunt Study ◽

Logistic Regressions ◽

Insomnia Symptoms

Abstract Background In recent years, several GWAS (genome wide association studies) of sleep-related traits have identified a number of SNPs (single nucleotide polymorphisms) but their relationships with symptoms of insomnia are not known. The aim of this study was to investigate whether SNPs, previously reported in association with sleep-related phenotypes, are associated with individual symptoms of insomnia. Methods We selected participants from the HUNT study (Norway) who reported at least one symptom of insomnia consisting of sleep onset, maintenance or early morning awakening difficulties (cases, N = 2563) compared to participants who presented no symptoms at all (controls, N = 3665). Cases were further divided into seven subgroups according to different combinations of these three symptoms. We used multinomial logistic regressions to test the association among different patterns of symptoms and 59 SNPs identified in past GWAS. Results Although 16 SNPs were significantly associated (p < 0.05) with at least one symptom subgroup, none of the investigated SNPs remained significant after correction for multiple testing using the false discovery rate (FDR) method. Conclusions SNPs associated with sleep-related traits do not replicate on any pattern of insomnia symptoms after multiple testing correction. However, correction in this case may be overly conservative.

Empirical Bayesian approach to testing multiple hypotheses with separate priors for left and right alternatives

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2018-0002 ◽

2018 ◽

Vol 17 (4) ◽

Author(s):

Naveen K. Bansal ◽

Mehdi Maadooliat ◽

Steven J. Schrodi

Keyword(s):

Empirical Bayes ◽

Bayes Rule ◽

Test Statistic ◽

Data Set ◽

Web Based ◽

Multiple Hypotheses ◽

Snp Data ◽

Genome Wide ◽

Shiny App ◽

User Friendly

Abstract We consider a multiple hypotheses problem with directional alternatives in a decision theoretic framework. We obtain an empirical Bayes rule subject to a constraint on mixed directional false discovery rate (mdFDR≤α) under the semiparametric setting where the distribution of the test statistic is parametric, but the prior distribution is nonparametric. We propose separate priors for the left tail and right tail alternatives, as may be required for many applications. The proposed Bayes rule is compared through simulation against rules proposed by Benjamini and Yekutieli and Efron. We illustrate the proposed methodology for two sets of data from biological experiments: HIV-transfected cell-line mRNA expression data, and a quantitative trait genome-wide SNP data set. We have developed a user-friendly web-based Shiny app for the proposed method which is available through URL https://npseb.shinyapps.io/npseb/. The HIV and SNP data can be directly accessed, and the results presented in this paper can be reproduced.

A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments

Bioinformatics and Biology Insights ◽

10.4137/bbi.s473 ◽

2008 ◽

Vol 2 ◽

pp. BBI.S473 ◽

Cited By ~ 4

Author(s):

Akihiro Hirakawa ◽

Yasunori Sato ◽

Chikuma Hamada ◽

Isao Yoshimura

Keyword(s):

Differentially Expressed Genes ◽

Differentially Expressed ◽

Simulation Studies ◽

Test Statistic ◽

Microarray Experiments ◽

Cancer Data ◽

False Discovery ◽

The Mean ◽

Sample Variances ◽

Better Than

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.

Resampling-Based Empirical Bayes Multiple Testing Procedures for Controlling Generalized Tail Probability and Expected Value Error Rates: Focus on the False Discovery Rate and Simulation Study

Biometrical Journal ◽

10.1002/bimj.200710473 ◽

2008 ◽

Vol 50 (5) ◽

pp. 716-744 ◽

Cited By ~ 12

Author(s):

Sandrine Dudoit ◽

Houston N. Gilbert ◽

Mark J. van der Laan

Keyword(s):

False Discovery Rate ◽

Simulation Study ◽

Multiple Testing ◽

Empirical Bayes ◽

Tail Probability ◽

Error Rates ◽

Expected Value ◽

Testing Procedures ◽

False Discovery ◽

Multiple Testing Procedures

Mixture model-based association analysis with case-control data in genome wide association studies

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0022 ◽

2017 ◽

Vol 16 (3) ◽

Author(s):

Fadhaa Ali ◽

Jian Zhang

Keyword(s):

Mixture Model ◽

Multiple Testing ◽

Hypothesis Test ◽

Association Studies ◽

Real Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Model Based ◽

Genome Wide ◽

The Individual

Abstract Multilocus haplotype analysis of candidate variants with genome wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease-penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the above model is provided. Furthermore, we introduce a hypothesis test for haplotype inheritance patterns which underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.

FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans

10.1101/019299 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tim B Bigdeli ◽

Donghyung Lee ◽

Brien P Riley ◽

Vladimir I Vladimirov ◽

Ayman H Fanous ◽

...

Keyword(s):

Multiple Testing ◽

Empirical Bayes ◽

Association Studies ◽

Genome Wide Association Studies ◽

Genome Scans ◽

P Values ◽

Psychiatric Genetic ◽

Genome Wide ◽

A Genome ◽

Z Scores

Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, often variants meeting genome-wide significance criteria explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally-distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
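
The two steps named in this abstract (adjust the p-values for multiple testing, then map the adjusted p-values back to upper-tail Z-scores carrying the original signs) can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than the authors' code: the abstract says only that "a False Discovery Rate (FDR) procedure" is used, so the Benjamini-Hochberg adjustment here is an assumed choice, two-sided p-values are assumed, and the names `bh_adjust`, `fiqt` and the `min_p` floor are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt(z_scores, min_p=1e-300):
    """Shrink Z-scores via FDR adjustment and an inverse normal transform.

    min_p is a hypothetical floor that keeps p-values strictly positive
    so the inverse CDF stays defined for very large |Z|.
    """
    # Step i: two-sided p-values, then FDR adjustment (BH assumed here).
    pvals = [max(2.0 * _STD_NORMAL.cdf(-abs(z)), min_p) for z in z_scores]
    adjusted = bh_adjust(pvals)
    # Step ii: upper-tail quantile of each adjusted p-value, signed like
    # the original statistic.
    shrunk = []
    for z, q in zip(z_scores, adjusted):
        magnitude = -_STD_NORMAL.inv_cdf(q / 2.0)
        shrunk.append(math.copysign(magnitude, z))
    return shrunk


if __name__ == "__main__":
    print(fiqt([5.0, -3.0, 0.5, -0.2]))  # hypothetical Z-scores
```

Because an adjusted p-value is never smaller than the raw one, each transformed Z-score has magnitude no larger than the original, which is the shrinkage that counteracts the winner's curse in this sketch.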

Genome-wide scale analyses identify novel BMI genotype-environment interactions using a conditional false discovery rate

10.1101/2020.01.22.908038 ◽

2020 ◽

Author(s):

R. Moore ◽

L. Georgatou-Politou ◽

J. Liley ◽

O. Stegle ◽

I. Barroso

Keyword(s):

False Discovery Rate ◽

Multiple Testing ◽

Environment Interaction ◽

Test Results ◽

Genotype Environment Interaction ◽

False Discovery ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Candidate Loci

Abstract Genotype-environment interaction (G×E) studies typically focus on variants with previously known marginal associations. While such two-step filtering greatly reduces the multiple testing burden, it can miss loci with pronounced G×E effects, which tend to have weaker marginal associations. To test for G×E effects on a genome-wide scale whilst leveraging information from marginal associations in a flexible manner, we combine the conditional false discovery rate with interaction test results obtained from StructLMM. After validating our approach, we applied this strategy to UK Biobank (UKBB) data to probe for G×E effects on BMI. Using 126,077 UKBB individuals for discovery, we identified known (FTO, MC4R, SEC16B) and novel G×E signals, many of which replicated (FAM150B/ALKAL2, TMEM18, EFR3B, ZNF596-FAM87A, LIN7C-BDNF, FAIM2, UNC79, LAT) in an independent subset of UKBB (n=126,076). Finally, when analysing the full UKBB cohort, we identified 140 candidate loci with G×E effects, highlighting the advantages of our approach.
