The paltry power of priors versus populations

Mapping Intimacies ◽

10.1101/737676 ◽

2019 ◽

Author(s):

Jianan Zhan ◽

Dan E. Arking ◽

Joel S. Bader

Keyword(s):

Population Size ◽

Multiple Testing ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Significant Finding ◽

RNA-Seq ◽

Test Power ◽

Genome Wide

Abstract: Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods either explicitly or implicitly make use of prior probabilities that bias significance for favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods.

Author summary: Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000 tests; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. This approach implies a single-test p-value threshold of 2.5 × 10⁻⁶ for RNA-Seq experiments, and of 5 × 10⁻⁸ for GWAS, to control the false-positive rate at a conventional value of 0.05. Many methods have been proposed to alleviate the multiple-testing burden by incorporating a prior probability that boosts the significance for a subset of candidate genes or variants. At the extreme limit, only the candidate set is tested, corresponding to a decreased multiple testing burden. Despite decades of methods development, prior-based tests have not been generally used. Here we compare the power increase possible with a prior with the increase possible with the much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
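The Bonferroni thresholds quoted in the author summary, and the qualitative comparison between a prior and a larger study, can be illustrated with a small sketch. This is not the authors' derivation. It assumes a two-sided z-test whose non-centrality grows as effect × √N, and it models a prior crudely as a tenfold relaxation of the significance threshold. The effect size (0.05) and the sample size (10,000) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def power_two_sided_z(ncp, p_threshold):
    """Power of a two-sided z-test with non-centrality ncp at a given p-value threshold."""
    z = NormalDist()
    z_crit = -z.inv_cdf(p_threshold / 2)  # critical value for the two-sided test
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)


# Thresholds from the author summary.
rna_seq = bonferroni_threshold(0.05, 20_000)     # about 20,000 gene-based tests
gwas = bonferroni_threshold(0.05, 1_000_000)     # about 1 million effective tests

# Hypothetical scenario: standardized effect 0.05, baseline N = 10,000.
effect, n = 0.05, 10_000
baseline = power_two_sided_z(effect * n ** 0.5, gwas)
with_prior = power_two_sided_z(effect * n ** 0.5, 10 * gwas)     # assumed tenfold prior boost
doubled_n = power_two_sided_z(effect * (2 * n) ** 0.5, gwas)     # twice the sample size

print(f"RNA-Seq threshold: {rna_seq:.2e}, GWAS threshold: {gwas:.2e}")
print(f"power: baseline={baseline:.3f}, prior={with_prior:.3f}, doubled N={doubled_n:.3f}")
```

Under these assumptions, doubling the sample size raises power more than relaxing the threshold tenfold, which is consistent in spirit with the abstract's claim; the exact relationship derived in the paper is not reproduced here.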

Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack W. O’Sullivan ◽

John P. A. Ioannidis

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Single Nucleotide ◽

Genome Wide ◽

The UK ◽

Open Question

Abstract: With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p < 5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.

An approach to gene-based testing accounting for dependence of tests among nearby genes

10.1101/2021.05.24.445494 ◽

2021 ◽

Author(s):

Ronald J Yurko ◽

Kathryn Roeder ◽

Bernie Devlin ◽

Max G'Sell

Keyword(s):

Multiple Testing ◽

Association Studies ◽

Autism Spectrum ◽

P Value ◽

Genome Wide Association Studies ◽

Strongly Correlated ◽

Test Statistics ◽

Test Statistic ◽

Genome Wide ◽

Insight Into

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of a weakly powered GWAS for autism spectrum disorder, which we contrast with more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably, our workflow is based on summary statistics.
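The agglomeration idea in this abstract (assess a gene alone when its statistic is independent, merge neighbours when their statistics are strongly correlated) can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes genes are already ordered along the chromosome, that a correlation between each pair of adjacent gene-level statistics is available, and that a single cutoff decides merging. The function name, the input layout, and the cutoff of 0.75 are all hypothetical.

```python
def agglomerate_genes(genes, adjacent_corr, threshold=0.75):
    """Group consecutive genes into loci when adjacent test statistics are strongly correlated.

    genes         -- gene identifiers ordered by genomic position
    adjacent_corr -- adjacent_corr[i] is the correlation between genes[i] and genes[i + 1]
    threshold     -- hypothetical cutoff on |correlation| above which neighbours are merged
    """
    if not genes:
        return []
    loci = [[genes[0]]]
    for gene, corr in zip(genes[1:], adjacent_corr):
        if abs(corr) >= threshold:
            loci[-1].append(gene)   # strongly correlated: extend the current locus
        else:
            loci.append([gene])     # weakly correlated: start a new locus
    return loci


# Hypothetical example: genes A and B are merged, C stands alone, D and E are merged.
example = agglomerate_genes(["A", "B", "C", "D", "E"], [0.9, 0.1, 0.2, 0.8])
print(example)
```

Each resulting locus would then be tested as a unit using the SNPs of all its member genes; that testing step is not shown.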

The harmonic mean p-value for combining dependent tests

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1814092116 ◽

2019 ◽

Vol 116 (4) ◽

pp. 1195-1200 ◽

Cited By ~ 43

Author(s):

Daniel J. Wilson

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Scientific Discovery ◽

Association Studies ◽

Harmonic Mean ◽

P Value ◽

Genome Wide Association Studies ◽

Familywise Error Rate ◽

Significance Threshold ◽

Genome Wide

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
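As a rough illustration of the quantity named in this abstract, the weighted harmonic mean of a set of p-values is the sum of the weights divided by the sum of weight over p-value. The sketch below computes only that raw statistic. It does not reproduce the calibration the paper uses to turn the statistic into an exact test, so the returned value should be read as the summary statistic rather than a calibrated p-value. The function name and the equal-weight default are assumptions made for this example.

```python
def harmonic_mean_p(pvalues, weights=None):
    """Raw weighted harmonic mean of p-values: sum(w) / sum(w / p)."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(pvalues)   # equal weights by default (an assumption)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))


# The harmonic mean is dominated by the smallest p-values in the group.
print(harmonic_mean_p([0.1, 0.2]))
print(harmonic_mean_p([1e-6, 0.5, 0.9]))
```

Because small p-values dominate the harmonic mean, a group containing one strong signal keeps a small combined value even when most members are unremarkable.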

Genome-Wide Association Study Identifies Loci Associated with Sensitive Skin

Cosmetics ◽

10.3390/cosmetics7020049 ◽

2020 ◽

Vol 7 (2) ◽

pp. 49

Author(s):

Miranda A. Farage ◽

Yunxuan Jiang ◽

Jay P. Tiesman ◽

Pierre Fontanillas ◽

Rosemarie Osborne

Keyword(s):

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

European Ancestry ◽

P Value ◽

Genome Wide Association Studies ◽

Online Questionnaire ◽

Sensitive Skin ◽

Genome Wide ◽

Skin Conditions

Individuals suffering from sensitive skin often have other skin conditions and/or diseases, such as fair skin, freckles, rosacea, or atopic dermatitis. Genome-wide association studies (GWAS) have been performed for some of these conditions, but not for sensitive skin. In this study, a total of 23,426 unrelated participants of European ancestry from the 23andMe database were evaluated for self-declared sensitive skin, other skin conditions, and diseases using an online questionnaire format. Responders were separated into two groups: those who declared they had sensitive skin (n = 8971) and those who declared their skin was not sensitive (controls, n = 14,455). A GWAS of sensitive skin individuals identified three genome-wide significant loci (p-value < 5 × 10⁻⁸) and seven suggestive loci (p-value < 1 × 10⁻⁶). Of the three most significant loci, all have been associated with pigmentation and two have been associated with acne.

Multiple testing in genome-wide association studies via hidden Markov models

Bioinformatics ◽

10.1093/bioinformatics/btp476 ◽

2009 ◽

Vol 25 (21) ◽

pp. 2802-2808 ◽

Cited By ~ 31

Author(s):

Zhi Wei ◽

Wenguang Sun ◽

Kai Wang ◽

Hakon Hakonarson

Keyword(s):

Hidden Markov Models ◽

Multiple Testing ◽

Markov Models ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Mixture model-based association analysis with case-control data in genome wide association studies

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0022 ◽

2017 ◽

Vol 16 (3) ◽

Author(s):

Fadhaa Ali ◽

Jian Zhang

Keyword(s):

Mixture Model ◽

Multiple Testing ◽

Hypothesis Test ◽

Association Studies ◽

Real Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Model Based ◽

Genome Wide ◽

The Individual

Abstract: Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the above model is provided. Furthermore, we introduce a hypothesis test for haplotype inheritance patterns which underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.

A meta-analysis of the genome-wide association studies on two genetically correlated phenotypes (self-reported headache and self-reported migraine) identifies four new risk loci for headaches (N=397,385)

10.1101/2021.09.15.21263668 ◽

2021 ◽

Author(s):

Weihua Meng ◽

Parminder Reel ◽

Charvi Nangia ◽

Aravind Rajendrakumar ◽

Harry Hebert ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

The Self ◽

Genome Wide Association ◽

P Value ◽

Clinical Settings ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Genetic Mechanisms ◽

The UK

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource using metaUSAT for genetically correlated phenotypes (N=397,385). We identified 38 loci for headaches, of which 34 loci have been reported before and 4 loci were newly identified. The LRP1-STAT6-SDR9C7 region on chromosome 12 was the most significantly associated locus, with a leading P value of 1.24 × 10⁻⁶² for rs11172113. The ONECUT2 gene locus on chromosome 18 was the strongest signal among the 4 new loci, with a P value of 1.29 × 10⁻⁹ for rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together in theory and in practice to boost study power to identify more new variants for headaches. This study has paved the way for a large GWAS meta-analysis involving cohorts of different, though genetically correlated, headache phenotypes.

FORGE: multivariate calculation of gene-wide p-values from Genome-Wide Association Studies

10.1101/023648 ◽

2015 ◽

Cited By ~ 2

Author(s):

Inti Inal Pedroso ◽

Michael R Barnes ◽

Anbarasu Lourdusamy ◽

Ammar Al-Chalabi ◽

Gerome Breen

Keyword(s):

Statistical Power ◽

Association Studies ◽

Single Point ◽

Genome Wide Association ◽

P Value ◽

Disease Genes ◽

SNP Analysis ◽

Genome Wide Association Studies ◽

P Values ◽

Genome Wide

Genome-wide association studies (GWAS) have proven a valuable tool to explore the genetic basis of many traits. However, many GWAS lack statistical power, and the commonly used single-point analysis method needs to be complemented to enhance power and interpretation. Multivariate region-wide or gene-wide association tests are an alternative, allowing for identification of disease genes in a manner more robust to allelic heterogeneity. Gene-based association also facilitates systems biology analyses by generating a single p-value per gene. We have designed and implemented FORGE, a software suite which implements a range of methods for the combination of p-values for the individual genetic variants within a gene or genomic region. The software can be used with summary statistics (marker ids and p-values) and accepts as input the result file formats of commonly used genetic association software. When applied to a study of Crohn's disease (CD) susceptibility, it identified all genes found by single SNP analysis and additional genes identified by a large independent meta-analysis. FORGE p-values on gene-set analyses highlighted association with the Jak-STAT and cytokine signalling pathways, both previously associated with CD. We highlight the software's main features and its future development directions, and provide a comparison with alternative available software tools. FORGE can be freely accessed at https://github.com/inti/FORGE.
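To make "combination of p-values within a gene" concrete, here is a minimal sketch of one classical combination rule, Fisher's method. It is offered as a generic illustration only: the abstract does not say which methods FORGE implements, and this code does not use or imitate the FORGE software. Fisher's method also assumes independent tests, which SNPs in linkage disequilibrium violate, so a real gene-based test would need a correction for correlation. The function name is hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j        # (X/2)^j / j!, built up incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical SNP-level p-values for a single gene.
gene_p = fisher_combined_p([0.05, 0.05])
print(gene_p)
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the closed form.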

Assessment of genetic risk factors for bone disorders

10.12681/eadd/49487 ◽

2020 ◽

Author(s):

Μαρία Χρήστου

Keyword(s):

Bone Mineral Density ◽

Bone Mineral ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Mineral Density ◽

Genome Wide

The aim of this doctoral thesis was to clarify the etiology of osteoporosis, the most common bone disorder, by studying the genetic pleiotropy of the disease. To assess the pleiotropic genetic polymorphisms associated with reduced bone mineral density (BMD) and various phenotypes, a literature review was performed. It described recent evidence of genetic pleiotropy between BMD and osteoporosis and 5 phenotypes within the musculoskeletal system (fracture, osteoarthritis, rheumatoid arthritis, lean body mass, lumbar intervertebral disc degeneration), as well as the following 15 phenotypes outside the musculoskeletal system: body mass index, waist circumference, waist-to-hip ratio, fat mass, body fat percentage, coronary artery disease, type 2 diabetes mellitus, plasma lipids, height, age at menarche, age at puberty more generally, alcohol consumption, androgenetic alopecia, breast cancer, and age-related macular degeneration. To investigate this evidence of pleiotropy between musculoskeletal and non-musculoskeletal phenotypes and reduced BMD further, and to better understand the genetic architecture of BMD, pleiotropic associations with seemingly unrelated, non-bone phenotypes were sought. Specifically, in the discovery phase, a genome-wide pleiotropy scan was performed: polymorphisms from the NHGRI-EBI Catalog that had been associated with various non-bone phenotypes in previous genome-wide association studies (GWAS) were tested for association with BMD (femoral neck, lumbar spine) in more than 80,000 individuals from the large international GEFOS consortium. Then, in the validation phase, the 72 strongest polymorphisms from the discovery phase were tested in more than 400,000 individuals from the large UK Biobank study. In this way, 12 pleiotropic polymorphisms were identified that are associated with (heel) BMD and the following 14 non-bone phenotypes at p-value < 5 × 10⁻⁸: height, waist circumference, Parkinson's disease, non-cardia gastric cancer, androgenetic alopecia, allergic diseases (asthma, hay fever, eczema), atopic dermatitis, atopy in general, serum magnesium, urinary electrolytes, serum proteins, reticulocytes, coffee consumption, and educational attainment. The 12 pleiotropic polymorphisms were located at 11 genetic loci on 8 chromosomes. Nine polymorphisms were included in the NHGRI-EBI Catalog, while 3 were neighboring polymorphisms. In conclusion, the thesis revealed various phenotypic associations for osteoporosis through the study of genetic pleiotropy. Notable benefits of investigating the genetic pleiotropy of osteoporosis are the clinical implications of integrating molecular discoveries (responsible genes and pathways) into the etiology of the disease. The goal is for future efforts to focus on developing new drugs with pleiotropic actions, repurposing existing drugs, and assessing the risk of developing the disease in high-risk individuals.

networkGWAS: A network-based approach for genome-wide association studies in structured populations

10.1101/2021.11.11.468206 ◽

2021 ◽

Author(s):

Giulia Muzio ◽

Leslie O'Bray ◽

Laetitia Meng-Papaxanthos ◽

Juliane Klatt ◽

Karsten Borgwardt

Keyword(s):

Genetic Markers ◽

Complex Traits ◽

Multiple Testing ◽

Association Studies ◽

Search Space ◽

Structured Populations ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Testing Correction ◽

Genome Wide

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these only explain a tiny fraction of observed phenotypic variation. One possible strategy to detect stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association with a phenotype. The latter approach, network-based genome-wide association studies, in particular suffers from a huge search space and an inherent multiple testing problem. As a consequence, current approaches are either based on greedy feature selection, thereby risking that they miss relevant associations, and/or neglect multiple testing correction, which can lead to an abundance of false positive findings. To address the shortcomings of current approaches to network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data as well as semi-simulated and real data with common variants from A. thaliana and enables the systematic combination of gene-based genome-wide association studies with biological network information.
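The phrase "block permutation scheme" can be illustrated with a generic sketch. It is not the networkGWAS implementation, and the abstract does not specify what is permuted or how blocks are chosen. The sketch assumes that an ordered list of items is cut into contiguous blocks of a fixed size, that the blocks are shuffled while the order inside each block is preserved (so local correlation survives), and that an empirical p-value is computed with the usual add-one correction. All names and the block size are hypothetical.

```python
import random


def block_permute(items, block_size, rng):
    """Shuffle contiguous blocks of items, keeping the order within each block."""
    blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
    rng.shuffle(blocks)
    return [item for block in blocks for item in block]


def empirical_p_value(observed, null_stats):
    """Fraction of null statistics at least as large as observed, with add-one correction."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (1 + exceed) / (1 + len(null_stats))


# Hypothetical usage: permute ten ordered items in blocks of five.
rng = random.Random(0)                 # fixed seed so the sketch is repeatable
permuted = block_permute(list(range(10)), 5, rng)
print(permuted)
print(empirical_p_value(3.0, [1.0, 2.0, 4.0, 5.0]))
```

The add-one correction keeps the empirical p-value strictly positive, which matters when it is later compared against a multiple-testing threshold.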
