The paltry power of priors versus populations

2019 ◽  
Author(s):  
Jianan Zhan ◽  
Dan E. Arking ◽  
Joel S. Bader

Abstract: Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods explicitly or implicitly use prior probabilities that bias significance toward favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods.

Author summary: Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000 tests; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. This approach implies a single-test p-value threshold of 2.5 × 10⁻⁶ for RNA-Seq experiments, and of 5 × 10⁻⁸ for GWAS, to control the false-positive rate at a conventional value of 0.05. Many methods have been proposed to alleviate the multiple-testing burden by incorporating a prior probability that boosts the significance for a subset of candidate genes or variants. At the extreme limit, only the candidate set is tested, corresponding to a decreased multiple testing burden. Despite decades of methods development, prior-based tests have not been generally used. Here we compare the power increase possible with a prior against the increase possible with the much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
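The Bonferroni thresholds quoted in the author summary follow from dividing the target false-positive rate by the number of tests. The short Python sketch below reproduces that arithmetic; the function name `bonferroni_threshold` is a hypothetical helper introduced here for illustration, and the test counts are the approximate figures given in the summary.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Approximate test counts taken from the author summary above.
rna_seq = bonferroni_threshold(0.05, 20_000)     # gene-based RNA-Seq tests
gwas = bonferroni_threshold(0.05, 1_000_000)     # effective GWAS tests

print(f"RNA-Seq threshold: {rna_seq:.1e}")
print(f"GWAS threshold:    {gwas:.1e}")
```

Restricting testing to a candidate set, the extreme case described in the summary, corresponds to calling the same helper with a smaller `n_tests`.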

Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract: With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) associated with various phenotypes has accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier "discovery" GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p < 5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, the phenotype type (binary or quantitative), and the discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.


2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.


2019 ◽  
Vol 116 (4) ◽  
pp. 1195-1200 ◽  
Author(s):  
Daniel J. Wilson

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter controls only the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
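As a rough illustration of the statistic named in this abstract, the Python sketch below computes a weighted harmonic mean of a group of p-values. It covers only the raw statistic: the calibration against a significance threshold that the abstract attributes to the generalized central limit theorem is not reproduced here. The function name `harmonic_mean_p` and the example p-values are assumptions made for illustration, not taken from the paper.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values: sum(w) / sum(w / p).

    Returns only the raw statistic; it is not a calibrated p-value.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / n] * n          # equal weights by default
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Made-up p-values: two modest signals and two null-like tests.
hmp = harmonic_mean_p([0.01, 0.02, 0.5, 0.5])
print(f"harmonic mean p-value statistic: {hmp:.4f}")
```

The harmonic mean is dominated by the smallest p-values in the group, which is why a few moderately small values can pull the combined statistic down even when most tests look null.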


Cosmetics ◽  
2020 ◽  
Vol 7 (2) ◽  
pp. 49
Author(s):  
Miranda A. Farage ◽  
Yunxuan Jiang ◽  
Jay P. Tiesman ◽  
Pierre Fontanillas ◽  
Rosemarie Osborne

Individuals suffering from sensitive skin often have other skin conditions and/or diseases, such as fair skin, freckles, rosacea, or atopic dermatitis. Genome-wide association studies (GWAS) have been performed for some of these conditions, but not for sensitive skin. In this study, a total of 23,426 unrelated participants of European ancestry from the 23andMe database were evaluated for self-declared sensitive skin, other skin conditions, and diseases using an online questionnaire. Responders were separated into two groups: those who declared they had sensitive skin (n = 8971) and those who declared their skin was not sensitive (controls, n = 14,455). A GWAS of sensitive skin identified three genome-wide significant loci (p-value < 5 × 10⁻⁸) and seven suggestive loci (p-value < 1 × 10⁻⁶). All three of the most significant loci have been associated with pigmentation, and two have been associated with acne.


Author(s):  
Fadhaa Ali ◽  
Jian Zhang

Abstract: Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet this challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the model is provided. Furthermore, we introduce a hypothesis test for the haplotype inheritance patterns that underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.


2021 ◽  
Author(s):  
Weihua Meng ◽  
Parminder Reel ◽  
Charvi Nangia ◽  
Aravind Rajendrakumar ◽  
Harry Hebert ◽  
...  

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource, using metaUSAT for genetically correlated phenotypes (N = 397,385). We identified 38 loci for headaches, of which 34 have been reported before and 4 were newly identified. The LRP1-STAT6-SDR9C7 region on chromosome 12 was the most significantly associated locus, with a lead P value of 1.24 × 10⁻⁶² for rs11172113. The ONECUT2 gene locus on chromosome 18 was the strongest signal among the 4 new loci, with a P value of 1.29 × 10⁻⁹ for rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together, in theory and in practice, to boost study power and identify more new variants for headaches. This study paves the way for a large GWAS meta-analysis involving cohorts with different, though genetically correlated, headache phenotypes.


2015 ◽  
Author(s):  
Inti Inal Pedroso ◽  
Michael R Barnes ◽  
Anbarasu Lourdusamy ◽  
Ammar Al-Chalabi ◽  
Gerome Breen

Genome-wide association studies (GWAS) have proven a valuable tool to explore the genetic basis of many traits. However, many GWAS lack statistical power, and the commonly used single-point analysis method needs to be complemented to enhance power and interpretation. Multivariate region-wide or gene-wide association tests are an alternative, allowing identification of disease genes in a manner more robust to allelic heterogeneity. Gene-based association also facilitates systems biology analyses by generating a single p-value per gene. We have designed FORGE, a software suite that implements a range of methods for combining the p-values of the individual genetic variants within a gene or genomic region. The software can be used with summary statistics (marker ids and p-values) and accepts as input the result file formats of commonly used genetic association software. When applied to a study of Crohn's disease susceptibility, it identified all genes found by single-SNP analysis as well as additional genes identified by a large independent meta-analysis. FORGE p-values on gene-set analyses highlighted association with the Jak-STAT and cytokine signalling pathways, both previously associated with Crohn's disease. We highlight the software's main features and its future development directions, and provide a comparison with alternative available software tools. FORGE can be freely accessed at https://github.com/inti/FORGE.
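To make the idea of "a single p-value per gene" concrete, here is a minimal Python sketch of one classical way to combine variant-level p-values, Fisher's method. This is an illustrative assumption: the abstract does not say which combination methods FORGE implements, and plain Fisher's method assumes independent tests, which linkage disequilibrium between nearby SNPs violates. The function name `fisher_combined_p` and the example values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) is chi-square distributed with
    2k degrees of freedom. For even degrees of freedom the survival
    function has a closed form, so only the standard library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)   # X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two made-up SNP p-values within one hypothetical gene.
gene_p = fisher_combined_p([0.05, 0.05])
print(f"combined gene-level p-value: {gene_p:.4f}")
```

With correlated SNPs this calculation would be anti-conservative, so a practical tool needs some correction for dependence between the variant-level tests.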


2020 ◽  
Author(s):  
Μαρία Χρήστου

The aim of this doctoral thesis was to clarify the etiology of osteoporosis, the most common bone disorder, by studying the genetic pleiotropy of the disease.

To assess the pleiotropic genetic polymorphisms associated with reduced bone mineral density (BMD) and various phenotypes, a literature review was carried out. It described recent data on genetic pleiotropy between BMD and osteoporosis and 5 phenotypes within the musculoskeletal system (fracture, osteoarthritis, rheumatoid arthritis, lean body mass, lumbar intervertebral disc degeneration), as well as the following 15 phenotypes outside the musculoskeletal system: body mass index, waist circumference, waist-to-hip ratio, fat mass, % fat, coronary artery disease, type 2 diabetes mellitus, plasma lipids, height, age at menarche, age at puberty more generally, alcohol consumption, androgenetic alopecia, breast cancer, and age-related macular degeneration.

To investigate further these indications of pleiotropy between reduced BMD and musculoskeletal and non-musculoskeletal phenotypes, and to better understand the genetic architecture of BMD, pleiotropic associations with seemingly unrelated, non-bone phenotypes were sought. Specifically, in the discovery phase, a genome-wide pleiotropy scan was performed: polymorphisms in the NHGRI-EBI Catalog that were associated with various non-bone phenotypes in previous genome-wide association studies (GWAS) were tested for association with BMD (femoral neck, lumbar spine) in more than 80,000 individuals from the large international GEFOS consortium. Subsequently, in the validation phase, the 72 strongest polymorphisms from the discovery phase were tested in more than 400,000 individuals from the large UK Biobank study.

In this way, 12 pleiotropic polymorphisms were identified that are associated with BMD (heel) and the following 14 non-bone phenotypes at p-value < 5 × 10⁻⁸: height, waist circumference, Parkinson's disease, non-cardia gastric cancer, androgenetic alopecia, allergic diseases (asthma, hay fever, eczema), atopic dermatitis, atopy more generally, serum magnesium, urinary electrolytes, serum proteins, reticulocytes, coffee consumption, and educational attainment. The 12 pleiotropic polymorphisms were located in 11 genetic loci on 8 chromosomes. Nine polymorphisms were included in the NHGRI-EBI Catalog, while 3 were neighboring polymorphisms.

In conclusion, the thesis demonstrated the presence of various phenotypic associations for osteoporosis through the study of genetic pleiotropy. Notable benefits of investigating the genetic pleiotropy of osteoporosis include the clinical implications of integrating molecular discoveries (responsible genes and pathways) into the etiology of the disease. The goal is for future efforts to focus on developing new drugs with pleiotropic actions, repurposing existing drugs, and assessing the risk of developing the disease in high-risk individuals.


2021 ◽  
Author(s):  
Giulia Muzio ◽  
Leslie O'Bray ◽  
Laetitia Meng-Papaxanthos ◽  
Juliane Klatt ◽  
Karsten Borgwardt

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these explain only a tiny fraction of observed phenotypic variation. One possible strategy to detect stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association with a phenotype. The latter, network-based genome-wide association studies, in particular suffer from a huge search space and an inherent multiple testing problem. As a consequence, current approaches are based on greedy feature selection, thereby risking that they miss relevant associations, and/or neglect multiple testing correction, which can lead to an abundance of false positive findings. To address these shortcomings of current approaches to network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data as well as on semi-simulated and real data with common variants from A. thaliana, and enables the systematic combination of gene-based genome-wide association studies with biological network information.

