Statistical Correction of the Winner’s Curse Explains Replication Variability in Quantitative Trait Genome-Wide Association Studies

2017 ◽  
Author(s):  
Cameron Palmer ◽  
Itsik Pe’er

Abstract: Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, seemingly at odds with their strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^−14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner’s Curse (p < 10^−16). We show this is due in part to misreporting replication cohort size as a maximum number, rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, the replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly powered decile of loci to replicate worse than expected, due to differences in linkage disequilibrium.

Author Summary: The majority of associations between common genetic variation and human traits come from genome-wide association studies, which have analyzed millions of single-nucleotide polymorphisms in millions of samples. These kinds of studies pose serious statistical challenges to discovering new associations. Finite resources restrict the number of candidate associations that can be brought forward into validation samples, introducing the need for a significance threshold. This threshold creates a phenomenon called the Winner’s Curse, in which candidate associations close to the discovery threshold are more likely to be biased overestimates of the variant’s true association in the sampled population. We survey all human quantitative trait association studies that validated at least one signal. We find that the majority of these studies do not publish sufficient information to support their claims of replication. For studies that did, we computationally correct the Winner’s Curse and evaluate replication performance. While all variants combined replicate significantly less often than expected, we find that the subset of studies that (1) perform both discovery and replication in samples of the same ancestry and (2) report accurate per-variant sample sizes replicate as expected. This study provides strong, rigorous evidence for the broad reliability of genome-wide association studies. We furthermore provide a model for more efficient selection of variants as candidates for replication, as selecting variants using cursed discovery data enriches for variants with little real evidence for trait association.
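
The selection effect described in this entry can be illustrated with a small simulation. The sketch below is not the authors' correction method; it only shows how a significance threshold inflates the effect estimates that pass it. The true effect, standard error, and number of simulations are hypothetical values chosen for illustration, and the z threshold of 5.45 is assumed to correspond roughly to the conventional two-sided p < 5 × 10^−8.

```python
import random
import statistics


def simulate_winners_curse(true_beta=0.05, se=0.02, z_threshold=5.45,
                           n_sims=200_000, seed=0):
    """Return the discovery-stage effect estimates that pass the threshold.

    All parameter values are illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n_sims):
        # Each discovery estimate is the true effect plus sampling noise.
        beta_hat = rng.gauss(true_beta, se)
        # Keep only estimates whose |z| statistic clears the threshold.
        if abs(beta_hat / se) >= z_threshold:
            selected.append(beta_hat)
    return selected


if __name__ == "__main__":
    selected = simulate_winners_curse()
    print("estimates passing threshold:", len(selected))
    print("true effect:", 0.05)
    print("mean selected estimate:", round(statistics.mean(selected), 4))
```

Because only estimates that happened to land far above the true effect can pass the threshold, the mean of the selected estimates exceeds the true effect. A replication cohort powered on the uncorrected estimate would therefore be expected to succeed less often than predicted.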

2019 ◽  
Author(s):  
Jennifer Zou ◽  
Jinjing Zhou ◽  
Sarah Faller ◽  
Robert Brown ◽  
Eleazar Eskin

Abstract: Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex human traits, but only a fraction of variants identified in discovery studies achieve significance in replication studies. Replication in GWAS has been well studied in the context of the winner’s curse, which is the inflation of effect size estimates for significant variants in a study. Multiple methods have been proposed to correct for the effects of the winner’s curse. However, the winner’s curse is often not sufficient to explain lack of replication. Another reason why studies fail to replicate is that there are fundamental differences between the discovery and replication studies. A confounding factor can create the appearance of a significant finding while actually being an artifact that will not replicate in future studies. We propose a statistical framework that utilizes GWAS replication studies to model the winner’s curse and study-specific heterogeneity due to confounders, and to correct for these effects. We show through simulations and application to 100 human GWAS data sets that modeling both the winner’s curse and study-specific heterogeneity explains observed patterns of replication in GWAS better than modeling the winner’s curse alone.


2016 ◽  
Vol 29 (5) ◽  
pp. 417-430 ◽  
Author(s):  
Firas Talas ◽  
Rasha Kalih ◽  
Thomas Miedaner ◽  
Bruce A. McDonald

Genome-wide association studies can identify novel genomic regions and genes that affect quantitative traits. Fusarium head blight is a destructive disease caused by Fusarium graminearum that exhibits several quantitative traits, including aggressiveness, mycotoxin production, and fungicide resistance. Restriction site–associated DNA sequencing was performed for 220 isolates of F. graminearum. A total of 119 isolates were phenotyped for aggressiveness and deoxynivalenol (DON) production under natural field conditions across four environments. The effective concentration of propiconazole that inhibits isolate growth in vitro by 50% was calculated for 220 strains. Approximately 29,000 single nucleotide polymorphism markers were tested for association with each trait, resulting in 50, 29, and 74 quantitative trait nucleotides (QTNs) that were significantly associated with aggressiveness, DON production, and propiconazole sensitivity, respectively. Approximately 41% of these QTNs caused nonsynonymous substitutions in predicted exons, while the remainder were synonymous substitutions or located in intergenic regions. Three QTNs associated with propiconazole sensitivity were significant after Bonferroni correction. These QTNs were located in genes not previously associated with azole sensitivity. The majority of the detected QTNs were located in genes with predicted regulatory functions, suggesting that nucleotide variation in regulatory genes plays a major role in the corresponding quantitative trait variation.
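
For a sense of scale, the Bonferroni correction mentioned in this entry divides the family-wise significance level by the number of tests. The sketch below assumes a family-wise level of 0.05 and treats the approximately 29,000 markers as the number of tests. Neither the level nor the exact test count is stated in the abstract, so the resulting threshold is illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed values: alpha = 0.05 (not stated in the abstract), about 29,000 markers.
threshold = bonferroni_threshold(0.05, 29_000)
print(f"{threshold:.2e}")  # prints 1.72e-06
```

Under these assumptions a marker would need a p-value below about 1.7 × 10^−6 to remain significant, which is consistent with only a few QTNs surviving the correction.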


2018 ◽  
Author(s):  
Zhou Shaoqun ◽  
Karl A. Kremling ◽  
Bandillo Nonoy ◽  
Richter Annett ◽  
Ying K. Zhang ◽  
...  

One Sentence Summary: HPLC-MS metabolite profiling of maize seedlings, in combination with genome-wide association studies, identifies numerous quantitative trait loci that influence the accumulation of foliar metabolites.

Abstract: Cultivated maize (Zea mays) retains much of the genetic and metabolic diversity of its wild ancestors. Non-targeted HPLC-MS metabolomics using a diverse panel of 264 maize inbred lines identified a bimodal distribution in the prevalence of foliar metabolites. Although 15% of the detected mass features were present in >90% of the inbred lines, the majority were found in <50% of the samples. Whereas leaf bases and tips were differentiated primarily by flavonoid abundance, maize varieties (stiff-stalk, non-stiff-stalk, tropical, sweet corn, and popcorn) were differentiated predominantly by benzoxazinoid metabolites. Genome-wide association studies (GWAS), performed for 3,991 mass features from the leaf tips and leaf bases, showed that 90% have multiple significantly associated loci scattered across the genome. Several quantitative trait locus hotspots in the maize genome regulate the abundance of multiple, often metabolically related mass features. The utility of maize metabolite GWAS was demonstrated by confirming known benzoxazinoid biosynthesis genes, as well as by mapping isomeric variation in the accumulation of phenylpropanoid hydroxycitric acid esters to a single linkage block in a citrate synthase-like gene. Similar to gene expression databases, this metabolomic GWAS dataset constitutes an important public resource for linking maize metabolites with biosynthetic and regulatory genes.


2020 ◽  
Vol 24 ◽  
pp. 100145 ◽  
Author(s):  
Mohsen Mohammadi ◽  
Alencar Xavier ◽  
Travis Beckett ◽  
Savannah Beyer ◽  
Liyang Chen ◽  
...  
