Functionally-informed fine-mapping and polygenic localization of complex trait heritability

2019 ◽  
Author(s):  
Omer Weissbrod ◽  
Farhad Hormozdiari ◽  
Christian Benner ◽  
Ran Cui ◽  
Jacob Ulirsch ◽  
...  

Abstract Fine-mapping aims to identify causal variants impacting complex traits. Several recent methods improve fine-mapping accuracy by prioritizing variants in enriched functional annotations. However, these methods can only use information at genome-wide significant loci (or a small number of functional annotations), severely limiting the benefit of functional data. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy using genome-wide functional data for a broad set of coding, conserved, regulatory and LD-related annotations. PolyFun prioritizes variants in enriched functional annotations by specifying prior causal probabilities for fine-mapping methods such as SuSiE or FINEMAP, employing special procedures to ensure robustness to model misspecification and winner’s curse. In simulations with in-sample LD, PolyFun + SuSiE and PolyFun + FINEMAP were well-calibrated and identified >20% more variants with posterior causal probability >0.95 than their non-functionally informed counterparts (and >33% more fine-mapped variants than previous functionally-informed fine-mapping methods). In simulations with mismatched reference LD, PolyFun + SuSiE remained well-calibrated when reducing the maximum number of assumed causal SNPs per locus, which reduces absolute power but still produces large relative improvements. In analyses of 49 UK Biobank traits (average N=318K) with in-sample LD, PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement vs. SuSiE; 223 variants were fine-mapped for multiple genetically uncorrelated traits, indicating pervasive pleiotropy. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.
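
The polygenic localization idea in the abstract above (a minimal set of SNPs that together explain a target share of heritability) can be illustrated with a small sketch. It assumes that per-SNP heritability estimates are already available as a plain list of non-negative numbers; the function name `minimal_snp_set`, its parameters and the toy values are hypothetical and are not taken from the PolyFun software, whose actual procedure may differ in detail.

```python
def minimal_snp_set(per_snp_h2, target_fraction=0.5):
    """Return indices of the smallest SNP set reaching the target share.

    per_snp_h2: per-SNP heritability estimates (assumed non-negative).
    target_fraction: share of total heritability to explain (0 < f <= 1).
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    total = sum(per_snp_h2)
    if total <= 0:
        raise ValueError("total heritability must be positive")

    # Taking SNPs in decreasing order of heritability yields the smallest
    # possible set, because the top-k SNPs maximize the sum for any size k.
    order = sorted(range(len(per_snp_h2)),
                   key=lambda i: per_snp_h2[i], reverse=True)

    chosen = []
    running = 0.0
    for i in order:
        chosen.append(i)
        running += per_snp_h2[i]
        if running >= target_fraction * total:
            break
    return chosen


# Toy example with made-up values (arbitrary units, not real estimates).
toy_h2 = [10, 40, 5, 25, 20]
print(minimal_snp_set(toy_h2))        # indices covering at least 50%
print(minimal_snp_set(toy_h2, 0.9))   # indices covering at least 90%
```

Under this sketch, a trait whose heritability is concentrated in a few SNPs yields a short list, while a highly polygenic trait needs many SNPs to reach the same share, which is the contrast the abstract reports between hair color and number of children.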

2021 ◽  
Author(s):  
Wenmin Zhang ◽  
Hamed S Najafabadi ◽  
Yue Li

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method enjoys two major innovations: First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of an exponential search of causal configurations used in existing methods; Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.


2022 ◽  
Author(s):  
Wenmin Zhang ◽  
Hamed Najafabadi ◽  
Yue Li

Abstract Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct genome-wide fine-mapping. Our method enjoys two major innovations: First, by creating a sparse low-dimensional projection of the high-dimensional genotype data, we enable a linear search of causal variants instead of a combinatorial search of causal configurations used in most existing methods; Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We show that, compared to other methods, the causal variants identified by SparsePro are highly enriched for expression quantitative trait loci and explain a larger proportion of trait heritability. We also identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs, and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.


2014 ◽  
Author(s):  
Xiaoquan Wen ◽  
Francesca Luca ◽  
Roger Pique-Regi

Mapping expression quantitative trait loci (eQTLs) has proven to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistently present across populations, accounting for heterogeneity in allele frequencies and patterns of linkage disequilibrium. Furthermore, our analysis framework enables integrating high-resolution functional annotations into the analysis of eQTLs. We applied our statistical approach to the GEUVADIS data, consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10⁻²²).


2021 ◽  
Author(s):  
Duncan S Palmer ◽  
Wei Zhou ◽  
Liam Abbott ◽  
Nik Baya ◽  
Claire Churchhouse ◽  
...  

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate a factor of 20-30 increase in sample size will be necessary to capture clear evidence of dominance similar to those currently observed for additive effects. However, these localised dominance hits do not extend to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.
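
The definition that opens the abstract above (dominance as the deviation from a purely additive effect at a biallelic variant) can be made concrete with a short sketch. It uses the classical genotypic-value parameterization, in which the additive effect `a` is half the difference between the two homozygote means and the dominance deviation `d` is the distance of the heterozygote mean from the homozygote midpoint. This is an assumption chosen for illustration: the encoding used in the study itself may differ, and the function name `additive_and_dominance` and the toy data are hypothetical.

```python
from statistics import mean


def additive_and_dominance(genotypes, phenotypes):
    """Estimate (a, d) from genotype class means at one biallelic variant.

    genotypes: allele counts coded 0, 1 or 2.
    phenotypes: one phenotype value per individual, in the same order.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        if g not in groups:
            raise ValueError("genotypes must be coded 0, 1 or 2")
        groups[g].append(y)
    if any(not values for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")

    m0, m1, m2 = (mean(groups[g]) for g in (0, 1, 2))
    midpoint = (m0 + m2) / 2   # expected heterozygote mean if purely additive
    a = (m2 - m0) / 2          # additive effect per allele copy
    d = m1 - midpoint          # dominance deviation of the heterozygote
    return a, d


# Toy data (made up): the first trait is purely additive, the second is
# fully dominant, so heterozygotes match one homozygote class.
genos = [0, 0, 1, 1, 2, 2]
print(additive_and_dominance(genos, [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]))
print(additive_and_dominance(genos, [1.0, 1.0, 3.0, 3.0, 3.0, 3.0]))
```

A value of `d` near zero corresponds to the purely additive case, while `d` equal in magnitude to `a` corresponds to complete dominance of one allele.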


2021 ◽  
Author(s):  
Davide Marnetto ◽  
Vasili Pankratov ◽  
Mayukh Mondal ◽  
Francesco Montinaro ◽  
Katri Pärna ◽  
...  

The contemporary European genetic makeup formed in the last 8000 years as the combination of three main genetic components: the local Western Hunter-Gatherers, the incoming Neolithic Farmers from Anatolia and the Bronze Age component from the Pontic Steppes. When these components met in the post-Neolithic European environment, the genetic variants accumulated during their three distinct evolutionary histories mixed and came into contact with new environmental challenges. Here we investigate how this genetic legacy reflects on the complex trait landscape of contemporary European populations, using the Estonian Biobank as a case study. For the first time we directly connect the phenotypic information available from biobank samples with the genetic similarity to these ancestral groups, both at a genome-wide level and focusing on genomic regions associated with each of the 27 complex traits we investigated. We also found that SNPs connected to pigmentation, cholesterol, sleep, diastolic blood pressure, and body mass index (BMI) show signals of selection following the post-Neolithic admixture events. We recapitulate existing knowledge about pigmentation traits, corroborate the connection between Steppe ancestry and height and highlight novel associations. Among others, we report the contribution of Hunter-Gatherer ancestry towards high BMI and low blood cholesterol levels. Our results show that the ancient components that form the contemporary European genome were differentiated enough to contribute ancestry-specific signatures to the phenotypic variability displayed by contemporary individuals in at least 11 out of 27 of the complex traits investigated here.


2021 ◽  
Author(s):  
Steven Gazal ◽  
Omer Weissbrod ◽  
Farhad Hormozdiari ◽  
Kushal Dey ◽  
Joseph Nasser ◽  
...  

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.


2021 ◽  
Author(s):  
Xing Wu ◽  
Wei Jiang ◽  
Christopher Fragoso ◽  
Jing Huang ◽  
Geyu Zhou ◽  
...  

Genome-wide association studies (GWAS) can play an essential role in understanding the genetic basis of complex traits in plants and animals. Conventional SNP-based linear mixed models (LMMs), used in many GWAS to marginally test single nucleotide polymorphisms (SNPs), have successfully identified many loci with major and minor effects. In plants, the relatively small population sizes in GWAS and the high genetic diversity found in many plant species can impede mapping efforts on complex traits. Here we present a novel haplotype-based trait fine-mapping framework, HapFM, to supplement current GWAS methods. HapFM uses genotype data to partition the genome into haplotype blocks, identifies haplotype clusters within each block, and then performs genome-wide haplotype fine-mapping to infer the causal haplotype blocks of a trait. We benchmarked HapFM, GEMMA, BSLMM, and GMMAT in both simulation and real plant GWAS datasets. HapFM consistently resulted in higher mapping power than the other GWAS methods in simulations with high polygenicity. Moreover, it resulted in higher mapping resolution, especially in regions of high LD, by identifying small causal blocks in the larger haplotype block. In the Arabidopsis flowering time (FT10) datasets, HapFM identified four novel loci compared to GEMMA results, and the average mapping interval of HapFM was 9.6 times smaller than that of GEMMA. In conclusion, HapFM is tailored for plant GWAS to provide high mapping power on complex traits and improved mapping resolution to facilitate crop improvement.


2021 ◽  
Author(s):  
Brian C Zhang ◽  
Arjun Biddanda ◽  
Pier Francesco Palamara

Accurate inference of gene genealogies from genetic data has the potential to facilitate a wide range of analyses. We introduce a method for accurately inferring biobank-scale genome-wide genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies within linear mixed models to perform association and other complex trait analyses. We use these new methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and to detect associations in 7 complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 133, frequency range 0.0004% - 0.1%) than genotype imputation from ~65,000 sequenced haplotypes (N = 65). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants, which are enriched for missense (2.3×) and loss-of-function (4.5×) variation. Inferred genealogies also capture additional association signals in higher frequency variants. These results demonstrate that large-scale inference of gene genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.

