A scalable Bayesian method for integrating functional information in genome-wide association studies

2017 ◽  
Author(s):  
Jingjing Yang ◽  
Lars G. Fritsche ◽  
Xiang Zhou ◽  
Gonçalo Abecasis ◽  

Abstract: Although genome-wide association studies (GWASs) have identified many risk loci for complex traits and common diseases, most of the identified associations reside in noncoding regions and have unknown biological functions. Recent genomic sequencing studies have produced a rich resource of annotations that help characterize the function of genetic variants. Integrative analysis that incorporates these functional annotations into GWAS can help elucidate the biological mechanisms underlying the identified associations and help prioritize causal variants. Here, we develop a novel, flexible Bayesian variable selection model with efficient computational techniques for such integrative analysis. Unlike previous approaches, our method models the effect-size distribution and probability of causality for variants with different annotations and jointly models genome-wide variants to account for linkage disequilibrium (LD), thus prioritizing associations based on the quantification of the annotations and allowing for multiple causal variants per locus. Our efficient computational algorithm dramatically improves both computational speed and posterior sampling convergence by taking advantage of the block-wise LD structure of the human genome. With simulations, we show that our method accurately quantifies the functional enrichment and is more powerful for identifying true causal variants than several competing methods. The power gain provided by our method is especially apparent when multiple causal variants in LD reside in the same locus. We also apply our method to an in-depth GWAS of age-related macular degeneration with 33,976 individuals and 9,857,286 variants. We find the strongest enrichment for causality among non-synonymous variants (54x more likely to be causal, 1.4x larger effect sizes) and variants in active promoters (7.8x more likely, 1.4x larger effect sizes), and we identify 5 potentially novel loci in addition to the 32 known AMD risk loci. In conclusion, our method is shown to efficiently integrate functional information in GWASs, helping identify causal variants and underlying biology.

Author summary: We propose a novel Bayesian hierarchical model to account for linkage disequilibrium (LD) and multiple functional annotations in GWAS, paired with an expectation-maximization Markov chain Monte Carlo (EM-MCMC) computational algorithm to jointly analyze genome-wide variants. Our method improves MCMC convergence to ensure accurate Bayesian inference of the functional enrichment pattern and of the fine-mapped association results. By applying our method to a real GWAS of age-related macular degeneration (AMD) with various functional annotations (i.e., gene-based, regulatory, and chromatin states), we find that variants with non-synonymous, coding, and active promoter annotations have the highest causal probability and the largest effect sizes. In addition, our method produces fine-mapped association results in the identified risk loci, two of which are shown as examples (C2/CFB/SKIV2L and C3) and are supported by haplotype analysis, model comparison, and conditional analysis. Therefore, we believe our integrative method will be useful for quantifying the enrichment pattern of functional annotations in GWAS, and then prioritizing associations with respect to the learned functional enrichment pattern.
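As a rough illustration of what a fold enrichment such as "54x more likely to be causal" expresses, the sketch below divides each annotation category's causal probability by a variant-count-weighted genome-wide average. This is only one plausible reading: the abstract does not state which reference the authors divide by, the function name `fold_enrichment` is hypothetical, and the numbers are made-up toy values, not results from the paper.

```python
def fold_enrichment(causal_prob, n_variants):
    """Ratio of each category's causal probability to the genome-wide average.

    causal_prob: dict mapping annotation name -> per-variant causal probability
    n_variants:  dict mapping annotation name -> number of variants in that category
    Assumption (not from the source): the reference is the average causal
    probability over all variants, weighted by category size.
    """
    total_variants = sum(n_variants.values())
    # Expected number of causal variants, summed over categories.
    expected_causal = sum(causal_prob[k] * n_variants[k] for k in causal_prob)
    average_prob = expected_causal / total_variants
    return {k: causal_prob[k] / average_prob for k in causal_prob}


# Toy inputs for illustration only (hypothetical values).
toy_prob = {"coding": 4e-4, "other": 1e-4}
toy_counts = {"coding": 1000, "other": 9000}
toy_result = fold_enrichment(toy_prob, toy_counts)
```

With these toy inputs the "coding" category comes out enriched (ratio above 1) and "other" depleted (ratio below 1). A different choice of reference, such as a baseline category, would change the ratios but not their ordering.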

2021 ◽  
Author(s):  
Dustin Griesemer ◽  
James R Xue ◽  
Steven K Reilly ◽  
Jacob C Ulirsch ◽  
Kalki Kukreja ◽  
...  

Abstract: 3’ untranslated region (3’UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the Massively Parallel Reporter Assay for 3’UTRs (MPRAu) to sensitively assay 12,173 3’UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3’UTR function, suggesting that low-complexity sequences predominantly explain 3’UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base-pair resolution, including an AU-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3’UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14, and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.


2020 ◽  
Vol 36 (18) ◽  
pp. 4749-4756 ◽  
Author(s):  
Alexey A Shadrin ◽  
Oleksandr Frei ◽  
Olav B Smeland ◽  
Francesco Bettella ◽  
Kevin S O'Connell ◽  
...  

Abstract

Motivation: Determining the relative contributions of functional genetic categories is fundamental to understanding the genetic etiology of complex human traits and diseases. Here, we present Annotation Informed-MiXeR, a likelihood-based method for estimating the number of variants influencing a phenotype and their effect sizes across different functional annotation categories of the genome using summary statistics from genome-wide association studies.

Results: Extensive simulations demonstrate that the model is valid for a broad range of genetic architectures. The model suggests that complex human phenotypes substantially differ in the number of causal variants, their localization in the genome and their effect sizes. Specifically, the exons of protein-coding genes harbor more than 90% of variants influencing type 2 diabetes and inflammatory bowel disease, making them good candidates for whole-exome studies. In contrast, <10% of the causal variants for schizophrenia, bipolar disorder and attention-deficit/hyperactivity disorder are located in protein-coding exons, indicating a more substantial role of regulatory mechanisms in the pathogenesis of these disorders.

Availability and implementation: The software is available at: https://github.com/precimed/mixer.

Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Wenmin Zhang ◽  
Hamed S Najafabadi ◽  
Yue Li

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method has two major innovations. First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of the exponential search of causal configurations used in existing methods. Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.
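To make the idea of combining statistical associations with functional priors concrete, here is a minimal sketch under strong simplifying assumptions. It is not the SparsePro algorithm (which the abstract describes as a sparse projection with variational expectation-maximization). Instead it assumes exactly one causal variant per locus, uses Wakefield-style approximate Bayes factors computed from z-scores, and sets each variant's prior proportional to the exponential of a weighted sum of its annotations. The function names, the `prior_sd` default, and the annotation weights are all hypothetical.

```python
import math


def log_abf(z, se, prior_sd=0.2):
    """Log approximate Bayes factor for one variant (Wakefield-style).

    z: association z-score; se: standard error of the effect estimate;
    prior_sd: assumed prior standard deviation of the true effect (hypothetical default).
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    # ABF = sqrt(1 - r) * exp(z^2 * r / 2), computed on the log scale.
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def single_causal_pips(z, se, annotations, weights, prior_sd=0.2):
    """Posterior inclusion probabilities assuming exactly one causal variant.

    annotations: one row of annotation values per variant;
    weights: one enrichment weight per annotation (assumed known here).
    """
    log_prior = [sum(a * w for a, w in zip(row, weights)) for row in annotations]
    log_post = [
        lp + log_abf(zi, si, prior_sd)
        for lp, zi, si in zip(log_prior, z, se)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnormalized = [math.exp(x - m) for x in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

In this sketch, two variants in perfect LD with identical z-scores are indistinguishable from the association data alone, so the annotation prior is what separates them. A method that allows multiple causal variants per locus needs additional machinery beyond this normalization.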


2011 ◽  
Vol 04 (02) ◽  
pp. 119
Author(s):  
Mohammad Othman ◽  
Kari Branham ◽  
John R Heckenlively ◽  
...  

Age-related macular degeneration (AMD) is the main cause of vision loss and impairment in the aging population in developed countries. It is clinically and genetically a complex disease, with both environmental and genetic factors affecting its outcome. Only the wet type of AMD has a treatment; the other forms have none. It is estimated that the number of AMD patients will double in the next decade, which will have a significant financial impact on the health system and will compete for health dollars. Understanding the role of genetics in the development of AMD is paramount to help with diagnosis and future treatment. Over the past few years, we have studied the genetics of AMD and reported modest to significant association between AMD and several genes including CFH, ARMS2, TLR4 and ApoE. Our recent genome-wide association studies confirmed these AMD susceptibility loci in addition to other genes in the complement system (C2, C3, CFB and CFI). Recent studies identified new loci near TIMP3 and HDL influencing susceptibility to AMD.


Cells ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 2267
Author(s):  
Tobias Strunz ◽  
Christina Kiel ◽  
Bastian L. Sauerbeck ◽  
Bernhard H. F. Weber

Over the last 15 years, genome-wide association studies (GWAS) have greatly advanced our understanding of the genetic landscape of complex phenotypes. Nevertheless, causal interpretations of GWAS data are challenging but crucial for understanding underlying mechanisms and pathologies. In this review, we explore to what extent the research community follows up on GWAS data. We have traced the scientific activities responding to the two largest GWAS conducted on age-related macular degeneration (AMD) so far. Altogether, 703 articles were manually categorized according to their study type. This demonstrates that follow-up studies mainly involve “Review articles” (33%) or “Genetic association studies” (33%), while 19% of publications report on findings from experimental work. It is striking to note that only three of 16 AMD-associated loci described de novo in 2016 were examined in the four-year follow-up period after publication. A comparative analysis of five studies on gene expression regulation in AMD-associated loci revealed consistent gene candidates for 15 of these loci. Our random survey highlights the fact that functional follow-up studies on GWAS results are still in their early stages, hampering a significant refinement of the vast association data and thus a more accurate insight into mechanisms and pathways.

