Fine-mapping genetic associations

Anna Hutchinson; Jennifer Asimit; Chris Wallace

doi:10.1093/hmg/ddaa148

Human Molecular Genetics ◽

10.1093/hmg/ddaa148 ◽

2020 ◽

Vol 29 (R1) ◽

pp. R81-R88 ◽

Cited By ~ 1

Author(s):

Anna Hutchinson ◽

Jennifer Asimit ◽

Chris Wallace

Keyword(s):

Linkage Disequilibrium ◽

Fine Mapping ◽

Genetic Variants ◽

Functional Annotation ◽

Causal Variant ◽

Genetic Associations ◽

Multiple Datasets ◽

Annotation Data ◽

Causal Variants ◽

Summary Data

Abstract Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further ‘fine-mapping’ step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.
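
The basic single-causal-variant approach summarised in this abstract can be illustrated with a minimal sketch. It assumes Wakefield-style approximate Bayes factors computed from per-variant effect estimates and standard errors, an equal prior probability for every variant, and exactly one causal variant in the region. The function names, the prior standard deviation of 0.2 and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor (association vs. null) for one variant.

    beta, se : effect estimate and its standard error (summary data only).
    prior_sd : assumed prior standard deviation of the true effect size.
    """
    v = se ** 2                 # sampling variance of the estimate
    w = prior_sd ** 2           # prior variance of the effect
    r = w / (v + w)             # shrinkage factor
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def posterior_probabilities(betas, ses, prior_sd=0.2):
    """Posterior probability that each variant is THE causal variant.

    Assumes one causal variant in the region and equal priors, so each
    posterior is the variant's Bayes factor divided by the regional sum.
    """
    labfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    m = max(labfs)              # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in labfs]
    total = sum(weights)
    return [w / total for w in weights]


# Toy region with three variants sharing the same standard error.
example_pps = posterior_probabilities([0.10, 0.25, 0.05], [0.05, 0.05, 0.05])
```

Because only effect estimates and standard errors enter the calculation, the sketch needs no individual-level genotype data. The division by the regional sum is where the single-causal-variant assumption enters.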

Improved methods for multi-trait fine mapping of pleiotropic risk loci

10.1101/054684 ◽

2016 ◽

Author(s):

Gleb Kichaev ◽

Megan Roytman ◽

Ruth Johnson ◽

Eleazar Eskin ◽

Sara Lindstroem ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Functional Annotation ◽

Large Scale ◽

Association Studies ◽

Real Data ◽

Causal Variant ◽

Genome Wide Association Studies ◽

Annotation Data ◽

Association Data

Abstract Genome-wide association studies (GWAS) have identified thousands of regions in the genome that contain genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologically causal, but rather correlated with the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation. In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe a new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution relative to standard single trait and pleiotropic fine mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data.

Improving the coverage of credible sets in Bayesian genetic fine-mapping

10.1101/781062 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anna Hutchinson ◽

Hope Watson ◽

Chris Wallace

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Causal Variant ◽

Human Diseases ◽

Credible Sets ◽

Causal Variants ◽

Credible Set ◽

Genomic Regions

Abstract Genome Wide Association Studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association, reporting credible sets of plausible causal variants, which are interpreted as containing the causal variant with some “coverage probability”. Here, we use simulations to demonstrate that the coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the “corrected coverage estimate”. This is extended to find “corrected credible sets”, which are the smallest set of variants such that their corrected coverage estimate meets the target coverage. We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. We found that in 27 out of 39 associated genomic regions our method could reduce the number of potentially causal variants to consider for follow-up, and found that none of the 95% or 99% credible sets required the inclusion of more variants – a pattern matched in simulations of well powered GWAS. Crucially, our correction method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient expenditure of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.

Author summary Pinpointing specific genetic variants within the genome that are causal for human diseases is difficult due to complex correlation patterns existing between variants. Consequently, researchers typically prioritise a set of plausible causal variants for functional validation - these sets of putative causal variants are called “credible sets”. We find that the probabilistic interpretation that these credible sets do indeed contain the true causal variant is variable, in that the reported probabilities often underestimate the true coverage of the causal variant in the credible set. We have developed a method to provide researchers with a “corrected coverage estimate” that the true causal variant appears in the credible set, and this has been extended to find “corrected credible sets”, allowing for more efficient allocation of resources in the expensive follow-up laboratory experiments. We used our method to reduce the number of genetic variants to consider as causal candidates for follow-up in 27 genomic regions that are associated with type 1 diabetes.
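
For orientation, the standard credible-set construction that this abstract takes as its starting point can be sketched as follows. This is only the conventional procedure of sorting variants by posterior probability and accumulating them until a target is reached; it is not the authors' corrected-coverage method, which additionally relies on simulation. The function name and the toy posterior probabilities are illustrative assumptions.

```python
def credible_set(pps, target=0.95):
    """Smallest set of variants whose summed posterior probability >= target.

    pps    : posterior probabilities of causality, one per variant.
    target : claimed coverage (for example 0.95 or 0.99).
    Returns the chosen variant indices (most probable first) and their sum.
    """
    # Rank variants from most to least probable.
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pps[i]
        if cumulative >= target:
            break
    return chosen, cumulative


# Toy region: four variants with made-up posterior probabilities.
toy_pps = [0.02, 0.70, 0.25, 0.03]
toy_set, toy_claimed = credible_set(toy_pps, target=0.90)
```

The returned sum is the claimed coverage. It typically overshoots the target because variants are added whole, which is one reason the nominal threshold alone gives only a rough guide to how often a set contains the causal variant.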

Genetic Fine-mapping with Dense Linkage Disequilibrium Blocks: genetics of nicotine dependence

10.1101/2020.12.10.420216 ◽

2020 ◽

Author(s):

Chen Mo ◽

Zhenyao Ye ◽

Kathryn Hatch ◽

Yuan Zhang ◽

Qiong Wu ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Fine Mapping ◽

Association Studies ◽

Causal Variant ◽

Genome Wide Association Studies ◽

Variant Selection ◽

False Positive Error ◽

Graph Norm ◽

Causal Variants ◽

Highly Correlated

Abstract Fine-mapping is an analytical step to perform causal prioritization of the polymorphic variants on a trait-associated genomic region observed from genome-wide association studies (GWAS). The prioritization of causal variants can be challenging due to the linkage disequilibrium (LD) patterns among hundreds to thousands of polymorphisms associated with a trait. We propose a novel ℓ0 graph norm shrinkage algorithm to select causal variants from dense LD blocks consisting of highly correlated SNPs that may not be proximal or contiguous. We extract dense LD blocks and perform regression shrinkage to calculate a prioritization score to select a parsimonious set of causal variants. Our approach is computationally efficient and allows performing fine-mapping on thousands of polymorphisms. We demonstrate its application using a large UK Biobank (UKBB) sample related to nicotine addiction. Our results suggest that polymorphic variances in both neighboring and distant variants can be consolidated into dense blocks of highly correlated loci. Simulations were used to evaluate and compare the performance of our method and existing fine-mapping algorithms. The results demonstrated that our method outperformed comparable fine-mapping methods with increased sensitivity and reduced false-positive error rate regarding causal variant selection. The application of this method to smoking severity trait in UKBB sample replicated previously reported loci and suggested the causal prioritization of genetic effects on nicotine dependency.

Author summary Disentangling the complex linkage disequilibrium (LD) pattern and selecting the underlying causal variants have been a long-term challenge for genetic fine-mapping. We find that the LD pattern within GWAS loci is intrinsically organized in delicate graph topological structures, which can be effectively learned by our novel ℓ0 graph norm shrinkage algorithm. The extracted LD graph structure is critical for causal variant selection. Moreover, our method is less constrained by the width of GWAS loci and thus can fine-map a massive number of correlated SNPs.

Multiple Causal Variants Underlie Genetic Associations in Humans

10.1101/2021.05.24.445471 ◽

2021 ◽

Author(s):

Nathan S Abell ◽

Marianne K DeGorter ◽

Michael Gloudemans ◽

Emily Greenwald ◽

Kevin S Smith ◽

...

Keyword(s):

Genetic Variation ◽

Linkage Disequilibrium ◽

Complex Trait ◽

Causal Variant ◽

Strong Linkage Disequilibrium ◽

Genetic Associations ◽

Crohns Disease ◽

Systematic Assessment ◽

Regulatory Variants ◽

Causal Variants

The majority of associations between genetic variation and human traits and diseases are non-coding and in strong linkage disequilibrium (LD) with surrounding genetic variation. In these cases, a single causal variant is often assumed to underlie the association; however, no systematic assessment of the number of causal variants has been performed. In this study, we applied a massively parallel reporter assay (MPRA) in lymphoblastoid cells to functionally evaluate 49,256 allelic pairs, representing 30,893 genetic variants in high, local linkage disequilibrium for 744 independent cis-expression quantitative trait loci (eQTL) and assessed each for colocalization across 114 traits. We identified 3,536 allele-independent regulatory regions containing 907 allele-specific regulatory variants, and found that 17.3% of eQTL contained more than one significant allelic effect. We show that detected regulatory variants are highly and specifically enriched for activating chromatin structures and allelic transcription factor binding, for which ETS-domain family members are a large driver. Integration of MPRA profiles with eQTL/complex trait colocalizations identified causal variant sets for associations with blood cell measurements, Multiple Sclerosis, Irritable Bowel Disease, and Crohn's Disease. These results demonstrate that a sizable number of association signals are manifest through multiple, tightly-linked causal variants requiring high-throughput functional assays for fine-mapping.

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.

The Genetics of Circulating Resistin Level, A Biomarker for Cardiovascular Diseases, Is Informed by Mendelian Randomization and the Unique Characteristics of African Genomes

Circulation Genomic and Precision Medicine ◽

10.1161/circgen.120.002920 ◽

2020 ◽

Author(s):

Karlijn A.C. Meeks ◽

Ayo P. Doumatey ◽

Amy R. Bentley ◽

Mateus H. Gouveia ◽

Guanjie Chen ◽

...

Keyword(s):

Insulin Resistance ◽

Type 2 Diabetes ◽

Fine Mapping ◽

Mendelian Randomization ◽

African Ancestry ◽

Resistance Index ◽

Causal Variant ◽

Transcriptomic Data ◽

Causal Variants

Background - Resistin, a protein linked with inflammation and cardiometabolic diseases, is one of few proteins for which GWAS consistently report variants within and near the coding gene (RETN). Here, we took advantage of the reduced linkage disequilibrium in African populations to infer genetic causality for circulating resistin levels by performing GWAS, whole-exome analysis, fine-mapping, Mendelian randomization and transcriptomic data analyses. Methods - GWAS and fine-mapping analyses for resistin were performed in 5621 African ancestry individuals, including 3754 continental Africans (AF) and 1867 African Americans (AA). Causal variants identified were subsequently used as an instrumental variable in Mendelian randomization analyses for homeostatic modelling (HOMA) derived insulin resistance index, BMI and type 2 diabetes. Results - The lead variant (rs3219175, in the promoter region of RETN) for the single locus detected was the same for AF (P-value 5.0×10^-111) and for AA (9.5×10^-38), respectively explaining 12.1% and 8.5% of variance in circulating resistin. Fine-mapping analyses and functional annotation revealed this variant as likely causal affecting circulating resistin levels as a cis-eQTL increasing RETN expression. Additional variants regulating resistin levels were upstream of RETN with genes PCP2, STXBP2 and XAB2 showing the strongest association using integrative analysis of GWAS with transcriptomic data. Mendelian randomization analyses did not provide evidence for resistin increasing insulin resistance, BMI or type 2 diabetes risk in African-ancestry populations. Conclusions - Taking advantage of the fine-mapping resolution power of African genomes, we identified a single variant (rs3219175) as the likely causal variant responsible for most of the variability in circulating resistin levels.
In contrast to findings in some other ancestry populations, we showed that resistin does not seem to increase insulin resistance and related cardiometabolic traits in African-ancestry populations.

Fine-Mapping Array Design for Multi-Ethnic Studies of Multiple Sclerosis

Genes ◽

10.3390/genes10110903 ◽

2019 ◽

Vol 10 (11) ◽

pp. 903

Author(s):

Ashley H. Beecham ◽

Jacob L. McCauley

Keyword(s):

Multiple Sclerosis ◽

African American ◽

Fine Mapping ◽

Genetic Associations ◽

Genotyping Array ◽

Locus Heterogeneity ◽

Base Content ◽

Risk Variants ◽

Mapping Array ◽

Causal Variants

While approximately 200 autosomal genetic associations outside of the major histocompatibility complex (MHC) have been identified for multiple sclerosis (MS) risk in European populations, causal variants identified at the majority of these associated loci have been much more elusive. We propose that knowledge gained from replication efforts in Hispanic and African American populations can be utilized to more efficiently fine-map these risk loci. To this end, we have customized a genotyping array by adding ~20,000 bead types (~17,000 variants) to the base content of the Illumina Infinium expanded multi-ethnic genotyping array and the Infinium ImmunoArray-24 v2 BeadChip. These custom bead types were chosen to allow for the detection of causal variation (1) in the presence of allelic and locus heterogeneity, by incorporating regulatory and coding variation within 1-Mb of previously identified risk variants and (2) in the absence of allelic and locus heterogeneity by incorporation of variants using linkage disequilibrium criteria, which are based on knowledge of replication status in Hispanic and African American study samples. This array has been designed to maximize fine-mapping potential for currently identified MS susceptibility loci, particularly in multi-ethnic populations. The strategies described here could be additionally informative for fine-mapping of other disease phenotypes.

Genetic associations at regulatory phenotypes improve fine-mapping of causal variants for twelve immune-mediated diseases

10.1101/2020.01.15.907436 ◽

2020 ◽

Cited By ~ 4

Author(s):

Kousik Kundu ◽

Alice L. Mann ◽

Manuel Tardaguila ◽

Stephen Watt ◽

Hannes Ponstingl ◽

...

Keyword(s):

Fine Mapping ◽

Molecular Mechanisms ◽

Immune Cell ◽

Meta Analysis ◽

Cell Types ◽

Genetic Associations ◽

Sequencing Data ◽

Immune Mediated ◽

Causal Genes ◽

Causal Variants

Abstract The identification of causal genetic variants for common diseases improves understanding of disease biology. Here we use data from the BLUEPRINT project to identify regulatory quantitative trait loci (QTL) for three primary human immune cell types and use these to fine-map putative causal variants for twelve immune-mediated diseases. We identify 340 unique, non-major histocompatibility complex (MHC) disease loci that colocalise with high (>98%) posterior probability with regulatory QTLs, and apply Bayesian frameworks to fine-map associations at each locus. We show that fine-mapping applied to regulatory QTLs yields smaller credible set sizes and higher posterior probabilities for candidate causal variants compared to disease summary statistics. We also describe a systematic under-representation of insertion/deletion (INDEL) polymorphisms in credible sets derived from publicly available disease meta-analysis when compared to QTLs based on genome-sequencing data. Overall, our findings suggest that fine-mapping applied to disease-colocalising regulatory QTLs can enhance the discovery of putative causal disease variants and provide insights into the underlying causal genes and molecular mechanisms.

Fine-mapping cellular QTLs with RASQUAL and ATAC-seq

10.1101/018788 ◽

2015 ◽

Cited By ~ 3

Author(s):

Natsuhiko Kumasaka ◽

Andrew Knights ◽

Daniel Gaffney

Keyword(s):

Fine Mapping ◽

Population Level ◽

Regulatory Elements ◽

Causal Variant ◽

European Population ◽

Open Chromatin ◽

Gene Promoters ◽

Genetic Associations ◽

Functional Interpretation ◽

Allele Specific

When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest at two levels: population level differences between individuals and allelic differences between cis-haplotypes within individuals. We present RASQUAL (Robust Allele Specific QUAntitation and quality controL), a novel statistical approach for association mapping that integrates genetic effects and robust modelling of biases in next generation sequencing (NGS) data within a single, probabilistic framework. RASQUAL substantially improves causal variant localisation and sensitivity of association detection over existing methods in RNA-seq, DNaseI-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximise association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,706 independent caQTLs (FDR 10%) and illustrate how RASQUAL's improved causal variant localisation provides powerful information for fine-mapping disease-associated variants. We also map “multipeak” caQTLs, identical genetic associations found across multiple, independent open chromatin regions, and illustrate how genetic signals in ATAC-seq data can be used to link distal regulatory elements with gene promoters. Our results highlight how joint modelling of population and allele-specific genetic signals can improve functional interpretation of noncoding variation.

Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2015.07.010 ◽

2015 ◽

Vol 97 (2) ◽

pp. 353

Author(s):

Gleb Kichaev ◽

Bogdan Pasaniuc

Keyword(s):

Fine Mapping ◽

Functional Annotation ◽

Annotation Data ◽

In Trans
