eMAGMA: An eQTL-informed method to identify risk genes using genome-wide association study summary statistics

2019
Author(s): Zachary F Gerring, Angela Mina-Vargas, Eske M Derks

Abstract
Identifying genes underlying genetic associations with complex disease is challenging because most common risk variants reside in non-protein-coding regions of the genome and likely alter the expression of target genes by disrupting tissue- and cell-type-specific regulatory elements. To address this challenge, we developed a methodological framework, eQTL-MAGMA (eMAGMA), that converts SNP-level summary statistics into gene-level association statistics by assigning non-coding SNPs to their putative genes based on tissue-specific eQTL information. We compared eMAGMA to three eQTL-informed gene-based approaches (S-PrediXcan, FUSION, and SMR) using simulated phenotype data. Phenotypes were simulated with GCTA from eQTL reference data for all 651 genes with at least one eQTL on chromosome 1, with 10 simulations per gene. The eQTL-h2 (i.e., the proportion of phenotypic variation explained by the eQTLs) was set at 1%, 2%, and 5%. We found that eMAGMA outperforms the other gene-based approaches across a range of simulated parameters (e.g., the number of identified causal genes). When applied to genome-wide association summary statistics for major depression, eMAGMA identified substantially more putative candidate causal genes than the other eQTL-based approaches. These results show that, by integrating tissue-specific eQTL information, eMAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders.
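The central idea described above, assigning SNPs to genes through tissue-specific eQTL pairs rather than physical proximity and then aggregating SNP-level evidence into a gene-level statistic, can be illustrated with a minimal Python sketch. It is not the eMAGMA or MAGMA implementation: the eQTL table layout and the function names are hypothetical, and the gene test assumes the assigned SNPs are independent (sum of squared z-scores compared with a chi-square distribution), whereas MAGMA's SNP-wise models account for linkage disequilibrium between SNPs.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), i.e. the regularized upper
    incomplete gamma function Q(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_prefix = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower function P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefix) * h


def eqtl_gene_sets(eqtl_pairs, tissue):
    """Map each gene to the SNPs that are eQTLs for it in one tissue.
    eqtl_pairs: iterable of (snp_id, gene_id, tissue) tuples (hypothetical layout)."""
    genes = defaultdict(set)
    for snp, gene, t in eqtl_pairs:
        if t == tissue:
            genes[gene].add(snp)
    return genes


def gene_level_test(snp_z, gene_sets):
    """Gene statistic = sum of squared z-scores of the gene's eQTL SNPs.
    Simplification: SNPs are treated as independent (no LD model), so the
    statistic is chi-square with df = number of SNPs under the null."""
    results = {}
    for gene, snps in gene_sets.items():
        zs = [snp_z[s] for s in snps if s in snp_z]
        if not zs:
            continue  # none of the gene's eQTLs appear in the GWAS summary data
        stat = sum(z * z for z in zs)
        results[gene] = (len(zs), stat, chi2_sf(stat, len(zs)))
    return results


# Hypothetical eQTL pairs and GWAS z-scores.
pairs = [("rs1", "GENE_A", "brain"), ("rs2", "GENE_A", "brain"),
         ("rs3", "GENE_B", "brain"), ("rs1", "GENE_B", "liver")]
snp_z = {"rs1": 3.1, "rs2": 2.4, "rs3": 0.4}
sets = eqtl_gene_sets(pairs, "brain")
for gene, (k, stat, p) in sorted(gene_level_test(snp_z, sets).items()):
    print(gene, k, round(stat, 2), f"{p:.2e}")
```

Because the annotation is tissue-specific, rs1 counts toward GENE_B only when the liver eQTLs are selected; swapping the tissue argument changes which SNPs each gene aggregates.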

Author(s): Zachary F Gerring, Angela Mina-Vargas, Eric R Gamazon, Eske M Derks

Abstract
Motivation: Genome-wide association studies have successfully identified multiple independent genetic loci that harbour variants associated with human traits and diseases, but the exact causal genes are largely unknown. Common genetic risk variants are enriched in non-protein-coding regions of the genome and often affect gene expression (expression quantitative trait loci, eQTL) in a tissue-specific manner. To address this challenge, we developed a methodological framework, E-MAGMA, which converts genome-wide association summary statistics into gene-level statistics by assigning risk variants to their putative genes based on tissue-specific eQTL information.
Results: We compared E-MAGMA to three eQTL-informed gene-based approaches using simulated phenotype data. Phenotypes were simulated with GCTA from eQTL reference data for all genes with at least one eQTL on chromosome 1, with 10 simulations per gene. The eQTL-h2 (i.e., the proportion of variation explained by the eQTLs) was set at 1%, 2%, and 5%. We found that E-MAGMA outperforms the other gene-based approaches across a range of simulated parameters (e.g., the number of identified causal genes). When applied to genome-wide association summary statistics for five neuropsychiatric disorders, E-MAGMA identified more putative candidate causal genes than the other eQTL-based approaches. These results show that, by integrating tissue-specific eQTL information, E-MAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders.
Availability: A tutorial and input files are available in a GitHub repository: https://github.com/eskederks/eMAGMA-tutorial.
Supplementary information: Supplementary data are available at Bioinformatics online.
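The simulation design described above, in which a phenotype is generated from a gene's eQTLs so that they explain a fixed proportion (eQTL-h2) of phenotypic variance, can be sketched in the spirit of GCTA's quantitative-trait simulation: standardized genotypes are weighted by effect sizes to give a genetic value g, and residual noise is drawn with variance var(g)(1/h2 - 1). This is a minimal, standard-library illustration with made-up allele frequencies and effect sizes; unlike the study, which used eQTL reference genotypes, it draws SNPs independently (no LD) under Hardy-Weinberg equilibrium, and it is neither GCTA itself nor the authors' pipeline.

```python
import math
import random


def population_variance(xs):
    """Population variance (divides by n), computed with fsum for accuracy."""
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / len(xs)


def simulate_phenotype(n, freqs, effects, h2, seed=0):
    """Simulate a quantitative phenotype whose causal SNPs explain ~h2 of variance.

    Assumption (for illustration only): genotypes are drawn independently per SNP
    under Hardy-Weinberg equilibrium instead of being taken from reference data.
    """
    rng = random.Random(seed)
    g = [0.0] * n
    for p, u in zip(freqs, effects):
        # Genotype = number of effect alleles (0, 1 or 2) per individual.
        x = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        p_hat = sum(x) / (2 * n)
        sd = math.sqrt(2 * p_hat * (1 - p_hat))
        if sd == 0:
            continue  # monomorphic in this sample; contributes nothing
        for i in range(n):
            g[i] += u * (x[i] - 2 * p_hat) / sd
    # Residual variance chosen so that var(g) / (var(g) + var(e)) = h2.
    sigma_e = math.sqrt(population_variance(g) * (1 / h2 - 1))
    y = [gi + rng.gauss(0.0, sigma_e) for gi in g]
    return g, y


g, y = simulate_phenotype(n=20_000, freqs=[0.30, 0.10, 0.45],
                          effects=[0.5, -0.2, 0.3], h2=0.05, seed=1)
# Realized eQTL-h2 in this sample; it should land near the 0.05 target.
print(round(population_variance(g) / population_variance(y), 3))
```

Repeating this for h2 = 0.01, 0.02 and 0.05, with several seeds per gene, mirrors the grid of simulated parameters described in the abstract.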


Author(s): V. E. Golimbet, A. K. Golov, N. V. Kondratyev

Genome-wide association studies (GWASs) have discovered multiple genetic variants associated with schizophrenia. The next step (post-GWAS analysis) aims to identify the causal genetic variants and the biological mechanisms underlying their associations with disease risk. Two strategies are considered: the study of transcriptional regulation in human neuronal cells, and the use of epigenomic information to search for regulatory elements involved in the pathogenesis of schizophrenia. The first strategy includes the identification of neuronal enhancers, the mapping of their potential target genes, and the functional confirmation of enhancer-promoter interactions. The second approach focuses on identifying transcription factors that appear to be master regulators of expression.


2008, Vol 54 (7), pp. 1116-1124
Author(s): Struan F A Grant, Hakon Hakonarson

Abstract
Background: There is a revolution occurring in single nucleotide polymorphism (SNP) genotyping technology, with high-throughput methods now allowing large numbers of SNPs (10⁵–10⁶) to be genotyped in large cohort studies. This has enabled large-scale genome-wide association (GWA) studies of complex diseases, such as diabetes, asthma, and inflammatory bowel disease, to be undertaken for the first time.
Content: The GWA approach serves the critical need for a comprehensive and unbiased strategy to identify causal genes related to complex disease, and it is rapidly replacing the more traditional candidate gene studies and microsatellite-based linkage mapping approaches that have dominated gene discovery efforts for common diseases. Using array-based technologies, investigators have reported dramatic discoveries of key variants involved in multiple complex diseases and related traits in the top scientific literature over the last 3 years, and, most importantly, these findings have largely been replicated by independent investigator groups. As a result, several novel genes have been identified, most notably in the metabolic, cardiovascular, autoimmune, and oncology disease areas, that are clearly rooted in the biology of these disorders. These discoveries have opened new avenues for investigators to explore molecular pathways that were not previously linked to, or considered in relation to, these diseases.
Summary: This review provides a synopsis of recent advances and of what we may still expect to emerge from this field.


2020, Vol 127 (1), pp. 34-50
Author(s): Antoinette F. van Ouwerkerk, Amelia W. Hall, Zachary A. Kadow, Sonja Lazarevic, Jasmeet S. Reyat, ...

Genome-wide association studies have uncovered over 100 genetic loci associated with atrial fibrillation (AF), the most common arrhythmia. Many of the top AF-associated loci harbor genes encoding key cardiac transcription factors, including PITX2, TBX5, PRRX1, and ZFHX3. Moreover, the vast majority of AF-associated variants lie within noncoding regions of the genome, where causal variants affect gene expression by altering the activity of transcription factors and the epigenetic state of chromatin. In this review, we discuss a transcriptional regulatory network model for AF defined by effector genes at genome-wide association study loci. We describe the current state of the field regarding the identification and function of AF-relevant gene regulatory networks, including variant regulatory elements, dose-sensitive transcription factor function, target genes, and epigenetic states. We illustrate how altered transcriptional networks may affect cardiomyocyte function and the ionic currents that influence AF risk. Last, we identify the need for improved tools to identify and functionally test transcriptional components in order to define the links between genetic variation, epigenetic gene regulation, and atrial function.


2021
Author(s): Ying Ji, Qiang Wei, Rui Chen, Quan Wang, Ran Tao, ...

Abstract
A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have successfully identified trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which carries as much disease risk as expression. Compared with expression data, one challenge in detecting associations with splicing data is the large multiple-testing burden caused by the many splicing events within each gene. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to Genotype-Tissue Expression (GTEx) Project data and GWAS summary statistics for 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and find that the majority of MSG-identified genes would have been missed by expression-based analyses.
Our results demonstrate that aggregating splicing data through MSG can improve power to identify gene-trait associations and help to better understand the genetic risk of complex traits.
Author summary
While genome-wide association studies (GWAS) have successfully mapped thousands of loci associated with complex traits, it remains difficult to identify which genes they regulate and in which biological contexts. This interpretation challenge has motivated the development of computational methods to prioritize causal genes at GWAS loci. Most available methods have focused on linking risk variants with differential gene expression. However, genetic control of splicing and expression are comparable in their complex trait risk, and few studies have focused on identifying causal genes using splicing information. To study splicing-mediated effects, one important statistical challenge is the large multiple-testing burden generated by multidimensional splicing events. In this study, we develop a new approach, MSG, to test the mediating role of splicing variation on complex traits. We integrate multidimensional splicing data using sparse canonical correlation analysis and then combine evidence for splicing-trait associations across features using a joint test. We show that this approach has higher power to identify causal genes using splicing data than current state-of-the-art methods designed to model multidimensional expression data. We illustrate the benefits of our approach through extensive simulations and applications to real data sets for 14 complex traits.
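The abstract benchmarks MSG against sCCA+ACAT, a baseline in which per-feature p-values for a gene are combined into a single gene-level p-value. The aggregated Cauchy association test (ACAT) behind that baseline can be sketched in a few lines: each p-value is mapped to a standard Cauchy variate, a weighted average is formed, and the average is converted back to a p-value through the Cauchy tail, which remains approximately valid in the tail even when the tests are correlated. This illustrates only the baseline's combination step, not MSG's own joint test; equal weights and the example p-values are assumptions.

```python
import math


def acat(pvalues, weights=None):
    """Aggregated Cauchy association test: combine p-values into one p-value.

    Each p_i is mapped to tan((0.5 - p_i) * pi), a standard Cauchy variate under
    the null. With weights summing to 1, the weighted average is exactly standard
    Cauchy for independent tests and approximately so (in the tail) under
    correlation. Equal weights are used unless weights are supplied.
    """
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must be non-empty and strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total_w = sum(weights)
    stat = 0.0
    for p, w in zip(pvalues, weights):
        w /= total_w
        if p < 1e-16:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss.
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        # Cauchy upper tail ~ 1 / (t * pi) for very large t.
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical per-feature p-values for one gene (e.g. one per canonical vector).
print(acat([1e-3, 0.5, 0.9]))
```

A single strong feature dominates the Cauchy average, which is why this kind of combination keeps power when only a few of a gene's features carry signal.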


Author(s): Jianhua Wang, Dandan Huang, Yao Zhou, Hongcheng Yao, Huanhuan Liu, ...

Abstract
Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most significant genotype-phenotype associations the true causal variants remain unknown. Identifying causal genetic variants and interpreting how they confer disease susceptibility is still a major challenge. Herein we introduce a new database, CAUSALdb, which integrates the most comprehensive collection of GWAS summary statistics to date and identifies credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to the MeSH ontology, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots that allow credible sets to be visualized efficiently on a single web page; (v) enables online comparison of causal relations at the variant, gene and trait levels among studies with different sample sizes or populations; and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
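A credible set is the smallest group of variants whose posterior probabilities of being causal sum to a chosen level (commonly 95%). CAUSALdb obtains those probabilities from dedicated fine-mapping tools; as a simpler stand-in, the sketch below assumes a single causal variant per locus and uses Wakefield's approximate Bayes factors computed from each variant's z-score and standard error. The prior effect-size variance W = 0.04, the function names, and the example z-scores are assumptions for illustration, not CAUSALdb settings.

```python
import math


def log_abf(z, se, prior_var=0.04):
    """Wakefield log approximate Bayes factor (association vs. no association).
    prior_var is the assumed prior variance W of the true effect size."""
    r = prior_var / (prior_var + se * se)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def posterior_probs(zs, ses, prior_var=0.04):
    """Posterior probability that each variant is the (single) causal one,
    assuming exactly one causal variant and equal prior probability per variant."""
    labf = [log_abf(z, s, prior_var) for z, s in zip(zs, ses)]
    top = max(labf)  # subtract the max before exponentiating, for stability
    weights = [math.exp(l - top) for l in labf]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(probs, level=0.95):
    """Smallest set of variant indices whose probabilities sum to >= level."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= level:
            break
    return chosen


# Hypothetical locus: four variants with z-scores and standard errors.
zs = [5.0, 4.8, 1.0, 0.5]
ses = [0.05] * 4
pp = posterior_probs(zs, ses)
print([round(p, 3) for p in pp], credible_set(pp))
```

Two variants with similar z-scores (typically in strong LD) share the posterior mass, so both end up in the 95% set; this is the ambiguity that credible sets are designed to report.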


2021, Vol 22 (1)
Author(s): Alvaro N. Barbeira, Rodrigo Bonazzola, Eric R. Gamazon, Yanyu Liang, ...

Abstract
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of loci discovered by genome-wide association studies (GWAS). Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find that no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
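Colocalization asks whether a GWAS signal and an eQTL signal in the same region are likely driven by one shared causal variant. One widely used formulation, the approximate Bayes factor method of the coloc R package, can be sketched as below: per-SNP Wakefield Bayes factors for each trait are combined into posterior probabilities for five hypotheses, from H0 (neither trait associated) to H4 (one shared causal variant), assuming at most one causal variant per trait in the region. The priors p1 = p2 = 1e-4 and p12 = 1e-5 follow coloc's defaults; the prior variance W = 0.04 and the example z-scores are assumptions, and this is not necessarily the specific colocalization method used in the study.

```python
import math


def log_abf(z, se, prior_var=0.04):
    """Wakefield log approximate Bayes factor for one SNP and one trait."""
    r = prior_var / (prior_var + se * se)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)) for a >= b; -inf when the difference is zero."""
    diff = -math.expm1(b - a)
    return a + math.log(diff) if diff > 0.0 else -math.inf


def coloc_abf(z1, se1, z2, se2, p1=1e-4, p2=1e-4, p12=1e-5, prior_var=0.04):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits over the same SNPs."""
    l1 = [log_abf(z, s, prior_var) for z, s in zip(z1, se1)]
    l2 = [log_abf(z, s, prior_var) for z, s in zip(z2, se2)]
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])  # same SNP causal for both
    log_h = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                  # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(l - total) for l in log_h]


# Hypothetical three-SNP region: GWAS z-scores (trait 1) and eQTL z-scores (trait 2).
se = [0.05] * 3
shared = coloc_abf([8.0, 1.0, 0.5], se, [7.0, 0.5, 1.0], se)
distinct = coloc_abf([8.0, 1.0, 0.5], se, [0.5, 7.0, 1.0], se)
print(f"shared: PP.H4 = {shared[4]:.3f}; distinct: PP.H3 = {distinct[3]:.3f}")
```

Note that the single-causal-variant assumption is exactly what breaks down under the allelic heterogeneity mentioned in the abstract, which is one reason to pair colocalization with association approaches.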


2019
Author(s): Jing Yang, Amanda McGovern, Paul Martin, Kate Duffus, Xiangyu Ge, ...

Abstract
Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type-specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T-cells over 24 hours to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of a T-cell-mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and we confirm assigned genes or show complex interactions for 20% of disease-associated loci, including FOXO1, which we confirm using CRISPR/Cas9.
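Relating regulatory-element activity to gene expression across a time course, restricted to elements near a promoter, can be sketched by pairing each element with every transcription start site on the same chromosome within a distance window (200 kb here, echoing the distance at which the abstract reports the strongest correlations) and computing the Pearson correlation between the two time series. The data layout, identifiers, coordinates, and values are hypothetical, and this distance-only pairing is a simplification: the study also used Hi-C and Capture Hi-C contacts to link elements to genes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def pair_and_correlate(peaks, genes, window=200_000):
    """peaks: list of (peak_id, chrom, position, activity time course).
    genes: list of (gene_id, chrom, tss, expression time course).
    Returns (peak_id, gene_id, distance_bp, r) for same-chromosome pairs
    whose distance to the TSS is at most `window` base pairs."""
    results = []
    for pid, pchrom, ppos, activity in peaks:
        for gid, gchrom, tss, expr in genes:
            dist = abs(ppos - tss)
            if pchrom == gchrom and dist <= window:
                results.append((pid, gid, dist, pearson(activity, expr)))
    return results


# Hypothetical time courses (e.g. four time points after stimulation).
peaks = [("peak1", "chr1", 1_100_000, [1.0, 2.0, 4.0, 8.0]),
         ("peak2", "chr1", 1_800_000, [5.0, 4.0, 3.0, 2.0])]
genes = [("GENE_X", "chr1", 1_000_000, [10.0, 20.0, 40.0, 80.0])]
for row in pair_and_correlate(peaks, genes):
    print(row)
```

The nested loop is fine for a sketch; for genome-wide data one would sort elements and TSSs by position per chromosome (or use an interval index) so that each element is compared only with nearby genes.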

