RiVIERA-beta: Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases

Mapping Intimacies ◽

10.1101/059329 ◽

2016 ◽

Cited By ~ 1

Author(s):

Yue Li ◽

Manolis Kellis

Keyword(s):

Autoimmune Diseases ◽

Association Studies ◽

Joint Modeling ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Multiple Traits ◽

Tissue Specific ◽

Joint Inference ◽

Hypersensitive Sites ◽

Causal Variants

Genome-wide association studies (GWAS) provide a powerful approach for uncovering disease-associated variants in humans, but fine-mapping the causal variants remains a challenge. This is partly remedied by prioritizing disease-associated variants that overlap GWAS-enriched epigenomic annotations. Here, we introduce a new Bayesian model, RiVIERA-beta (Risk Variant Inference using Epigenomic Reference Annotations), for inference of driver variants by modelling summary-statistic p-values with a Beta density function across multiple traits using hundreds of epigenomic annotations. In simulation, RiVIERA-beta showed promising power in detecting causal variants and causal annotations, and multi-trait joint inference further improved the detection power. We applied RiVIERA-beta to model the existing GWAS summary statistics of 9 autoimmune diseases and schizophrenia by jointly harnessing the potential causal enrichments among 848 tissue-specific epigenomic annotations from the ENCODE/Roadmap consortium covering 127 cell/tissue types and 8 major epigenomic marks. RiVIERA-beta identified meaningful tissue-specific enrichments for enhancer regions defined by H3K4me1 and H3K27ac for blood T cells specifically in the 9 autoimmune diseases and brain-specific enhancer activities exclusively in schizophrenia. Moreover, the variants from the 95% credible sets exhibited high conservation and enrichments for GTEx whole-blood eQTLs located within transcription factor binding sites and DNA-hypersensitive sites. Furthermore, jointly modeling the nine immune traits by simultaneously inferring and exploiting the underlying epigenomic correlation between traits further improved the functional enrichments compared to single-trait models.
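
To make the Beta-density idea concrete, the following is a minimal sketch, not the RiVIERA-beta implementation. It assumes that p-values of non-causal variants are uniform on (0, 1), that p-values of causal variants follow a Beta(a, 1) density with 0 < a < 1 (which concentrates mass near zero), and that the prior probability of causality is a logistic function of binary annotations. The function names, the shape parameter `a`, and the weights are all hypothetical.

```python
import math

def annotation_prior(annotations, weights, bias):
    """Hypothetical logistic prior: P(causal) from 0/1 annotation indicators."""
    score = bias + sum(w * x for w, x in zip(weights, annotations))
    return 1.0 / (1.0 + math.exp(-score))

def beta_a1_density(p, a):
    """Density of Beta(a, 1) at p, i.e. a * p**(a - 1)."""
    return a * p ** (a - 1.0)

def posterior_causal(p, prior, a=0.1):
    """Posterior P(causal | p-value) for a two-component mixture.

    Null component: Uniform(0, 1), density 1.
    Causal component: Beta(a, 1) with 0 < a < 1 (assumed shape).
    """
    causal = prior * beta_a1_density(p, a)
    null = (1.0 - prior) * 1.0
    return causal / (causal + null)

# Illustrative values only: the same p-value inside and outside an enhancer.
weights, bias = [2.0], -5.0
in_enhancer = posterior_causal(1e-6, annotation_prior([1], weights, bias))
outside = posterior_causal(1e-6, annotation_prior([0], weights, bias))
```

With `a = 1` the causal component collapses to the uniform null and the posterior equals the prior, which is a quick sanity check on the mixture.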


CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
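
Credible sets of the kind the database reports are usually built from per-variant posterior probabilities. A minimal sketch, assuming a single causal variant per locus and hypothetical variant IDs and probabilities (none of these values come from CAUSALdb):

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of variants whose normalized posteriors sum to >= level.

    posteriors: dict mapping variant ID -> posterior probability of causality.
    Variants are added in decreasing order of probability.
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, prob in ranked:
        chosen.append(variant_id)
        cumulative += prob / total
        if cumulative >= level:
            break
    return chosen

# Hypothetical locus with four variants.
locus = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.06, "rs4": 0.04}
cs95 = credible_set(locus)
```

Fine-mapping tools that allow several causal variants per locus compute these probabilities differently, but the ranking-and-accumulating step is the same idea.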


Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model

10.1101/133132 ◽

2017 ◽

Cited By ~ 8

Author(s):

Dominic Holland ◽

Oleksandr Frei ◽

Rahul Desikan ◽

Chun-Chieh Fan ◽

Alexey A. Shadrin ◽

...

Keyword(s):

Association Studies ◽

Causal Snps ◽

Reference Panel ◽

Causal Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Common Variants ◽

Genome Wide ◽

Causal Variants

Abstract Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities ranging from ≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.
Author Summary There are ~10 million common variants in the genome of humans with European ancestry. For any particular phenotype a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype. Genome wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics collectively are difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations. Using a detailed reference panel of ~11 million common variants – among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants – we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities – whose product allows for lower bound estimates of heritability – vary by orders of magnitude.
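
The proportionality between SNP heritability and the product of polygenicity and discoverability can be illustrated with simple arithmetic. The sketch below assumes the proportionality constant is the number of SNPs times their mean heterozygosity; that constant, the function name, and all input values are assumptions for illustration, not figures from the paper.

```python
def snp_heritability(polygenicity, discoverability, n_snps, mean_heterozygosity):
    """Illustrative heritability: (number of causal SNPs) x H x effect-size variance.

    polygenicity: assumed proportion of SNPs that are causal.
    discoverability: assumed variance of causal effect sizes.
    n_snps, mean_heterozygosity: assumed proportionality constant.
    """
    n_causal = polygenicity * n_snps
    return n_causal * mean_heterozygosity * discoverability

# Two hypothetical traits: few strong effects versus many weak effects.
h2_sparse = snp_heritability(2e-5, 1e-3, 11_000_000, 0.2)
h2_polygenic = snp_heritability(4e-3, 1e-5, 11_000_000, 0.2)
```

The point of the comparison is that very different combinations of polygenicity and discoverability can yield heritabilities of the same order, which is why the two quantities are worth estimating separately.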


An iterative approach to detect pleiotropy and perform Mendelian Randomization analysis using GWAS summary statistics

Bioinformatics ◽

10.1093/bioinformatics/btaa985 ◽

2020 ◽

Author(s):

Xiaofeng Zhu ◽

Xiaoyin Li ◽

Rong Xu ◽

Tao Wang

Keyword(s):

Complex Traits ◽

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

Real Data ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Causal Relationships ◽

Multiple Traits

Abstract Motivation: The overall association evidence of a genetic variant with multiple traits can be evaluated by cross-phenotype association analysis using summary statistics from genome-wide association studies. Further dissecting the association pathways from a variant to multiple traits is important to understand the biological causal relationships among complex traits. Results: Here, we introduce a flexible and computationally efficient Iterative Mendelian Randomization and Pleiotropy (IMRP) approach to simultaneously search for horizontal pleiotropic variants and estimate causal effect. Extensive simulations and real data applications suggest that IMRP has similar or better performance than existing Mendelian Randomization methods for both causal effect estimation and pleiotropic variant detection. The developed pleiotropy test is further extended to detect colocalization for multiple variants at a locus. IMRP will greatly facilitate our understanding of causal relationships underlying complex traits, in particular, when a large number of genetic instrumental variables are used for evaluating multiple traits. Availability and implementation: The software IMRP is available at https://github.com/XiaofengZhuCase/IMRP. The simulation codes can be downloaded at http://hal.case.edu/~xxz10/zhu-web/ under the link: MR Simulations software. Supplementary information: Supplementary data are available at Bioinformatics online.
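
The idea of alternating between causal effect estimation and pleiotropy detection can be sketched as follows. This is a simplified illustration in the spirit of the description above, not the IMRP software: it assumes independent variants, uses an inverse-variance weighted (IVW) estimate, and flags a variant as pleiotropic when its residual z-score exceeds a fixed cutoff. All function names, the cutoff, and the data are hypothetical.

```python
import math

def ivw_estimate(bx, by, se_y, keep):
    """Inverse-variance weighted causal effect from the variants in `keep`."""
    num = sum(bx[i] * by[i] / se_y[i] ** 2 for i in keep)
    den = sum(bx[i] ** 2 / se_y[i] ** 2 for i in keep)
    return num / den

def iterative_mr(bx, by, se_x, se_y, z_cut=1.96, max_iter=50):
    """Alternate between estimating the effect and re-testing every variant."""
    keep = list(range(len(bx)))
    for _ in range(max_iter):
        theta = ivw_estimate(bx, by, se_y, keep)
        new_keep = []
        for i in range(len(bx)):
            resid = by[i] - theta * bx[i]
            se = math.sqrt(se_y[i] ** 2 + theta ** 2 * se_x[i] ** 2)
            if abs(resid / se) < z_cut:  # no evidence of pleiotropy
                new_keep.append(i)
        if not new_keep or new_keep == keep:
            break
        keep = new_keep
    return ivw_estimate(bx, by, se_y, keep), keep

# Hypothetical summary statistics: true effect 0.5, last variant pleiotropic.
bx = [0.1, 0.2, 0.3, 0.4, 0.2]
by = [0.05, 0.10, 0.15, 0.20, 0.50]
se = [0.02] * 5
theta, valid = iterative_mr(bx, by, se, se)
```

Every variant is re-tested in each round, so a valid instrument that was wrongly excluded while the estimate was still biased can re-enter the set later.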


nMAGMA: a network enhanced method for inferring risk genes from GWAS summary statistics and its application to schizophrenia

10.1101/2020.08.15.250282 ◽

2020 ◽

Author(s):

Anyi Yang ◽

Jingqi Chen ◽

Xing-Ming Zhao

Keyword(s):

Gene Networks ◽

Association Studies ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Nucleotide Polymorphisms ◽

Risk Genes ◽

Tissue Specific ◽

Genomic Annotation ◽

Genome Wide

Abstract Motivation: Annotating genetic variants from summary statistics of genome-wide association studies (GWAS) is crucial for predicting risk genes of various disorders. The multi-marker analysis of genomic annotation (MAGMA) is one of the most popular tools for this purpose, where MAGMA aggregates signals of single nucleotide polymorphisms (SNPs) to their nearby genes. However, SNPs may also affect genes at a distance, and such effects are missed by MAGMA. Although different upgrades of MAGMA have been proposed to extend gene-wise variant annotations with more information (e.g. Hi-C or eQTL), the regulatory relationships among genes and the tissue-specificity of signals have not been taken into account. Results: We propose a new approach, namely network-enhanced MAGMA (nMAGMA), for gene-wise annotation of variants from GWAS summary statistics. Compared with MAGMA and H-MAGMA, nMAGMA significantly extends the lists of genes that can be annotated to SNPs by integrating local signals, long-range regulation signals, and tissue-specific gene networks. When applied to schizophrenia, nMAGMA is able to detect more risk genes (217% more than MAGMA and 57% more than H-MAGMA) that are reasonably involved in schizophrenia compared to MAGMA and H-MAGMA. Some disease-related functions (e.g. the ATPase pathway in Cortex) are also uncovered in nMAGMA but not in MAGMA or H-MAGMA. Moreover, nMAGMA provides tissue-specific risk signals, which are useful for understanding disorders with multi-tissue origins.


Leveraging allelic heterogeneity to increase power of association testing

10.1101/498360 ◽

2018 ◽

Author(s):

Farhad I Hormozdiari ◽

Junghyun Jung ◽

Eleazar Eskin ◽

Jong Wha J. Joo

Keyword(s):

Statistical Power ◽

Association Studies ◽

Simulated Data ◽

Association Test ◽

Type I ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Causal Status ◽

Causal Variants ◽

Multiple Variants

The standard genome-wide association study (GWAS) detects an association between a single variant and a phenotype of interest. Recently, several studies reported that at many risk loci, there may exist multiple causal variants. For a locus with multiple causal variants with small effect sizes, the standard association test is underpowered to detect the associations. Alternatively, an approach considering effects of multiple variants simultaneously may increase statistical power by leveraging effects of multiple causal variants. In this paper, we propose a new statistical method, Model-based Association test Reflecting causal Status (MARS), that tries to find an association between variants in risk loci and a phenotype, considering the causal status of the variants. One of the main advantages of MARS is that it only requires the existing summary statistics to detect associated risk loci. Thus, MARS is applicable to any association study with summary statistics, even when individual-level data are not available for the study. Utilizing extensive simulated data sets, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while robustly controlling the type I error. Applied to data of 44 tissues provided by the Genotype-Tissue Expression (GTEx) consortium, we show that MARS identifies more eGenes compared to previous approaches in most of the tissues; e.g. MARS identified 16% more eGenes than the ones reported by the GTEx consortium. Moreover, applied to Northern Finland Birth Cohort (NFBC) data, we demonstrate that MARS effectively identifies association loci with improved power (56% more loci found by MARS) in GWAS compared to the standard association test.


nMAGMA: a network-enhanced method for inferring risk genes from GWAS summary statistics and its application to schizophrenia

Briefings in Bioinformatics ◽

10.1093/bib/bbaa298 ◽

2020 ◽

Author(s):

Anyi Yang ◽

Jingqi Chen ◽

Xing-Ming Zhao

Keyword(s):

Gene Networks ◽

Association Studies ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Nucleotide Polymorphisms ◽

Risk Genes ◽

Tissue Specific ◽

Genomic Annotation ◽

Dna Elements

Abstract Motivation: Annotating genetic variants from summary statistics of genome-wide association studies (GWAS) is crucial for predicting risk genes of various disorders. The multimarker analysis of genomic annotation (MAGMA) is one of the most popular tools for this purpose, where MAGMA aggregates signals of single nucleotide polymorphisms (SNPs) to their nearby genes. However, SNPs may also affect genes that are far away in the genome, and such effects are missed by MAGMA. Although different upgrades of MAGMA have been proposed to extend gene-wise variant annotations with more information (e.g. Hi-C or eQTL), the regulatory relationships among genes and the tissue specificity of signals have not been taken into account. Results: We propose a new approach, namely network-enhanced MAGMA (nMAGMA), for gene-wise annotation of variants from GWAS summary statistics. Compared with MAGMA and H-MAGMA, nMAGMA significantly extends the lists of genes that can be annotated to SNPs by integrating local signals, long-range regulation signals (i.e. interactions between distal DNA elements), and tissue-specific gene networks. When applied to schizophrenia (SCZ), nMAGMA is able to detect more risk genes (217% more than MAGMA and 57% more than H-MAGMA) that are involved in SCZ compared with MAGMA and H-MAGMA, and more of the nMAGMA results can be validated with known SCZ risk genes. Some disease-related functions (e.g. the ATPase pathway in Cortex) are also uncovered in nMAGMA but not in MAGMA or H-MAGMA. Moreover, nMAGMA provides tissue-specific risk signals, which are useful for understanding disorders with multitissue origins.
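
The difference between window-based annotation and the extended annotation described above can be sketched as follows. This is not MAGMA or nMAGMA code; the window size, the `long_range_links` input (standing in for Hi-C or eQTL evidence), and all identifiers and coordinates are hypothetical.

```python
def window_mapping(snps, genes, window=10_000):
    """Assign each SNP to every gene whose padded interval contains it.

    snps: list of (snp_id, chrom, pos); genes: list of (name, chrom, start, end).
    """
    mapping = {name: set() for name, _, _, _ in genes}
    for snp_id, chrom, pos in snps:
        for name, gchrom, start, end in genes:
            if chrom == gchrom and start - window <= pos <= end + window:
                mapping[name].add(snp_id)
    return mapping

def extend_with_links(mapping, long_range_links):
    """Add SNP-gene pairs supported by long-range evidence (assumed input)."""
    extended = {gene: set(ids) for gene, ids in mapping.items()}
    for snp_id, gene in long_range_links:
        extended.setdefault(gene, set()).add(snp_id)
    return extended

genes = [("GENE_A", "1", 100_000, 120_000), ("GENE_B", "1", 900_000, 950_000)]
snps = [("rs_near", "1", 125_000), ("rs_far", "1", 500_000)]
local = window_mapping(snps, genes)
extended = extend_with_links(local, [("rs_far", "GENE_B")])
```

In this toy example `rs_far` lies outside every window and is only annotated once the long-range link is supplied, which is the gap that distance-based mapping leaves open.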


Leveraging Gene Co-expression Patterns to Infer Trait-Relevant Tissues in Genome-wide Association Studies

10.1101/705129 ◽

2019 ◽

Cited By ~ 2

Author(s):

Lulu Shang ◽

Jennifer A. Smith ◽

Xiang Zhou

Keyword(s):

Autoimmune Diseases ◽

Single Cell ◽

Neurological Disorders ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Tissue Specific ◽

Genome Wide ◽

Rnaseq Data

Abstract Genome-wide association studies (GWASs) have identified many SNPs associated with various common diseases. Understanding the biological functions of these identified SNP associations requires identifying disease/trait relevant tissues or cell types. Here, we develop a network method, CoCoNet, to facilitate the identification of trait-relevant tissues or cell types. Unlike existing approaches, CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing (RNAseq) studies with GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect sizes for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes. We validate the performance of CoCoNet through extensive simulations. We apply CoCoNet for an in-depth analysis of four neurological disorders and four autoimmune diseases, where we integrate the corresponding GWASs with bulk RNAseq data from 38 tissues and single cell RNAseq data from 10 cell types. In the real data applications, we show how CoCoNet can help identify specific glial cell types relevant for neurological disorders and identify disease-targeted colon tissues as relevant for autoimmune diseases. Our results also provide empirical evidence supporting one hypothesis of the omnigenic model: that trait-relevant gene co-expression networks underlie disease etiology.


Penalized partial least squares for pleiotropy

BMC Bioinformatics ◽

10.1186/s12859-021-03968-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Camilo Broc ◽

Therese Truong ◽

Benoit Liquet

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Association Studies ◽

A Priori ◽

Simulated Data ◽

Real Data ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Multiple Traits ◽

Application Fields

Abstract Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of providing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example application on real data, with the aim of highlighting susceptibility variants common to breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adaptable to any data with a high number of variables and an explicit a priori architecture in other application fields.
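
Lasso-type penalties in sparse PLS variants are commonly applied through soft-thresholding of loading vectors. The sketch below shows the element-wise operator and a group-wise counterpart; it is a generic illustration, not the joint-sgPLS algorithm, and the function names, loadings, and penalty values are hypothetical.

```python
import math

def soft_threshold(x, lam):
    """Element-wise Lasso operator: shrink x toward zero by lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def group_soft_threshold(group, lam):
    """Shrink a whole group by its Euclidean norm; zero it if the norm <= lam."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0 for _ in group]
    scale = 1.0 - lam / norm
    return [scale * v for v in group]

# Hypothetical loadings for two groups of variables (e.g. SNPs in two genes).
strong_gene = group_soft_threshold([3.0, 4.0], 2.5)   # kept, shrunk by half
weak_gene = group_soft_threshold([0.3, 0.4], 2.5)     # removed entirely
sparse_within = [soft_threshold(v, 1.0) for v in [3.0, -0.5, -2.0]]
```

The group operator selects or drops whole groups, while the element-wise operator sparsifies within a retained group; combining the two gives selection at both the group and the variable level.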


Editing GWAS: experimental approaches to dissect and exploit disease-associated genetic variation

Genome Medicine ◽

10.1186/s13073-021-00857-3 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Shuquan Rao ◽

Yao Yao ◽

Daniel E. Bauer

Keyword(s):

Genome Editing ◽

Genetic Variants ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Functional Studies ◽

Functional Genetics ◽

Genome Wide ◽

Causal Variants ◽

Experimental Approaches

Abstract Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.


Association analysis of juvenile idiopathic arthritis genetic susceptibility factors in Estonian patients

Clinical Rheumatology ◽

10.1007/s10067-021-05756-x ◽

2021 ◽

Author(s):

Tiit Nikopensius ◽

Priit Niibo ◽

Toomas Haller ◽

Triin Jagomägi ◽

Ülle Voog-Oras ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Juvenile Idiopathic Arthritis ◽

Autoimmune Diseases ◽

Genetic Risk ◽

Association Studies ◽

Control Sample ◽

Case Control ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Background: Juvenile idiopathic arthritis (JIA) is the most common chronic rheumatic condition of childhood. Genetic association studies have revealed several JIA susceptibility loci with the strongest effect size observed in the human leukocyte antigen (HLA) region. Genome-wide association studies have augmented the number of JIA-associated loci, particularly for non-HLA genes. The aim of this study was to identify new associations at non-HLA loci predisposing to the risk of JIA development in Estonian patients. Methods: We performed genome-wide association analyses in an entire JIA case–control sample (All-JIA) and in a case–control sample for oligoarticular JIA, the most prevalent JIA subtype. The entire cohort was genotyped using the Illumina HumanOmniExpress BeadChip arrays. After imputation, 16,583,468 variants were analyzed in 263 cases and 6956 controls. Results: We demonstrated nominal evidence of association for 12 novel non-HLA loci not previously implicated in JIA predisposition. We replicated known JIA associations in CLEC16A and VTCN1 regions in the oligoarticular JIA sample. The strongest associations in the All-JIA analysis were identified at PRKG1 (P = 2.54 × 10⁻⁶), LTBP1 (P = 9.45 × 10⁻⁶), and ELMO1 (P = 1.05 × 10⁻⁵). In the oligoarticular JIA analysis, the strongest associations were identified at NFIA (P = 5.05 × 10⁻⁶), LTBP1 (P = 9.95 × 10⁻⁶), MX1 (P = 1.65 × 10⁻⁵), and CD200R1 (P = 2.59 × 10⁻⁵). Conclusion: This study increases the number of known JIA risk loci and provides additional evidence for the existence of overlapping genetic risk loci between JIA and other autoimmune diseases, particularly rheumatoid arthritis. The reported loci are involved in molecular pathways of immunological relevance and likely represent genomic regions that confer susceptibility to JIA in Estonian patients.
Key Points
• Juvenile idiopathic arthritis (JIA) is the most common childhood rheumatic disease with heterogeneous presentation and genetic predisposition.
• The present genome-wide association study for Estonian JIA patients is the first of its kind in Northern and Northeastern Europe.
• The results of the present study increase the knowledge about JIA risk loci by replicating some previously described associations, adding weight to their relevance, and describing novel loci.
• The study provides additional evidence for the existence of overlapping genetic risk loci between JIA and other autoimmune diseases, particularly rheumatoid arthritis.
