A Powerful Procedure for Pathway-based Meta-Analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

2016 ◽  
Author(s):  
Han Zhang ◽  
William Wheeler ◽  
Paula L Hyland ◽  
Yifan Yang ◽  
Jianxin Shi ◽  
...  

Abstract: Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
Author Summary: As GWAS continue to grow in sample size, it is evident that these studies need to be utilized more effectively for detecting individual susceptibility variants, and more importantly to provide insight into global genetic architecture of complex traits. Towards this goal, identifying association with respect to a collection of variants in biological pathways can be particularly insightful for understanding how networks of genes might be affecting pathophysiology of diseases. Here we present a new pathway analysis procedure that can be conducted using summary-level association statistics, which have become the main vehicle for performing meta-analysis of individual genetic variants across studies in large consortia. Through simulation studies we showed the proposed method was more powerful than the existing state-of-the-art method. We carried out a comprehensive pathway analysis of 4,713 candidate pathways on their association with T2D using two large studies with European ancestry and identified 43 T2D-associated pathways. Further examination of those 43 pathways in 8 Asian studies showed that some pathways were trans-ethnically associated with T2D. This analysis clearly highlights novel T2D-associated pathways beyond what has been known from single-variant association analysis reported from the largest GWAS to date.
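
The adaptive rank truncated product idea named in the abstract above can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the abstract: the function names `rtp_stat` and `artp_pvalue` are hypothetical, the candidate truncation points `ks` are arbitrary, and the null distribution is simulated from independent uniform p-values. The abstract states that sARTP instead uses genotype correlation estimated from a reference panel, so this sketch is not the authors' procedure or the interface of their R package.

```python
import bisect
import math
import random


def rtp_stat(pvals, k):
    """Rank truncated product statistic: minus the sum of the logs of the
    k smallest p-values (larger means stronger evidence)."""
    return -sum(math.log(p) for p in sorted(pvals)[:k])


def artp_pvalue(pvals, ks, n_sim=1000, seed=0):
    """Adaptive RTP p-value with a Monte Carlo null.

    ASSUMPTION: null p-values are independent uniforms. A summary-statistics
    method would simulate correlated statistics from an LD matrix instead.
    """
    rng = random.Random(seed)
    m = len(pvals)
    # Row 0 holds the observed p-values; rows 1..n_sim are null replicates.
    # 1 - random() lies in (0, 1], so log() never receives zero.
    rows = [list(pvals)]
    rows += [[1.0 - rng.random() for _ in range(m)] for _ in range(n_sim)]
    stats = [[rtp_stat(row, k) for k in ks] for row in rows]
    total = len(rows)

    # First layer: for every row and every truncation point k, the empirical
    # p-value is the fraction of rows whose statistic is at least as large.
    # Keep the minimum over k for each row.
    min_p = [1.0] * total
    for j in range(len(ks)):
        column = sorted(s[j] for s in stats)
        for i in range(total):
            n_ge = total - bisect.bisect_left(column, stats[i][j])
            min_p[i] = min(min_p[i], n_ge / total)

    # Second layer: compare the observed minimum p-value with the minima of
    # all rows, which adjusts for having searched over several k.
    n_le = sum(1 for v in min_p if v <= min_p[0])
    return n_le / total


if __name__ == "__main__":
    # Hypothetical SNP-level p-values for one gene: three strong signals.
    example = [1e-8] * 3 + [0.5] * 17
    print(artp_pvalue(example, ks=(1, 3, 5), n_sim=500))
```

Reusing the same simulated replicates for both layers avoids a nested simulation; the smallest p-value this sketch can return is 1 / (n_sim + 1).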

2019 ◽  
Author(s):  
Cassandra N Spracklen ◽  
Momoko Horikoshi ◽  
Young Jin Kim ◽  
Kuang Lin ◽  
Fiona Bragg ◽  
...  

Summary: Meta-analyses of genome-wide association studies (GWAS) have identified >240 loci associated with type 2 diabetes (T2D); however, most loci have been identified in analyses of European-ancestry individuals. To examine T2D risk in East Asian individuals, we meta-analyzed GWAS data in 77,418 cases and 356,122 controls. In the main analysis, we identified 298 distinct association signals at 178 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 56 loci newly implicated in T2D predisposition. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. New associations include signals in/near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect muscle and adipose differentiation. At another locus, eQTLs at two overlapping T2D signals act through two genes, NKX6-3 and ANK1, in different tissues. Association studies in diverse populations identify additional loci and elucidate disease genes, biology, and pathways.
Type 2 diabetes (T2D) is a common metabolic disease primarily caused by insufficient insulin production and/or secretion by the pancreatic β cells and insulin resistance in peripheral tissues [1]. Most genetic loci associated with T2D have been identified in populations of European (EUR) ancestry, including a recent meta-analysis of genome-wide association studies (GWAS) of nearly 900,000 individuals of European ancestry that identified >240 loci influencing the risk of T2D [2]. Differences in allele frequency between ancestries affect the power to detect associations within a population, particularly among variants rare or monomorphic in one population but more frequent in another [3,4]. Although smaller than studies in European populations, a recent T2D meta-analysis in almost 200,000 Japanese individuals identified 28 additional loci [4].
The relative contributions of different pathways to the pathophysiology of T2D may also differ between ancestry groups. For example, in East Asian (EAS) populations, T2D prevalence is greater than in European populations among people of similar body mass index (BMI) or waist circumference [5]. We performed the largest meta-analysis of East Asian individuals to identify new genetic associations and provide insight into T2D pathogenesis.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Luke R. Lloyd-Jones ◽  
Jian Zeng ◽  
Julia Sidorenko ◽  
Loïc Yengo ◽  
Gerhard Moser ◽  
...  

Abstract: Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
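
The baseline mentioned in the abstract above, clumping and p value thresholding, can be partly illustrated as follows. This is a rough sketch of the thresholding step only: LD clumping is omitted, the function names and the toy numbers are hypothetical, and prediction R² is taken here to be the squared Pearson correlation between score and phenotype, which is an assumption about the metric and not a detail given in the abstract. It does not represent SBayesR or LDpred.

```python
import math


def threshold_score(dosages, betas, pvals, p_cut):
    """Polygenic score for one individual: sum of effect size times allele
    dosage over SNPs whose GWAS p-value is below p_cut.
    (LD clumping, which would normally come first, is NOT done here.)"""
    return sum(b * d for d, b, p in zip(dosages, betas, pvals) if p < p_cut)


def r_squared(pred, obs):
    """Squared Pearson correlation between predicted and observed values."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


if __name__ == "__main__":
    # Hypothetical summary statistics for three SNPs.
    betas = [0.1, -0.2, 0.3]
    pvals = [1e-9, 0.5, 1e-6]
    # Hypothetical allele dosages (0, 1 or 2) for one individual.
    print(threshold_score([2, 1, 0], betas, pvals, p_cut=1e-5))
```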


2021 ◽  
Vol 118 (25) ◽  
pp. e2023184118
Author(s):  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yunong Lin ◽  
Zijie Zhao ◽  
Jiawen Chen ◽  
...  

Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic, and phenotypic data with large samples, which is difficult to obtain in practice. Here, we propose a statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method showed nearly identical results with those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and higher height, lower body mass index, less-active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant overtransmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.


2015 ◽  
Author(s):  
Anna Cichonska ◽  
Juho Rousu ◽  
Pekka Marttinen ◽  
Antti J Kangas ◽  
Pasi Soininen ◽  
...  

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analysing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.


2021 ◽  
Author(s):  
Mark J. O’Connor ◽  
Philip Schroeder ◽  
Alicia Huerta-Chagoya ◽  
Paula Cortés-Sánchez ◽  
Silvía Bonàs-Guarch ◽  
...  

Most genome-wide association studies (GWAS) of complex traits are performed using models with additive allelic effects. Hundreds of loci associated with type 2 diabetes have been identified using this approach. Additive models, however, can miss loci with recessive effects, thereby leaving potentially important genes undiscovered. We conducted the largest GWAS meta-analysis using a recessive model for type 2 diabetes. Our discovery sample included 33,139 cases and 279,507 controls from seven European-ancestry cohorts including the UK Biobank. We identified 51 loci associated with type 2 diabetes, including five variants undetected by prior additive analyses. Two of the five had minor allele frequency less than 5% and were each associated with more than doubled risk in homozygous carriers. Using two additional cohorts, FinnGen and a Danish cohort, we replicated three of the variants, including one of the low-frequency variants, rs115018790, which had an odds ratio in homozygous carriers of 2.56 (95% CI 2.05-3.19, P = 1 × 10⁻¹⁶) and a stronger effect in men than in women (interaction P = 7 × 10⁻⁷). The signal was associated with multiple diabetes-related traits, with homozygous carriers showing a 10% decrease in LDL and a 20% increase in triglycerides, and colocalization analysis linked this signal to reduced expression of the nearby PELO gene. These results demonstrate that recessive models, when compared to GWAS using the additive approach, can identify novel loci, including large-effect variants with pathophysiological consequences relevant to type 2 diabetes.
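
The contrast between additive and recessive models in the abstract above comes down to how genotypes are coded before testing. The sketch below shows the two codings and an odds ratio from a 2x2 table; the function names and all counts are hypothetical and are not data from the study.

```python
def additive_coding(genotypes):
    """Additive model: the predictor is the effect-allele count (0, 1 or 2)."""
    return list(genotypes)


def recessive_coding(genotypes):
    """Recessive model: only homozygous carriers (two copies) are coded 1."""
    return [1 if g == 2 else 0 for g in genotypes]


def odds_ratio(case_carriers, case_others, control_carriers, control_others):
    """Odds ratio from a 2x2 table of carrier status by case status."""
    return (case_carriers * control_others) / (case_others * control_carriers)


if __name__ == "__main__":
    genotypes = [0, 1, 2, 2, 1, 0]
    print(additive_coding(genotypes))   # [0, 1, 2, 2, 1, 0]
    print(recessive_coding(genotypes))  # [0, 0, 1, 1, 0, 0]
    # Hypothetical counts of homozygous carriers among cases and controls.
    print(odds_ratio(30, 970, 20, 1980))
```

Under the recessive coding, heterozygotes are grouped with non-carriers, which is why a variant that raises risk only in homozygous carriers can be diluted in an additive test.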


Author(s):  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yunong Lin ◽  
Zijie Zhao ◽  
Jiawen Chen ◽  
...  

Abstract: Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic and phenotypic data with large samples, which is difficult to obtain in practice. Here, we propose a novel statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method showed nearly identical results with those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and higher height, lower BMI, less active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant over-transmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.


2020 ◽  
Author(s):  
Clara Albiñana ◽  
Jakob Grove ◽  
John J. McGrath ◽  
Esben Agerbo ◽  
Naomi R. Wray ◽  
...  

Abstract: The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWAS). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how to best combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (Meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using twelve real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare Meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and Meta-PRS. We find that, when large individual-level data is available, the linear combination of PRSs (Meta-PRS) is both a simple alternative to Meta-GWAS and often more accurate.
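
The two combining strategies contrasted in the abstract above can be sketched in a few lines. The abstract does not say which meta-analysis scheme was used, so the fixed-effect inverse-variance weighting below is an assumption (it is a common choice for combining per-SNP GWAS estimates). The function names and numbers are hypothetical, and in practice the Meta-PRS weights would be estimated from individual-level data, not fixed by hand.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (combined beta, combined standard error, z-score).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


def meta_prs(scores, weights):
    """Meta-PRS for one individual: a linear combination of that person's
    polygenic scores from different sources."""
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Hypothetical estimates for one SNP from two studies.
    print(ivw_meta([0.1, 0.3], [0.1, 0.1]))
    # Hypothetical scores for one person and hypothetical mixing weights.
    print(meta_prs([1.0, 2.0], [0.5, 0.25]))
```

Meta-GWAS combines evidence at the SNP level before any score is built, while Meta-PRS builds one score per data source and combines at the score level.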


2018 ◽  
Author(s):  
Loic Yengo ◽  
Julia Sidorenko ◽  
Kathryn E. Kemper ◽  
Zhili Zheng ◽  
Andrew R. Wood ◽  
...  

Genome-wide association studies (GWAS) stand as powerful experimental designs for identifying DNA variants associated with complex traits and diseases. In the past decade, both the number of such studies and their sample sizes have increased dramatically. Recent GWAS of height and body mass index (BMI) in ∼250,000 European participants have led to the discovery of ∼700 and ∼100 nearly independent SNPs associated with these traits, respectively. Here we combine summary statistics from those two studies with GWAS of height and BMI performed in ∼450,000 UK Biobank participants of European ancestry. Overall, our combined GWAS meta-analysis reaches N∼700,000 individuals and substantially increases the number of GWAS signals associated with these traits. We identified 3,290 and 716 near-independent SNPs associated with height and BMI, respectively (at a revised genome-wide significance threshold of p < 1 × 10⁻⁸), including 1,185 height-associated SNPs and 554 BMI-associated SNPs located within loci not previously identified by these two GWAS. The genome-wide significant SNPs explain ∼24.6% of the variance of height and ∼5% of the variance of BMI in an independent sample from the Health and Retirement Study (HRS). Correlations between polygenic scores based upon these SNPs with actual height and BMI in HRS participants were 0.44 and 0.20, respectively. By integrating GWAS and eQTL data using Summary-data based Mendelian Randomization (SMR), we identified an enrichment of eQTLs amongst lead height and BMI signals, prioritising 684 and 134 genes, respectively. Our study demonstrates that, as previously predicted, increasing GWAS sample sizes continues to deliver, by discovery of new loci, increasing prediction accuracy and providing additional data to achieve deeper insight into complex trait biology. All summary statistics are made available for follow-up studies.

