Estimating Heritability and Genetic Correlation in Case Control Studies Directly and with Summary Statistics

2018 ◽  
Author(s):  
Omer Weissbrod ◽  
Jonathan Flint ◽  
Saharon Rosset

Abstract Methods that estimate heritability and genetic correlations from genome-wide association studies have proven to be powerful tools for investigating the genetic architecture of common diseases and exposing unexpected relationships between disorders. Many relevant studies employ a case-control design, yet most methods are primarily geared towards analyzing quantitative traits. Here we investigate the validity of three common methods for estimating genetic heritability and genetic correlation. We find that the Phenotype-Correlation-Genotype-Correlation (PCGC) approach is the only method that can estimate both quantities accurately in the presence of important non-genetic risk factors, such as age and sex. We extend PCGC to work with summary statistics that take the case-control sampling into account, and demonstrate that our new method, PCGC-s, accurately estimates both heritability and genetic correlations and can be applied to large data sets without requiring individual-level genotypic or phenotypic information. Finally, we use PCGC-s to estimate the genetic correlation between schizophrenia and bipolar disorder, and demonstrate that previous estimates are biased due to incorrect handling of sex as a strong risk factor. PCGC-s is available at https://github.com/omerwe/PCGCs.

Author(s):  
Yiliang Zhang ◽  
Youshu Cheng ◽  
Wei Jiang ◽  
Yixuan Ye ◽  
Qiongshi Lu ◽  
...  

Abstract Genetic correlation is the correlation of additive genetic effects on two phenotypes. It is an informative metric to quantify the overall genetic similarity between complex traits, which provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlations based on data collected from genome-wide association studies (GWAS). Because GWAS summary statistics are easy to access and computationally efficient to analyze, methods that require only GWAS summary statistics as input have become more popular than methods that use individual-level genotype data. Here, we present a benchmark study of different summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. Then we applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications than other methods, due to the imprecision of LD obtained from reference panels. Our findings offer guidance on how to choose an appropriate method for genetic correlation estimation in post-GWAS analysis and interpretation.
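For reference, the definition given in the first sentence of this abstract can be written as a formula. The notation is an assumption introduced here for illustration and is not taken from the source: $g_1$ and $g_2$ denote the additive genetic components of the two phenotypes.

```latex
r_g = \frac{\operatorname{Cov}(g_1, g_2)}{\sqrt{\operatorname{Var}(g_1)\,\operatorname{Var}(g_2)}}
```

Under this definition $r_g$ lies between $-1$ and $1$, with $0$ indicating no correlation between the additive genetic effects on the two traits.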


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Luke R. Lloyd-Jones ◽  
Jian Zeng ◽  
Julia Sidorenko ◽  
Loïc Yengo ◽  
Gerhard Moser ◽  
...  

Abstract Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.


2021 ◽  
Vol 118 (25) ◽  
pp. e2023184118
Author(s):  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yunong Lin ◽  
Zijie Zhao ◽  
Jiawen Chen ◽  
...  

Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic, and phenotypic data with large samples, which is difficult to obtain in practice. Here, we propose a statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method showed nearly identical results with those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and higher height, lower body mass index, less-active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant overtransmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.


Author(s):  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yunong Lin ◽  
Zijie Zhao ◽  
Jiawen Chen ◽  
...  

Abstract Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic and phenotypic data with large samples, which is difficult to obtain in practice. Here, we propose a novel statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method showed nearly identical results with those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and higher height, lower BMI, less active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant over-transmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.


2020 ◽  
Author(s):  
Konstantin Senkevich ◽  
Sara Bandres-Ciga ◽  
Eric Yu ◽  
Upekha E. Liyanage ◽  
Alastair J Noyce ◽  
...  

Abstract Background and objectives: Most cancers appear with reduced frequency in Parkinson’s disease (PD), but the prevalence of melanoma and brain cancers is often reported to be increased. Shared genetic architecture and causal relationships to explain these associations have not been fully explored. Methods: Linkage disequilibrium score regression (LDSC) was applied to five cancer studies with available genome-wide association study (GWAS) summary statistics to examine genetic correlations with PD. Additionally, we used GWAS summary statistics of 15 different types of cancers as exposures and two-sample Mendelian randomization to study the causal relationship with PD (outcome). Results: LDSC analysis revealed a potential genetic correlation between PD and melanoma, breast cancer and prostate cancer. There was no evidence to support a causal relationship between the studied cancers and PD. Conclusions: Our results suggest shared genetic architecture between PD and melanoma, breast, and prostate cancers, but no obvious causal relationship between cancers and PD.


2017 ◽  
Author(s):  
Jorien L. Treur ◽  
Mark Gibson ◽  
Amy E Taylor ◽  
Peter J Rogers ◽  
Marcus R Munafò

Abstract Study Objectives: Higher caffeine consumption has been linked to poorer sleep and insomnia complaints. We investigated whether these observational associations are the result of genetic risk factors influencing both caffeine consumption and poorer sleep, and/or whether they reflect (possibly bidirectional) causal effects. Methods: Summary-level data were available from genome-wide association studies (GWAS) on caffeine consumption (n = 91,462), sleep duration, and chronotype (i.e., being a ‘morning’ versus an ‘evening’ person) (both n = 128,266), and insomnia complaints (n = 113,006). Linkage disequilibrium (LD) score regression was used to calculate genetic correlations, reflecting the extent to which genetic variants influencing caffeine consumption and sleep behaviours overlap. Causal effects were tested with bidirectional, two-sample Mendelian randomization (MR), an instrumental variable approach that utilizes genetic variants robustly associated with an exposure variable as an instrument to test causal effects. Estimates from individual genetic variants were combined using inverse-variance weighted meta-analysis, weighted median regression and MR Egger regression methods. Results: There was no clear evidence for genetic correlation between caffeine consumption and sleep duration (rg = 0.000, p = 0.998), chronotype (rg = 0.086, p = 0.192) or insomnia (rg = -0.034, p = 0.700). Two-sample Mendelian randomization analyses did not support causal effects from caffeine consumption to sleep behaviours, or the other way around. Conclusions: We found no evidence in support of genetic correlation or causal effects between caffeine consumption and sleep. While caffeine may have acute effects on sleep when taken shortly before habitual bedtime, our findings suggest that a more sustained pattern of high caffeine consumption is likely associated with poorer sleep through shared environmental factors.
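The inverse-variance weighted step mentioned in the Methods above can be illustrated with a minimal sketch. This is not the authors' code: it assumes the common fixed-effect form in which each variant contributes a Wald ratio (outcome effect divided by exposure effect) weighted by a first-order approximation of its inverse variance. The function name `ivw_estimate` and the toy numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each variant j gives a Wald ratio beta_outcome[j] / beta_exposure[j].
    Its variance is approximated (first order, ignoring uncertainty in the
    exposure effect) by se_outcome[j]**2 / beta_exposure[j]**2, so the
    weight is the inverse of that quantity.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one variant is required")

    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]

    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Toy, made-up per-variant effects (not from any study).
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.10],
    se_outcome=[0.01, 0.02],
)
print(estimate, std_error)
```

The weighted median and MR Egger methods named in the abstract relax the assumption that every variant is a valid instrument; they are not covered by this sketch.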


2018 ◽  
Vol 21 (2) ◽  
pp. 84-88 ◽  
Author(s):  
W. David Hill

Intelligence and educational attainment are strongly genetically correlated. This relationship can be exploited by Multi-Trait Analysis of GWAS (MTAG) to add power to Genome-wide Association Studies (GWAS) of intelligence. MTAG allows the user to meta-analyze GWASs of different phenotypes, based on their genetic correlations, to identify associations specific to the trait of choice. An MTAG analysis using GWAS data sets on intelligence and education was conducted by Lam et al. (2017), who reported 70 loci that they described as ‘trait specific’ to intelligence. This article examines whether the analysis conducted by Lam et al. (2017) has resulted in genetic information about a phenotype that is more similar to education than intelligence.


2012 ◽  
Vol 6 ◽  
pp. BBI.S9867 ◽  
Author(s):  
Guanjie Chen ◽  
Ao Yuan ◽  
Jie Zhou ◽  
Amy R. Bentley ◽  
Adebowale Adeyemo ◽  
...  

Missing heritability is still a challenge for Genome Wide Association Studies (GWAS). Gene-gene interactions may partially explain this residual genetic influence and contribute broadly to complex disease. To analyze the gene-gene interactions in case-control studies of complex disease, we propose a simple, non-parametric method that utilizes the F-statistic. This approach consists of three steps. First, we examine the joint distribution of a pair of SNPs in cases and controls separately. Second, an F-test is used to evaluate the ratio of dependence in cases to that of controls. Finally, results are adjusted for multiple tests. This method was used to evaluate gene-gene interactions that are associated with risk of Type 2 Diabetes among African Americans in the Howard University Family Study. We identified 18 gene-gene interactions (P < 0.0001). Compared with the commonly used logistic regression method, we demonstrate that the F-ratio test is an efficient approach to measuring gene-gene interactions, especially for studies with limited sample size.


2019 ◽  
Author(s):  
Hanmin Guo ◽  
James J. Li ◽  
Qiongshi Lu ◽  
Lin Hou

Abstract Genetic correlation analysis has quickly gained popularity in the past few years and provided insights into the genetic etiology of numerous complex diseases. However, existing approaches oversimplify the shared genetic architecture between different phenotypes and cannot effectively identify precise genetic regions contributing to the genetic correlation. In this work, we introduce LOGODetect, a powerful and efficient statistical method to identify small genome segments harboring local genetic correlation signals. LOGODetect automatically identifies genetic regions showing consistent associations with multiple phenotypes through a scan statistic approach. It uses summary association statistics from genome-wide association studies (GWAS) as input and is robust to sample overlap between studies. Applied to five phenotypically distinct but genetically correlated psychiatric disorders, we identified 49 non-overlapping genome regions associated with multiple disorders, including multiple hub regions showing concordant effects on more than two disorders. Our method addresses critical limitations in existing analytic strategies and may have wide applications in post-GWAS analysis.

