Human demographic history impacts genetic risk prediction across diverse populations

10.1101/070797 · 2016 · Cited By ~ 7

Author(s): Alicia R. Martin, Christopher R. Gignoux, Raymond K. Walters, Genevieve L. Wojcik, Benjamin M. Neale, ...

Keyword(s): Risk Prediction, Large Scale, Disease Risk, Association Studies, Demographic History, Population History, Risk Scores, Genome Wide Association Studies, Summary Statistics, Medical Genomics

The vast majority of genome-wide association studies are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g. linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely-used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWAS, we used published summary statistics to calculate polygenic risk scores for six well-studied traits and diseases. We identified directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk were typically highest in the population from which summary statistics were derived. We demonstrated that scores inferred from European GWAS were biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction were possible and unpredictable. This work cautions that summarizing findings from large-scale GWAS may have limited portability to other populations using standard approaches, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
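
As a rough illustration of the score this abstract refers to, the sketch below computes a polygenic risk score as a weighted sum of effect-allele dosages, and the mean score expected in a population from its allele frequencies (expected dosage 2p under Hardy-Weinberg proportions). It is not the authors' pipeline: the function names, effect sizes, and allele frequencies are all made up for the example.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per variant).

    effect_sizes are per-allele GWAS effect estimates; values used below
    are hypothetical.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def expected_mean_prs(allele_frequencies, effect_sizes):
    """Mean score in a population, assuming Hardy-Weinberg proportions.

    The expected dosage of an allele with frequency p is 2 * p.
    """
    if len(allele_frequencies) != len(effect_sizes):
        raise ValueError("frequencies and effect_sizes must have the same length")
    return sum(2.0 * p * b for p, b in zip(allele_frequencies, effect_sizes))


# Hypothetical effect sizes estimated in a single-ancestry GWAS.
betas = [0.5, -0.25]

# Hypothetical effect-allele frequencies in two populations.
freqs_population_a = [0.5, 0.25]
freqs_population_b = [0.25, 0.5]

mean_a = expected_mean_prs(freqs_population_a, betas)
mean_b = expected_mean_prs(freqs_population_b, betas)
print(mean_a, mean_b)
```

In this toy setup the two mean scores differ only because allele frequencies differ at the scored variants, which is one way a score can shift between populations without any difference in the underlying trait.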

Population history of the Sardinian people inferred from whole-genome sequencing

10.1101/092148 · 2016 · Cited By ~ 5

Author(s): Charleston W K Chiang, Joseph H Marcus, Carlo Sidore, Hussein Al-Asadi, Magdalena Zoledziewska, ...

Keyword(s): Bronze Age, Disease Risk, Association Studies, Demographic History, Population History, Genome Wide Association Studies, Whole Genome, Risk Alleles, Mediterranean Island, History Of

The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome compared to the autosome, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

Archetypal Analysis for Population Genetics

10.1101/2021.11.28.470296 · 2021

Author(s): Julia Gimbernat-Mayol, Daniel Mas Montserrat, Carlos D. Bustamante, Alexander G. Ioannidis

Keyword(s): Large Scale, Association Studies, Demographic History, Cluster Structure, Risk Scores, Computational Time, Genome Wide Association Studies, Current Standard, Archetypal Analysis, Genetic Clusters

The estimation of genetic clusters using genomic data has application from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally-intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in this fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.

Could personalised risk prediction for type 2 diabetes using polygenic risk scores direct prevention, enhance diagnostics, or improve treatment?

Wellcome Open Research · 10.12688/wellcomeopenres.16251.1 · 2020 · Vol 5 · pp. 206

Author(s): Mathilde Boecker, Alvina G. Lai

Keyword(s): Type 2 Diabetes, Risk Prediction, Large Scale, Association Studies, Genome Wide Association, Risk Scores, Genome Wide Association Studies, Polygenic Risk, Genome Wide

Over the past three decades, the number of people globally with diabetes mellitus has more than doubled. It is estimated that by 2030, 439 million people will be suffering from the disease, 90-95% of whom will have type 2 diabetes (T2D). In 2017, 5 million deaths globally were attributable to T2D, placing it in the top 10 global causes of death. Because T2D is a result of both genetic and environmental factors, identification of individuals with high genetic risk can help direct early interventions to prevent progression to more serious complications. Genome-wide association studies have identified ~400 variants associated with T2D that can be used to calculate polygenic risk scores (PRS). Although PRSs are not currently more accurate than clinical predictors and do not yet predict risk with equal accuracy across all ethnic populations, they have several potential clinical uses. Here, we discuss potential usages of PRS for predicting T2D and for informing and optimising interventions. We also touch on possible health inequality risks of PRS and the feasibility of large-scale implementation of PRS in clinical practice. Before PRSs can be used as a therapeutic tool, it is important that further polygenic risk models are derived using non-European genome-wide association studies to ensure that risk prediction is accurate for all ethnic groups. Furthermore, it is essential that the ethical, social and legal implications of PRS are considered before their implementation in any context.

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 · 2018 · Cited By ~ 6

Author(s): Doug Speed, David J Balding

Keyword(s): Complex Traits, Genetic Architecture, Large Scale, Association Studies, Genome Wide Association Studies, Summary Statistics, Confounding Bias, Conserved Regions, Genome Wide, Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.

An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome

10.1101/467910 · 2018 · Cited By ~ 1

Author(s): Tom G. Richardson, Sean Harrison, Gibran Hemani, George Davey Smith

Keyword(s): Web Application, Large Scale, Complex Disease, Association Studies, Risk Scores, Polygenic Risk Score, Genome Wide Association Studies, Genetic Liability, Polygenic Risk, The UK

The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). In this study, we have analysed 162 PRS (P < 5 × 10^−05) derived from GWAS and 551 heritable traits from the UK Biobank study (N=334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst findings were inverse associations with measures of cognitive function, for which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.

Fine-tuning Polygenic Risk Scores with GWAS Summary Statistics

10.1101/810713 · 2019 · Cited By ~ 4

Author(s): Zijie Zhao, Yanyao Yi, Yuchang Wu, Xiaoyuan Zhong, Yupei Lin, ...

Keyword(s): Association Studies, Fine Tuning, Risk Scores, Training Dataset, Validation Dataset, P Value, Genome Wide Association Studies, Summary Statistics, Polygenic Risk, Model Tuning

Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.
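
The tuning parameter mentioned here, a p-value cutoff, can be pictured with a small sketch: only variants whose GWAS p-value passes the cutoff contribute to the score. This is not PUMAS, which chooses the cutoff from summary statistics. It only shows what the cutoff changes. The function name and all numbers are hypothetical.

```python
def thresholded_prs(dosages, effect_sizes, p_values, cutoff):
    """Score one individual using only variants with p-value <= cutoff."""
    if not (len(dosages) == len(effect_sizes) == len(p_values)):
        raise ValueError("all inputs must have the same length")
    return sum(
        d * b
        for d, b, p in zip(dosages, effect_sizes, p_values)
        if p <= cutoff
    )


# Hypothetical summary statistics for three variants.
effect_sizes = [0.5, 0.25, -0.125]
p_values = [1e-8, 0.005, 0.4]

# Hypothetical effect-allele dosages for one individual.
dosages = [1, 2, 2]

# The same individual gets a different score under each cutoff.
for cutoff in (1.0, 0.01, 5e-8):
    score = thresholded_prs(dosages, effect_sizes, p_values, cutoff)
    print(cutoff, score)
```

A cutoff of 1 keeps every variant, while stricter cutoffs drop the weaker associations, so the choice of cutoff acts as the model's tuning parameter.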

Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

PLoS Computational Biology · 10.1371/journal.pcbi.1007565 · 2020 · Vol 16 (2) · pp. e1007565 · Cited By ~ 1

Author(s): Shuang Song, Wei Jiang, Lin Hou, Hongyu Zhao

Keyword(s): Effect Size, Association Studies, Genome Wide Association, Risk Scores, Genome Wide Association Studies, Size Distributions, Summary Statistics, Polygenic Risk, Genome Wide

Genome-Wide Association Studies of Schizophrenia and Bipolar Disorder in a Diverse Cohort of US Veterans

Schizophrenia Bulletin · 10.1093/schbul/sbaa133 · 2020

Author(s): Tim B Bigdeli, Ayman H Fanous, Yuli Li, Nallakkandi Rajeevan, Frederick Sayward, ...

Keyword(s): Bipolar Disorder, Association Studies, Genome Wide Association, Risk Scores, Genome Wide Association Studies, Summary Statistics, Susceptibility Loci, New Associations, Genome Wide, US Veterans

Background: Schizophrenia (SCZ) and bipolar disorder (BIP) are debilitating neuropsychiatric disorders, collectively affecting 2% of the world’s population. Recognizing the major impact of these psychiatric disorders on the psychosocial function of more than 200 000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of more than 8000 veterans with SCZ and BIP in the Cooperative Studies Program (CSP) #572. Methods: We performed genome-wide association studies (GWAS) in CSP #572 and benchmarked the predictive value of polygenic risk scores (PRS) constructed from published findings. We combined our results with available summary statistics from several recent GWAS, realizing the largest and most diverse studies of these disorders to date. Results: Our primary GWAS uncovered new associations between CHD7 variants and SCZ, and novel BIP associations with variants in Sortilin Related VPS10 Domain Containing Receptor 3 (SORCS3) and downstream of PCDH11X. Combining our results with published summary statistics for SCZ yielded 39 novel susceptibility loci including CRHR1, and we identified 10 additional findings for BIP (28 326 cases and 90 570 controls). PRS trained on published GWAS were significantly associated with case-control status among European American (P < 10^−30) and African American (P < .0005) participants in CSP #572. Conclusions: We have demonstrated that published findings for SCZ and BIP are robustly generalizable to a diverse cohort of US veterans. Leveraging available summary statistics from GWAS of global populations, we report 52 new susceptibility loci and improved fine-mapping resolution for dozens of previously reported associations.

Dissociable influences of APOE ε4 and polygenic risk of AD dementia on amyloid and cognition

Neurology · 10.1212/wnl.0000000000005415 · 2018 · Vol 90 (18) · pp. e1605-e1612 · Cited By ~ 24

Author(s): Tian Ge, Mert R. Sabuncu, Jordan W. Smoller, Reisa A. Sperling, Elizabeth C. Mormino, ...

Keyword(s): Cognitive Decline, Genetic Risk, Large Scale, Association Studies, Specific Effect, Hippocampal Volume, Risk Scores, Hippocampal Atrophy, Genome Wide Association Studies, Polygenic Risk

Objective: To investigate the effects of genetic risk of Alzheimer disease (AD) dementia in the context of β-amyloid (Aβ) accumulation. Methods: We analyzed data from 702 participants (221 clinically normal, 367 with mild cognitive impairment, and 114 with AD dementia) with genetic data and florbetapir PET available. A subset of 669 participants additionally had longitudinal MRI scans to assess hippocampal volume. Polygenic risk scores (PRSs) were estimated with summary statistics from previous large-scale genome-wide association studies of AD dementia. We examined relationships between APOE ε4 status and PRS with longitudinal Aβ and cognitive and hippocampal volume measurements. Results: APOE ε4 was strongly related to baseline Aβ, whereas only weak associations between PRS and baseline Aβ were present. APOE ε4 was additionally related to greater memory decline and hippocampal atrophy in Aβ+ participants. When APOE ε4 was controlled for, PRS was related to cognitive decline in Aβ+ participants. Finally, PRSs were associated with hippocampal atrophy in Aβ− participants and weakly associated with baseline hippocampal volume in Aβ+ participants. Conclusions: Genetic risk factors of AD dementia demonstrate effects related to Aβ, as well as synergistic interactions with Aβ. The specific effect of faster cognitive decline in Aβ+ individuals with higher genetic risk may explain the large degree of heterogeneity in cognitive trajectories among Aβ+ individuals. Consideration of genetic variants in conjunction with baseline Aβ may improve enrichment strategies for clinical trials targeting Aβ+ individuals most at risk for imminent cognitive decline.

Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes · 10.3390/genes13010087 · 2021 · Vol 13 (1) · pp. 87

Author(s): Sean M. Burnard, Rodney A. Lea, Miles Benton, David Eccles, Daniel W. Kennedy, ...

Keyword(s): Multiple Sclerosis, Complex Traits, Multiple Testing, Large Scale, Disease Risk, Association Studies, Meta Analysis, Elastic Net, Genome Wide Association Studies, Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10^−8) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
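
The general idea of elastic net with stability selection by iterative subsampling can be sketched as follows: fit a penalised model on many random half-samples and record how often each predictor keeps a non-zero coefficient. This is a simplified stand-in, not the authors' analysis. It uses squared-error loss on a simulated continuous outcome rather than case-control data, a hand-written coordinate descent, and made-up penalty settings. All function names are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the lasso part of the update)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_elastic_net(rows, y, alpha, l1_ratio, sweeps=100):
    """Coordinate descent for the objective
    (1/2n)*||y - Xb||^2 + alpha*l1_ratio*|b|_1 + 0.5*alpha*(1-l1_ratio)*|b|_2^2.
    No intercept: assumes roughly centred predictors and outcome."""
    n, p = len(rows), len(rows[0])
    coefs = [0.0] * p
    residual = list(y)  # y - Xb with every coefficient at zero
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in rows]
            # Covariance of predictor j with the residual excluding its own fit.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(column, residual)) / n
            scale = sum(x * x for x in column) / n + alpha * (1.0 - l1_ratio)
            updated = soft_threshold(rho, alpha * l1_ratio) / scale
            delta = updated - coefs[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(column, residual)]
                coefs[j] = updated
    return coefs


def selection_frequencies(rows, y, alpha, l1_ratio, iterations, rng):
    """Fraction of random half-samples in which each predictor is non-zero."""
    n, p = len(rows), len(rows[0])
    counts = [0] * p
    for _ in range(iterations):
        picked = rng.sample(range(n), n // 2)
        coefs = fit_elastic_net(
            [rows[i] for i in picked], [y[i] for i in picked], alpha, l1_ratio
        )
        for j, coef in enumerate(coefs):
            if coef != 0.0:
                counts[j] += 1
    return [count / iterations for count in counts]


def simulate(n, p, rng):
    """Toy data: only predictor 0 truly affects the outcome."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0.0, 0.1) for row in rows]
    return rows, y


rng = random.Random(0)
rows, y = simulate(60, 5, rng)
freqs = selection_frequencies(rows, y, alpha=0.2, l1_ratio=0.9, iterations=40, rng=rng)
print(freqs)
```

Predictors that are selected in most subsamples are treated as stable signals, which mirrors how the abstract reports the percentage of iterations in which a SNP was retained.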
