Analysis of Polygenic Score Usage and Performance in Diverse Human Populations

Mapping Intimacies ◽

10.1101/398396 ◽

2018 ◽

Cited By ~ 18

Author(s):

LE Duncan ◽

H Shen ◽

B Gelaye ◽

KJ Ressler ◽

MW Feldman ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

African Ancestry ◽

Population Level ◽

European Ancestry ◽

Human Populations ◽

Genome Wide Association Studies ◽

Polygenic Score ◽

Methodological Choices ◽

Polygenic Scores

Abstract:

Studies examining relationships between genotypic and phenotypic variation have historically been carried out on people of European ancestry. Efforts are underway to address this limitation, but until they succeed, the legacy of a Euro-centric bias will continue to hinder research, including the use of polygenic scores, which are individual-level metrics of genetic risk. Ongoing debate surrounds the generalizability of polygenic scores based on genome-wide association studies (GWAS) conducted in European ancestry samples to non-European ancestry samples. We analyzed the first decade of polygenic scoring studies (2008-2017, inclusive), and found that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were carried out on samples of African, Hispanic, or Indigenous peoples. We find that effect sizes for European ancestry-derived polygenic scores are only 36% as large in African ancestry samples as in European ancestry samples (t = −10.056, df = 22, p = 5.5×10^−10). Analyzing global populations, we show that relationships between height polygenic scores and height are highly dependent on methodological choices in polygenic score construction, highlighting the need for caution in interpreting population level differences in distributions of polygenic scores, as currently calculated. These findings bolster the rationale for large-scale GWAS in diverse human populations and highlight the need for better handling of linkage disequilibrium and variant frequencies when applying scores to non-European samples.
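
The abstract above describes polygenic scores as individual-level metrics of genetic risk. As a rough illustration only, a polygenic score is commonly computed as a weighted sum of effect-allele dosages, with weights taken from GWAS effect-size estimates. The function name, the toy dosage matrix, and the weights below are all hypothetical and are not taken from the study.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual as the weighted sum of allele dosages.

    dosages: (n_individuals, n_variants) array of effect-allele counts (0, 1, 2).
    weights: (n_variants,) array of per-variant effect sizes (hypothetical here).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or weights.ndim != 1 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n, m) and weights must be (m,)")
    return dosages @ weights


# Toy example: three individuals, three variants (made-up numbers).
toy_dosages = np.array([[0, 1, 2],
                        [2, 0, 1],
                        [1, 1, 0]])
toy_weights = np.array([0.10, -0.05, 0.20])
toy_scores = polygenic_score(toy_dosages, toy_weights)
```

Real pipelines add steps this sketch omits, such as variant selection, linkage disequilibrium handling, and allele alignment, which are the methodological choices the abstract flags as influential.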

Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations

10.1101/2020.01.14.905927 ◽

2020 ◽

Cited By ~ 2

Author(s):

Ying Wang ◽

Jing Guo ◽

Guiyan Ni ◽

Jian Yang ◽

Peter M. Visscher ◽

...

Keyword(s):

Complex Traits ◽

Association Studies ◽

African Ancestry ◽

Real Data ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Polygenic Scores ◽

Causal Variants ◽

The UK

Abstract:

Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA when European-based PGS are applied in African ancestry, for traits like body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.
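
The relative accuracy (RA) defined in the abstract above can be illustrated empirically as a ratio of prediction accuracies. The sketch below assumes that accuracy is measured as the squared correlation between score and phenotype; the abstract does not state the exact metric, so this choice, the function names, and the toy numbers are assumptions made for illustration.

```python
import numpy as np


def prediction_r2(score, phenotype):
    """Squared Pearson correlation between a score and a phenotype."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2


def relative_accuracy(score_target, pheno_target, score_ref, pheno_ref):
    """Accuracy in the target sample divided by accuracy in the reference sample.

    The reference sample stands in for a population of the same ancestry as
    the discovery GWAS (an assumption of this sketch).
    """
    return prediction_r2(score_target, pheno_target) / prediction_r2(score_ref, pheno_ref)


# Made-up toy data: the score predicts perfectly in the reference sample
# and less well in the target sample.
ref_score = np.array([1.0, 2.0, 3.0, 4.0])
ref_pheno = np.array([1.0, 2.0, 3.0, 4.0])
tgt_score = np.array([1.0, 2.0, 3.0, 4.0])
tgt_pheno = np.array([1.0, 3.0, 2.0, 4.0])
toy_ra = relative_accuracy(tgt_score, tgt_pheno, ref_score, ref_pheno)
```

This only measures RA from data; the theory in the abstract instead predicts RA from LD, MAF, effect-size correlation, and heritability, which this sketch does not attempt.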

Population differentiation of polygenic score predictions under stabilizing selection

10.1101/2021.09.10.459833 ◽

2021 ◽

Author(s):

Sivan Yair ◽

Graham Coop

Keyword(s):

Genetic Variation ◽

Large Scale ◽

Association Studies ◽

Genomic Medicine ◽

Evolutionary Model ◽

Stabilizing Selection ◽

Genome Wide Association Studies ◽

Polygenic Score ◽

Polygenic Scores ◽

The Impact

Abstract:

Given the many loci uncovered by genome-wide association studies (GWAS), polygenic scores have become central to the drive for genomic medicine and have spread into various areas including evolutionary studies of adaptation. While promising, these scores are fraught with issues of portability across populations, due to the mis-estimation of effect sizes and missing causal loci across populations not represented in large-scale GWAS. The poor portability of polygenic scores at first seems at odds with the view that much of common genetic variation is shared among populations (Lewontin, 1972). Here we investigate one potential cause of this discrepancy: phenotypic stabilizing selection drives the turnover of genetic variation shared between populations at causal loci. Somewhat counter-intuitively, while stabilizing selection to the same optimum phenotype leads to lower phenotypic differentiation among populations, it increases genetic differentiation at GWAS loci and reduces the portability of polygenic scores constructed for unrepresented populations. We also find that stabilizing selection can lead to potentially misleading signals of the differentiation of average polygenic scores among populations. We extend our baseline model to investigate the impact of pleiotropy, gene-by-environment interactions, and directional selection on polygenic score predictions. Our work emphasizes stabilizing selection as a null evolutionary model to understand patterns of allele frequency differentiation and its impact on polygenic score portability and differentiation.

Trans-Ethnic Meta-Analysis of Interactions Between Genetics and Early-Life Socioeconomic Context on Memory Performance and Decline in Older Americans

The Journals of Gerontology Series A ◽

10.1093/gerona/glab255 ◽

2021 ◽

Author(s):

Jessica D Faul ◽

Minjung Kho ◽

Wei Zhao ◽

Kalee E Rumfelt ◽

Miao Yu ◽

...

Keyword(s):

Parental Education ◽

Association Studies ◽

Memory Performance ◽

Meta Analysis ◽

African Ancestry ◽

Later Life ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Multiple Testing Correction ◽

Socioeconomic Context

Abstract:

Background: Later-life cognitive function is influenced by genetics as well as early- and later-life socioeconomic context. However, few studies have examined the interaction between genetics and early childhood factors.

Methods: Using gene-based tests (interaction sequence kernel association test [iSKAT]/iSKAT optimal unified test), we examined whether common and/or rare exonic variants in 39 gene regions previously associated with cognitive performance, dementia, and related traits had an interaction with childhood socioeconomic context (parental education and financial strain) on memory performance or decline in European ancestry (EA, N = 10 468) and African ancestry (AA, N = 2 252) participants from the Health and Retirement Study.

Results: Of the 39 genes, 22 in EA and 19 in AA had nominally significant interactions with at least one childhood socioeconomic measure on memory performance and/or decline; however, all but one (father’s education by solute carrier family 24 member 4 [SLC24A4] in AA) were not significant after multiple testing correction (false discovery rate [FDR] < .05). In trans-ethnic meta-analysis, 2 genes interacted with childhood socioeconomic context (FDR < .05): mother’s education by membrane-spanning 4-domains A4A (MS4A4A) on memory performance, and father’s education by SLC24A4 on memory decline. Both interactions remained significant (p < .05) after adjusting for respondent’s own educational attainment, apolipoprotein-ε4 allele (APOE ε4) status, lifestyle factors, body mass index, and comorbidities. For both interactions in EA and AA, the genetic effect was stronger in participants with low parental education.

Conclusions: Examination of common and rare variants in genes discovered through genome-wide association studies shows that childhood context may interact with key gene regions to jointly impact later-life memory function and decline. Genetic effects may be more salient for those with lower childhood socioeconomic status.

A Review of the Hereditary Component of Triple Negative Breast Cancer: High- and Moderate-Penetrance Breast Cancer Genes, Low-Penetrance Loci, and the Role of Nontraditional Genetic Elements

Journal of Oncology ◽

10.1155/2019/4382606 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 11

Author(s):

Darrell L. Ellsworth ◽

Clesson E. Turner ◽

Rachel E. Ellsworth

Keyword(s):

Breast Cancer ◽

Triple Negative Breast Cancer ◽

Large Scale ◽

Triple Negative ◽

Association Studies ◽

African Ancestry ◽

Genome Wide Association Studies ◽

Genetic Elements ◽

Genome Wide ◽

Increased Risk

Triple negative breast cancer (TNBC), representing 10-15% of breast tumors diagnosed each year, is a clinically defined subtype of breast cancer associated with poor prognosis. The higher incidence of TNBC in certain populations such as young women and/or women of African ancestry and a unique pathological phenotype shared between TNBC and BRCA1-deficient tumors suggest that TNBC may be inherited through germline mutations. In this article, we describe genes and genetic elements, beyond BRCA1 and BRCA2, which have been associated with increased risk of TNBC. Multigene panel testing has identified high- and moderate-penetrance cancer predisposition genes associated with increased risk for TNBC. Development of large-scale genome-wide SNP assays coupled with genome-wide association studies (GWAS) has led to the discovery of low-penetrance TNBC-associated loci. Next-generation sequencing has identified variants in noncoding RNAs, viral integration sites, and genes in underexplored regions of the human genome that may contribute to the genetic underpinnings of TNBC. Advances in our understanding of the genetics of TNBC are driving improvements in risk assessment and patient management.

Contributions of common genetic variants to risk of schizophrenia among individuals of African and Latino ancestry

Molecular Psychiatry ◽

10.1038/s41380-019-0517-y ◽

2019 ◽

Vol 25 (10) ◽

pp. 2455-2467 ◽

Cited By ~ 12

Author(s):

Tim B. Bigdeli ◽

Giulio Genovese ◽

Penelope Georgakopoulos ◽

Jacquelyn L. Meyers ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

Large Fraction ◽

African Ancestry ◽

Common Variant ◽

Human Populations ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Common Genetic Variants

Abstract:

Schizophrenia is a common, chronic and debilitating neuropsychiatric syndrome affecting tens of millions of individuals worldwide. While rare genetic variants play a role in the etiology of schizophrenia, most of the currently explained liability is within common variation, suggesting that variation predating the human diaspora out of Africa harbors a large fraction of the common variant attributable heritability. However, common variant association studies in schizophrenia have concentrated mainly on cohorts of European descent. We describe genome-wide association studies of 6152 cases and 3918 controls of admixed African ancestry, and of 1234 cases and 3090 controls of Latino ancestry, representing the largest such study in these populations to date. Combining results from the samples with African ancestry with summary statistics from the Psychiatric Genomics Consortium (PGC) study of schizophrenia yielded seven newly genome-wide significant loci, and we identified an additional eight loci by incorporating the results from samples with Latino ancestry. Leveraging population differences in patterns of linkage disequilibrium, we achieve improved fine-mapping resolution at 22 previously reported and 4 newly significant loci. Polygenic risk score profiling revealed improved prediction based on trans-ancestry meta-analysis results for admixed African (Nagelkerke’s R^2 = 0.032; liability R^2 = 0.017; P < 10^−52), Latino (Nagelkerke’s R^2 = 0.089; liability R^2 = 0.021; P < 10^−58), and European individuals (Nagelkerke’s R^2 = 0.089; liability R^2 = 0.037; P < 10^−113), further highlighting the advantages of incorporating data from diverse human populations.

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

10.1101/212357 ◽

2017 ◽

Cited By ~ 7

Author(s):

Wei Zhou ◽

Jonas B. Nielsen ◽

Lars G. Fritsche ◽

Rounak Dey ◽

Maiken E. Gabrielsen ◽

...

Keyword(s):

Large Scale ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Case Control ◽

Error Rates ◽

European Ancestry ◽

Computational Time ◽

Type I ◽

Genome Wide Association Studies

Abstract:

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly, producing large type I error rates, in the analysis of phenotypes with unbalanced case-control ratios. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It utilizes state-of-the-art optimization strategies to reduce the computational time and memory cost of the generalized mixed model. The computational cost depends linearly on sample size, and hence the method can be applied to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 white British European-ancestry samples for >1400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.

Large-Scale Phenomic and Genomic Analysis of Brain Asymmetrical Skew

Cerebral Cortex ◽

10.1093/cercor/bhab075 ◽

2021 ◽

Author(s):

Xiang-Zhen Kong ◽

Merel Postema ◽

Dick Schijven ◽

Amaia Carrión Castillo ◽

Antonietta Pepe ◽

...

Keyword(s):

Large Scale ◽

Right Hemisphere ◽

Association Studies ◽

Genomic Analysis ◽

Population Level ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Imaging Data ◽

Physical And Mental Health ◽

Human Connectome Project

Abstract:

The human cerebral hemispheres show a left–right asymmetrical torque pattern, which has been claimed to be absent in chimpanzees. The functional significance and developmental mechanisms are unknown. Here, we carried out the largest-ever analysis of global brain shape asymmetry in magnetic resonance imaging data. Three population datasets were used, UK Biobank (N = 39 678), Human Connectome Project (N = 1113), and BIL&GIN (N = 453). At the population level, there was an anterior and dorsal skew of the right hemisphere, relative to the left. Both skews were associated independently with handedness, and various regional gray and white matter metrics oppositely in the two hemispheres, as well as other variables related to cognitive functions, sociodemographic factors, and physical and mental health. The two skews showed single nucleotide polymorphisms-based heritabilities of 4–13%, but also substantial polygenicity in causal mixture model analysis, and no individually significant loci were found in genome-wide association studies for either skew. There was evidence for a significant genetic correlation between horizontal brain skew and autism, which requires future replication. These results provide the first large-scale description of population-average brain skews and their inter-individual variations, their replicable associations with handedness, and insights into biological and other factors which associate with human brain asymmetry.

Effect sizes of causal variants for gene expression and complex traits differ between populations

10.1101/2021.12.06.471235 ◽

2021 ◽

Author(s):

Roshni A. Patel ◽

Shaila A. Musharoff ◽

Jeffrey P. Spence ◽

Harold Pimentel ◽

Catherine Tcheandjieu ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Association Studies ◽

Causal Variant ◽

Effect Sizes ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Polygenic Scores ◽

Causal Variants ◽

Variant Effect

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.

Polygenic Adaptation has Impacted Multiple Anthropometric Traits

10.1101/167551 ◽

2017 ◽

Cited By ~ 30

Author(s):

Jeremy J. Berg ◽

Xinjun Zhang ◽

Graham Coop

Keyword(s):

Complex Traits ◽

Association Studies ◽

GWAS Data ◽

Human Populations ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Link Type ◽

Polygenic Scores ◽

Polygenic Adaptation ◽

The UK

Abstract:

Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation.

Author’s Note on Failure to Replicate:

After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data, as well as another dataset that was used for replication, were impacted by stratification issues that created or at a minimum substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702).

A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.

Polygenic scores for height in admixed populations

10.1101/2020.04.08.030361 ◽

2020 ◽

Cited By ~ 2

Author(s):

Bárbara D. Bitarello ◽

Iain Mathieson

Keyword(s):

Predictive Power ◽

Predictive Accuracy ◽

Association Studies ◽

Effect Sizes ◽

European Ancestry ◽

Marginal Effect ◽

Genome Wide Association Studies ◽

Major Barrier ◽

Cohort Differences ◽

Polygenic Scores

Abstract:

Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level. This provides a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is largely explained by European ancestry segments of the admixed genomes. We show that differences in allele frequencies, recombination rate, and marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explain the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.
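
The abstract above mentions improving prediction with a linear combination of PRS. One simple way to picture this, offered purely as an assumption-laden sketch and not as the authors' procedure, is to mix two scores with a weight alpha and pick the alpha that maximizes squared correlation with the phenotype. The function name, the grid search, and the toy arrays are all hypothetical.

```python
import numpy as np


def best_mixing_weight(prs_a, prs_b, phenotype, grid=None):
    """Grid-search alpha in [0, 1] for alpha * prs_a + (1 - alpha) * prs_b.

    Returns (alpha, r2) for the combination with the highest squared
    correlation with the phenotype. Ties keep the first alpha found.
    """
    prs_a = np.asarray(prs_a, dtype=float)
    prs_b = np.asarray(prs_b, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    best_alpha, best_r2 = None, -1.0
    for alpha in grid:
        combined = alpha * prs_a + (1.0 - alpha) * prs_b
        if np.std(combined) == 0.0:
            continue  # correlation is undefined for a constant score
        r2 = np.corrcoef(combined, phenotype)[0, 1] ** 2
        if r2 > best_r2:
            best_alpha, best_r2 = float(alpha), float(r2)
    return best_alpha, best_r2


# Made-up toy data: the phenotype is an exact 0.7 / 0.3 mix of two scores.
toy_prs_a = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
toy_prs_b = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
toy_pheno = 0.7 * toy_prs_a + 0.3 * toy_prs_b
toy_alpha, toy_r2 = best_mixing_weight(toy_prs_a, toy_prs_b, toy_pheno)
```

In practice the weight would need to be chosen in a sample separate from the one used to evaluate accuracy, otherwise the reported accuracy is optimistic.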
