Genotype Imputation Performance of Three Reference Panels Using African Ancestry Individuals

Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

10.1101/2021.02.03.429542 ◽ 2021

Author(s): Zhi Ming Xu ◽ Sina Rüeger ◽ Michaela Zwyer ◽ Daniela Brites ◽ Hellen Hiza ◽ ...

Keyword(s): Association Studies ◽ Imputation Accuracy ◽ Genotype Imputation ◽ Small Subset ◽ Study Cohort ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Genome Wide ◽ Selection Of

Abstract Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genome of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on SNPs, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on SNPs to the base H3Africa array.
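The abstract above reports improved imputation accuracy without naming the metric in this listing. One widely used choice is the squared Pearson correlation between imputed allele dosages and the sequenced genotypes at the same variant. The sketch below assumes that metric; the function name `dosage_r2` and the toy values are illustrative and are not taken from the study.

```python
import math

def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (continuous values in [0, 2]) for one variant.
    Returns NaN when either vector has zero variance."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return math.nan  # monomorphic in this sample: r2 is undefined
    return cov * cov / (var_t * var_d)

# Toy comparison (assumed data): the same variant imputed from a base array
# and from the base array plus hypothetical add-on SNPs.
truth = [0, 1, 2, 1, 0, 2]
base_array = [0.5, 0.9, 1.3, 1.4, 0.6, 1.2]
with_addons = [0.1, 1.0, 1.9, 1.1, 0.2, 1.8]
print(dosage_r2(truth, base_array), dosage_r2(truth, with_addons))
```

In practice such a value would be computed per variant on held-out sequenced samples and then summarised across allele frequency bins.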

Recessive Genome-wide Meta-analysis Illuminates Genetic Architecture of Type 2 Diabetes

10.2337/figshare.17099780 ◽ 2021

Author(s): Mark J. O’Connor ◽ Philip Schroeder ◽ Alicia Huerta-Chagoya ◽ Paula Cortés-Sánchez ◽ Silvía Bonàs-Guarch ◽ ...

Keyword(s): Type 2 Diabetes ◽ Complex Traits ◽ Association Studies ◽ Meta Analysis ◽ Low Frequency ◽ Additive Models ◽ European Ancestry ◽ Genome Wide Association Studies ◽ Genome Wide

Most genome-wide association studies (GWAS) of complex traits are performed using models with additive allelic effects. Hundreds of loci associated with type 2 diabetes have been identified using this approach. Additive models, however, can miss loci with recessive effects, thereby leaving potentially important genes undiscovered. We conducted the largest GWAS meta-analysis using a recessive model for type 2 diabetes. Our discovery sample included 33,139 cases and 279,507 controls from seven European-ancestry cohorts including the UK Biobank. We identified 51 loci associated with type 2 diabetes, including five variants undetected by prior additive analyses. Two of the five had minor allele frequency less than 5% and were each associated with more than doubled risk in homozygous carriers. Using two additional cohorts, FinnGen and a Danish cohort, we replicated three of the variants, including one of the low-frequency variants, rs115018790, which had an odds ratio in homozygous carriers of 2.56 (95% CI 2.05-3.19, P = 1 × 10−16) and a stronger effect in men than in women (interaction P = 7 × 10−7). The signal was associated with multiple diabetes-related traits, with homozygous carriers showing a 10% decrease in LDL and a 20% increase in triglycerides, and colocalization analysis linked this signal to reduced expression of the nearby PELO gene. These results demonstrate that recessive models, when compared to GWAS using the additive approach, can identify novel loci, including large-effect variants with pathophysiological consequences relevant to type 2 diabetes.
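The difference between the additive and recessive models described above comes down to how each genotype is coded before testing. The sketch below illustrates that coding and a crude, unadjusted odds ratio for homozygous carriers. The function names and the toy genotypes are assumptions for illustration; the study itself would have used regression with covariates and meta-analysis, which are not reproduced here.

```python
def additive_coding(allele_counts):
    """Additive model: the predictor is the effect-allele count (0, 1 or 2)."""
    return list(allele_counts)

def recessive_coding(allele_counts):
    """Recessive model: only homozygous carriers (count == 2) are coded 1."""
    return [1 if count == 2 else 0 for count in allele_counts]

def crude_odds_ratio(exposed, is_case):
    """Unadjusted odds ratio from the 2x2 table of exposure by case status."""
    pairs = list(zip(exposed, is_case))
    exposed_cases = sum(1 for e, c in pairs if e and c)
    exposed_controls = sum(1 for e, c in pairs if e and not c)
    unexposed_cases = sum(1 for e, c in pairs if not e and c)
    unexposed_controls = sum(1 for e, c in pairs if not e and not c)
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Toy data (assumed): effect-allele counts and case status for ten people.
allele_counts = [0, 1, 2, 2, 1, 0, 2, 0, 1, 2]
is_case = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]

homozygous = recessive_coding(allele_counts)
print(homozygous)
print(crude_odds_ratio(homozygous, is_case))
```

Under recessive coding, heterozygotes are grouped with non-carriers, which is why a variant whose risk appears only in homozygous carriers can be diluted in an additive test.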

Advantages of genotype imputation with ethnically matched reference panel for rare variant association analyses

10.1101/579201 ◽ 2019 ◽ Cited By ~ 4

Author(s): Mart Kals ◽ Tiit Nikopensius ◽ Kristi Läll ◽ Kalle Pärn ◽ Timo Tõnis Sikka ◽ ...

Keyword(s): Rare Variant ◽ Rare Variants ◽ Association Studies ◽ Low Frequency ◽ Genotype Imputation ◽ Reference Panel ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Variant Analysis ◽ Coding Variants

Abstract Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed sufficiently accurately with publicly available and ethnically heterogeneous reference datasets like the 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between the reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences and differences of using ethnically mixed and matched reference panels have not yet been comprehensively explored. We determined and quantified these differences by performing several comparative evaluations of the discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index by using genome-wide genotype data of ∼37,000 Estonians imputed with ethnically mixed 1000G and ethnically matched imputation reference panels. Although several previously reported common (minor allele frequency; MAF > 5%) variant associations were replicated in both resulting imputed datasets, no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the ethnically matched imputed data as compared to four genes in the 1000G panel-based imputed data. All resulting genes were consequently studied in the UK Biobank data. These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the usage of ethnically matched imputation reference panels can enable substantially improved imputation of rare variants, facilitating novel high-confidence findings in rare variant GWAS scans.

Author summary

Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. The latter method is called genotype imputation and its accuracy depends on multiple factors. Publicly available and ethnically heterogeneous imputation reference panels (IRPs) such as the 1000 Genomes Project (1000G) are sufficiently accurate for imputation of common and low-frequency variants, but custom ethnically matched IRPs outperform these in the case of rare variants. In this work, we systematically compare downstream association analysis effects on eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single variant analysis, where both imputed datasets replicate previously reported significant loci. But in the gene-based analysis of rare protein-coding variants we show that an ethnically matched panel clearly outperforms 1000G panel-based imputation, providing a 10-fold increase in significant gene-trait associations. Our study demonstrates empirically that imputed data based on an ethnically matched panel is very promising for rare variant analysis: it captures more population-specific variants and makes it possible to efficiently identify novel findings.

Largest genome-wide association study for PTSD identifies genetic risk loci in European and African ancestries and implicates novel biological pathways

10.1101/458562 ◽ 2018 ◽ Cited By ~ 6

Author(s): Caroline M. Nievergelt ◽ Adam X. Maihofer ◽ Torsten Klengel ◽ Elizabeth G. Atkinson ◽ Chia-Yen Chen ◽ ...

Keyword(s): Genome Wide Association Study ◽ Association Studies ◽ Meta Analysis ◽ African Ancestry ◽ Genetic Correlations ◽ Genome Wide Association ◽ European Ancestry ◽ Genome Wide Association Studies ◽ Post Traumatic Stress ◽ Genome Wide

Abstract Post-traumatic stress disorder (PTSD) is a common and debilitating disorder. The risk of PTSD following trauma is heritable, but robust common variants have yet to be identified by genome-wide association studies (GWAS). We have collected a multi-ethnic cohort including over 30,000 PTSD cases and 170,000 controls. We first demonstrate significant genetic correlations across 60 PTSD cohorts to evaluate the comparability of these phenotypically heterogeneous studies. In this largest GWAS meta-analysis of PTSD to date we identify a total of 6 genome-wide significant loci, 4 in European and 2 in African-ancestry analyses. Follow-up analyses incorporated local ancestry and sex-specific effects, and functional studies. Along with other novel genes, a non-coding RNA (ncRNA) and a Parkinson’s Disease gene, PARK2, were associated with PTSD. Consistent with previous reports, SNP-based heritability estimates for PTSD range between 10-20%. Despite a significant shared liability between PTSD and major depressive disorder, we show evidence that some of our loci may be specific to PTSD. These results demonstrate the role of genetic variation contributing to the biology of differential risk for PTSD and the necessity of expanding GWAS beyond European ancestry.

Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations

10.1101/2020.01.14.905927 ◽ 2020 ◽ Cited By ~ 2

Author(s): Ying Wang ◽ Jing Guo ◽ Guiyan Ni ◽ Jian Yang ◽ Peter M. Visscher ◽ ...

Keyword(s): Complex Traits ◽ Association Studies ◽ African Ancestry ◽ Real Data ◽ European Ancestry ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Polygenic Scores ◽ Causal Variants ◽ The Uk

Abstract Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA using European-based PGS in African ancestry for traits like body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.

A meta-analysis of genome-wide association studies of multiple myeloma among men and women of African ancestry

Blood Advances ◽ 10.1182/bloodadvances.2019000491 ◽ 2020 ◽ Vol 4 (1) ◽ pp. 181-190 ◽ Cited By ~ 2

Author(s): Zhaohui Du ◽ Niels Weinhold ◽ Gregory Chi Song ◽ Kristin A. Rand ◽ David J. Van Den Berg ◽ ...

Keyword(s): Multiple Myeloma ◽ Association Studies ◽ Meta Analysis ◽ African Ancestry ◽ Genome Wide Association ◽ European Ancestry ◽ Admixture Mapping ◽ Genome Wide Association Studies ◽ Risk Alleles ◽ Genome Wide

Abstract Persons of African ancestry (AA) have a twofold higher risk for multiple myeloma (MM) compared with persons of European ancestry (EA). Genome-wide association studies (GWASs) support a genetic contribution to MM etiology in individuals of EA. Little is known about genetic risk factors for MM in individuals of AA. We performed a meta-analysis of 2 GWASs of MM in 1813 cases and 8871 controls and conducted an admixture mapping scan to identify risk alleles. We fine-mapped the 23 known susceptibility loci to find markers that could better capture MM risk in individuals of AA and constructed a polygenic risk score (PRS) to assess the aggregated effect of known MM risk alleles. In GWAS meta-analysis, we identified 2 suggestive novel loci located at 9p24.3 and 9p13.1 at P < 1 × 10−6; however, no genome-wide significant association was noted. In admixture mapping, we observed a genome-wide significant inverse association between local AA at 2p24.1-23.1 and MM risk in AA individuals. Of the 23 known EA risk variants, 20 showed directional consistency, and 9 replicated at P < .05 in AA individuals. In 8 regions, we identified markers that better capture MM risk in persons with AA. AA individuals with a PRS in the top 10% had a 1.82-fold (95% confidence interval, 1.56-2.11) increased MM risk compared with those with average risk (25%-75%). The strongest functional association was between the risk allele for variant rs56219066 at 5q15 and lower ELL2 expression (P = 5.1 × 10−12). Our study shows that common genetic variation contributes to MM risk in individuals with AA.
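The polygenic risk score comparison in the abstract above (top 10% versus the 25%-75% band) can be pictured with a small sketch. It assumes the common construction of a PRS as a weighted sum of risk-allele counts with log odds ratios as weights; the weights, allele counts, and function names below are hypothetical and are not the study's values.

```python
import math

def polygenic_risk_score(risk_allele_counts, log_odds_weights):
    """Weighted sum of risk-allele counts (0/1/2 per variant)."""
    return sum(count * weight
               for count, weight in zip(risk_allele_counts, log_odds_weights))

def percentile_rank(score, cohort_scores):
    """Fraction of the cohort with a score at or below the given score."""
    return sum(1 for s in cohort_scores if s <= score) / len(cohort_scores)

def risk_band(score, cohort_scores):
    """Assign the bands named in the abstract: top 10% and 25%-75%."""
    rank = percentile_rank(score, cohort_scores)
    if rank > 0.90:
        return "top 10%"
    if 0.25 < rank <= 0.75:
        return "average (25%-75%)"
    return "other"

# Hypothetical per-allele odds ratios for three variants, used as log-odds weights.
weights = [math.log(1.2), math.log(1.5), math.log(1.1)]
person = [2, 1, 0]  # risk-allele counts for one individual (assumed)
print(polygenic_risk_score(person, weights))
```

The relative risk reported for the top band would then come from comparing disease odds between individuals in the "top 10%" group and those in the "average" group.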

Evaluation and application of summary statistic imputation to discover new height-associated loci

10.1101/204560 ◽ 2017

Author(s): Sina Rüeger ◽ Aaron McDaid ◽ Zoltán Kutalik

Keyword(s): Genetic Variants ◽ Association Studies ◽ Low Frequency ◽ Cost Effective ◽ Genotype Imputation ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ Uk Biobank ◽ Genome Wide ◽ The Uk

Abstract As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation boasts a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded an increase in statistical power by 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary

Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are typically publicly shared through publication. Typically, GWASs are carried out by genotyping ~500,000 SNVs for each individual, which are then combined with sequenced reference panels to infer untyped SNVs in each individual’s genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual data and directly impute summary statistics. In our work we compare the performance of summary statistics imputation to genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.

International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci

Nature Communications ◽ 10.1038/s41467-019-12576-w ◽ 2019 ◽ Vol 10 (1) ◽ Cited By ~ 50

Author(s): Caroline M. Nievergelt ◽ Adam X. Maihofer ◽ Torsten Klengel ◽ Elizabeth G. Atkinson ◽ Chia-Yen Chen ◽ ...

Keyword(s): Genome Wide Association Study ◽ Association Studies ◽ Meta Analysis ◽ African Ancestry ◽ Genome Wide Association ◽ European Ancestry ◽ Genome Wide Association Studies ◽ Genome Wide ◽ A Genome

Abstract The risk of posttraumatic stress disorder (PTSD) following trauma is heritable, but robust common variants have yet to be identified. In a multi-ethnic cohort including over 30,000 PTSD cases and 170,000 controls we conduct a genome-wide association study of PTSD. We demonstrate SNP-based heritability estimates of 5–20%, varying by sex. Three genome-wide significant loci are identified, 2 in European and 1 in African-ancestry analyses. Analyses stratified by sex implicate 3 additional loci in men. Along with other novel genes and non-coding RNAs, a Parkinson’s disease gene involved in dopamine regulation, PARK2, is associated with PTSD. Finally, we demonstrate that polygenic risk for PTSD is significantly predictive of re-experiencing symptoms in the Million Veteran Program dataset, although specific loci did not replicate. These results demonstrate the role of genetic variation in the biology of risk for PTSD and highlight the necessity of conducting sex-stratified analyses and expanding GWAS beyond European ancestry populations.

Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1009628 ◽ 2022 ◽ Vol 18 (1) ◽ pp. e1009628

Author(s): Zhi Ming Xu ◽ Sina Rüeger ◽ Michaela Zwyer ◽ Daniela Brites ◽ Hellen Hiza ◽ ...

Keyword(s): Association Studies ◽ Imputation Accuracy ◽ Genotype Imputation ◽ Small Subset ◽ Study Cohort ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Genome Wide ◽ Selection Of

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.
