Genotype Imputation Performance of Three Reference Panels Using African Ancestry Individuals

2018 ◽  
Author(s):  
Candelaria Vergara ◽  
Margaret M. Parker ◽  
Liliana Franco ◽  
Michael H. Cho ◽  
Ana V. Valencia-Duarte ◽  
...  

Abstract Genotype imputation is used to estimate unobserved genotypes from genome-wide marker data, to increase genome coverage and power for genome-wide association studies. Imputation has been most successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We aimed to compare the performance of these reference panels when imputing variation in 3,747 African Americans (AA) from 2 cohorts (HCV and COPDGene) genotyped using the Illumina Omni family of microarrays. The haplotypes of 2,504 individuals (from 1000G), 883 (from CAAPA) and 32,611 (from HRC) were used as reference. We compared the performance of these panels based on number of variants, imputation quality, imputation accuracy and coverage. In both cohorts, 1000G imputed 1.5–1.6x more variants than CAAPA and 1.2x more variants than HRC. Similar findings were observed for variants with higher imputation quality (R² > 0.5) and for rare, low frequency, and common variants. When merging the results of the three panels, the total number of imputed variants was 62M–63M, with 20M overlapping variants imputed by all three panels and a range of 5M to 15M unique variants imputed exclusively with one of the three panels. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. The 1000G, HRC and CAAPA participants of African ancestry provided high performance and accuracy for imputation of African American admixed individuals, increasing the total number of variants with high quality available for subsequent analyses. These three panels are complementary and would benefit from the development of an integrated African reference panel, including data from multiple sources and populations.

2021 ◽  
Author(s):  
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  
...  

Abstract Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genome of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on SNPs, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on SNPs to the base H3Africa array.


2021 ◽  
Author(s):  
Mark J. O’Connor ◽  
Philip Schroeder ◽  
Alicia Huerta-Chagoya ◽  
Paula Cortés-Sánchez ◽  
Silvía Bonàs-Guarch ◽  
...  

Most genome-wide association studies (GWAS) of complex traits are performed using models with additive allelic effects. Hundreds of loci associated with type 2 diabetes have been identified using this approach. Additive models, however, can miss loci with recessive effects, thereby leaving potentially important genes undiscovered. We conducted the largest GWAS meta-analysis using a recessive model for type 2 diabetes. Our discovery sample included 33,139 cases and 279,507 controls from seven European-ancestry cohorts including the UK Biobank. We identified 51 loci associated with type 2 diabetes, including five variants undetected by prior additive analyses. Two of the five had minor allele frequency less than 5% and were each associated with more than doubled risk in homozygous carriers. Using two additional cohorts, FinnGen and a Danish cohort, we replicated three of the variants, including one of the low-frequency variants, rs115018790, which had an odds ratio in homozygous carriers of 2.56 (95% CI 2.05-3.19, P = 1 × 10⁻¹⁶) and a stronger effect in men than in women (interaction P = 7 × 10⁻⁷). The signal was associated with multiple diabetes-related traits, with homozygous carriers showing a 10% decrease in LDL and a 20% increase in triglycerides, and colocalization analysis linked this signal to reduced expression of the nearby PELO gene. These results demonstrate that recessive models, when compared to GWAS using the additive approach, can identify novel loci, including large-effect variants with pathophysiological consequences relevant to type 2 diabetes.
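The difference between the additive and recessive models mentioned in this abstract comes down to how a genotype is coded as a predictor. The following is a minimal illustrative sketch, not the authors' pipeline: the function names and the toy genotype data are hypothetical, and the odds ratio is an unadjusted 2×2 estimate rather than the regression-based meta-analysis the study describes.

```python
from typing import Sequence


def additive_code(allele_count: int) -> int:
    """Additive coding: the predictor is the effect-allele count (0, 1 or 2)."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1 or 2")
    return allele_count


def recessive_code(allele_count: int) -> int:
    """Recessive coding: 1 only for homozygous carriers (two effect alleles)."""
    additive_code(allele_count)  # reuse the range check
    return 1 if allele_count == 2 else 0


def recessive_odds_ratio(genotypes: Sequence[int], is_case: Sequence[bool]) -> float:
    """Unadjusted odds ratio for homozygous carriers versus everyone else."""
    case_hom = case_other = ctrl_hom = ctrl_other = 0
    for allele_count, case in zip(genotypes, is_case):
        hom = recessive_code(allele_count)
        if case and hom:
            case_hom += 1
        elif case:
            case_other += 1
        elif hom:
            ctrl_hom += 1
        else:
            ctrl_other += 1
    if case_other == 0 or ctrl_hom == 0:
        raise ZeroDivisionError("odds ratio undefined: an empty cell in the 2x2 table")
    return (case_hom * ctrl_other) / (case_other * ctrl_hom)


# Toy data (invented for illustration): six cases followed by six controls.
toy_genotypes = [2, 2, 2, 1, 0, 0, 2, 1, 1, 0, 0, 0]
toy_is_case = [True] * 6 + [False] * 6

print(recessive_odds_ratio(toy_genotypes, toy_is_case))  # prints 5.0
```

Under recessive coding, heterozygous carriers are grouped with non-carriers, which is why a variant whose effect appears only in homozygotes can be diluted under additive coding.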


2019 ◽  
Author(s):  
Mart Kals ◽  
Tiit Nikopensius ◽  
Kristi Läll ◽  
Kalle Pärn ◽  
Timo Tõnis Sikka ◽  
...  

Abstract Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed sufficiently accurately with publicly available and ethnically heterogeneous reference datasets like 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences and differences of using ethnically mixed and matched reference panels have not yet been comprehensively explored. We determined and quantified these differences by performing several comparative evaluations of the discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index by using genome-wide genotype data of ∼37,000 Estonians imputed with ethnically mixed 1000G and ethnically matched imputation reference panels. Although several previously reported common (minor allele frequency; MAF > 5%) variant associations were replicated in both resulting imputed datasets, no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the ethnically matched imputed data as compared to four genes in the 1000G panel based imputed data. All resulting genes were consequently studied in the UK Biobank data. These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the usage of ethnically matched imputation reference panels can enable substantially improved imputation of rare variants, facilitating novel high-confidence findings in rare variant GWAS scans.

Author summary Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. The latter method is called genotype imputation and its accuracy depends on multiple factors. Publicly available and ethnically heterogeneous imputation reference panels (IRPs) such as 1000 Genomes Project (1000G) are sufficiently accurate for imputation of common and low-frequency variants, but custom ethnically matched IRPs outperform these in the case of rare variants. In this work, we systematically compare downstream association analysis effects on eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single variant analysis, where both imputed datasets replicate previously reported significant loci. But in the gene-based analysis of rare protein-coding variants we show that the ethnically matched panel clearly outperforms 1000G panel based imputation, providing a 10-fold increase in significant gene-trait associations. Our study demonstrates empirically that imputed data based on an ethnically matched panel is very promising for rare variant analysis: it captures more population-specific variants and makes it possible to efficiently identify novel findings.


2018 ◽  
Author(s):  
Caroline M. Nievergelt ◽  
Adam X. Maihofer ◽  
Torsten Klengel ◽  
Elizabeth G. Atkinson ◽  
Chia-Yen Chen ◽  
...  

Abstract Post-traumatic stress disorder (PTSD) is a common and debilitating disorder. The risk of PTSD following trauma is heritable, but robust common variants have yet to be identified by genome-wide association studies (GWAS). We have collected a multi-ethnic cohort including over 30,000 PTSD cases and 170,000 controls. We first demonstrate significant genetic correlations across 60 PTSD cohorts to evaluate the comparability of these phenotypically heterogeneous studies. In this largest GWAS meta-analysis of PTSD to date we identify a total of 6 genome-wide significant loci, 4 in European and 2 in African-ancestry analyses. Follow-up analyses incorporated local ancestry and sex-specific effects, and functional studies. Along with other novel genes, a non-coding RNA (ncRNA) and a Parkinson’s Disease gene, PARK2, were associated with PTSD. Consistent with previous reports, SNP-based heritability estimates for PTSD range between 10-20%. Despite a significant shared liability between PTSD and major depressive disorder, we show evidence that some of our loci may be specific to PTSD. These results demonstrate the role of genetic variation contributing to the biology of differential risk for PTSD and the necessity of expanding GWAS beyond European ancestry.


Author(s):  
Ying Wang ◽  
Jing Guo ◽  
Guiyan Ni ◽  
Jian Yang ◽  
Peter M. Visscher ◽  
...  

Abstract Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA when using European-based PGS in African ancestry for traits like body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.


2020 ◽  
Vol 4 (1) ◽  
pp. 181-190 ◽  
Author(s):  
Zhaohui Du ◽  
Niels Weinhold ◽  
Gregory Chi Song ◽  
Kristin A. Rand ◽  
David J. Van Den Berg ◽  
...  

Abstract Persons of African ancestry (AA) have a twofold higher risk for multiple myeloma (MM) compared with persons of European ancestry (EA). Genome-wide association studies (GWASs) support a genetic contribution to MM etiology in individuals of EA. Little is known about genetic risk factors for MM in individuals of AA. We performed a meta-analysis of 2 GWASs of MM in 1813 cases and 8871 controls and conducted an admixture mapping scan to identify risk alleles. We fine-mapped the 23 known susceptibility loci to find markers that could better capture MM risk in individuals of AA and constructed a polygenic risk score (PRS) to assess the aggregated effect of known MM risk alleles. In GWAS meta-analysis, we identified 2 suggestive novel loci located at 9p24.3 and 9p13.1 at P < 1 × 10⁻⁶; however, no genome-wide significant association was noted. In admixture mapping, we observed a genome-wide significant inverse association between local AA at 2p24.1-23.1 and MM risk in AA individuals. Of the 23 known EA risk variants, 20 showed directional consistency, and 9 replicated at P < .05 in AA individuals. In 8 regions, we identified markers that better capture MM risk in persons with AA. AA individuals with a PRS in the top 10% had a 1.82-fold (95% confidence interval, 1.56-2.11) increased MM risk compared with those with average risk (25%-75%). The strongest functional association was between the risk allele for variant rs56219066 at 5q15 and lower ELL2 expression (P = 5.1 × 10⁻¹²). Our study shows that common genetic variation contributes to MM risk in individuals with AA.


2017 ◽  
Author(s):  
Sina Rüeger ◽  
Aaron McDaid ◽  
Zoltán Kutalik

Abstract As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation boasts a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded an increase in statistical power by 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are typically publicly shared through publication. Typically, GWASs are carried out by genotyping ~500,000 SNVs for each individual, which are then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual data and directly impute summary statistics. In our work we compare the performance of summary statistics imputation to genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Caroline M. Nievergelt ◽  
Adam X. Maihofer ◽  
Torsten Klengel ◽  
Elizabeth G. Atkinson ◽  
Chia-Yen Chen ◽  
...  

Abstract The risk of posttraumatic stress disorder (PTSD) following trauma is heritable, but robust common variants have yet to be identified. In a multi-ethnic cohort including over 30,000 PTSD cases and 170,000 controls we conduct a genome-wide association study of PTSD. We demonstrate SNP-based heritability estimates of 5–20%, varying by sex. Three genome-wide significant loci are identified, 2 in European and 1 in African-ancestry analyses. Analyses stratified by sex implicate 3 additional loci in men. Along with other novel genes and non-coding RNAs, a Parkinson’s disease gene involved in dopamine regulation, PARK2, is associated with PTSD. Finally, we demonstrate that polygenic risk for PTSD is significantly predictive of re-experiencing symptoms in the Million Veteran Program dataset, although specific loci did not replicate. These results demonstrate the role of genetic variation in the biology of risk for PTSD and highlight the necessity of conducting sex-stratified analyses and expanding GWAS beyond European ancestry populations.


2022 ◽  
Vol 18 (1) ◽  
pp. e1009628
Author(s):  
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  
...  

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.

