Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies

bioRxiv ◽

10.1101/355057 ◽

2018 ◽

Cited By ~ 19

Author(s):

Mashaal Sohail ◽

Robert M. Maier ◽

Andrea Ganna ◽

Alex Bloemendal ◽

Alicia R. Martin ◽

...

Keyword(s):

Population Structure ◽

Association Studies ◽

Meta Analysis ◽

Human Populations ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

Large Numbers ◽

Genome Wide ◽

Polygenic Adaptation ◽

The UK

Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of subsignificant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

eLife ◽

10.7554/elife.39702 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 90

Author(s):

Mashaal Sohail ◽

Robert M Maier ◽

Andrea Ganna ◽

Alex Bloemendal ◽

Alicia R Martin ◽

...

Keyword(s):

Population Stratification ◽

Association Studies ◽

Editorial Note ◽

Human Populations ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

Large Numbers ◽

Genome Wide ◽

Polygenic Scores ◽

Polygenic Adaptation

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
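The sensitivity to stratification can be illustrated with a toy calculation. The sketch below rests on hypothetical assumptions that are not from the paper: every true SNP effect is zero, and each estimated effect carries a small stratification bias proportional to the allele-frequency difference between two populations, here called "north" and "south". Under that model the difference in population-mean polygenic scores equals 2c Σ(p_N − p_S)², which is always non-negative and grows as more SNPs are included. This mirrors why signals built from many sub-significant SNPs are the most exposed.

```python
"""Toy illustration: a small stratification bias in GWAS effect sizes
accumulates across SNPs when population-mean polygenic scores are compared.

Hypothetical assumptions (not from the paper): all true effects are zero and
each estimated effect is biased by `bias * (p_north - p_south)`.
"""
import random


def mean_polygenic_score(effects, freqs):
    """Expected population-mean score under a diploid additive model: 2 * sum(beta * p)."""
    return 2.0 * sum(b * p for b, p in zip(effects, freqs))


def score_gap(n_snps, bias, seed=1):
    """North-minus-south difference in mean scores built from biased effect estimates."""
    rng = random.Random(seed)
    p_north = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    # South frequencies differ slightly from North ones; clamp to [0, 1].
    p_south = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.05))) for p in p_north]
    true_beta = [0.0] * n_snps  # no SNP truly affects the trait
    est_beta = [b + bias * (pn - ps) for b, pn, ps in zip(true_beta, p_north, p_south)]
    return mean_polygenic_score(est_beta, p_north) - mean_polygenic_score(est_beta, p_south)


for n in (100, 1_000, 10_000):
    print(n, round(score_gap(n, bias=0.01), 4))
```

With no true effects at all, the printed gap still rises roughly in proportion to the number of SNPs, because every biased SNP pushes the two population means apart in the same direction.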

False discovery rate control in genome-wide association studies with population structure

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2105841118 ◽

2021 ◽

Vol 118 (40) ◽

pp. e2105841118

Author(s):

Matteo Sesia ◽

Stephen Bates ◽

Emmanuel Candès ◽

Jonathan Marchini ◽

Chiara Sabatti

Keyword(s):

Population Structure ◽

False Discovery Rate ◽

Association Studies ◽

Genome Wide Association ◽

Human Populations ◽

Genome Wide Association Studies ◽

False Discovery ◽

Genome Wide ◽

The UK ◽

Negative Controls

We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets.
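Once knockoff copies exist, the step that actually controls the FDR is a data-dependent threshold on per-variable statistics W_j, where a large positive W_j means the original variable looks more important than its knockoff. The sketch below is the generic knockoff/knockoff+ filter from the general knockoff framework of Barber and Candès, not the authors' software. It assumes the W_j have already been computed and does not generate knockoffs or reproduce the LD-aware, population-structure-aware construction described in the abstract.

```python
import math


def knockoff_threshold(w, q, offset=1):
    """Knockoff (offset=0) or knockoff+ (offset=1) threshold.

    Returns the smallest t among the nonzero |W_j| such that
        (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics beyond -t act as a proxy count of false discoveries.
        false_proxy = offset + sum(1 for x in w if x <= -t)
        selected = sum(1 for x in w if x >= t)
        if false_proxy / max(1, selected) <= q:
            return t
    return math.inf


def knockoff_select(w, q, offset=1):
    """Indices of variables whose statistic clears the threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


w = [5, 4, 3, 2, -1, 1.5, 0.5, -0.2]
print(knockoff_select(w, q=0.3))            # knockoff+
print(knockoff_select(w, q=0.3, offset=0))  # plain knockoff filter
```

With offset=1 (knockoff+) the filter controls the FDR at level q; offset=0 is slightly less conservative and controls a modified FDR instead.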

Genome-Wide Meta-Analysis of Late-Onset Alzheimer's Disease Using Rare Variant Imputation in 324,809 Subjects Identifies Novel Rare Variant Locus NCK2: The International Genomics of Alzheimer's Project (IGAP)

10.1101/2021.03.14.21253553 ◽

2021 ◽

Author(s):

Adam C. Naj ◽

Ganna Leonenko ◽

Xueqiu Jian ◽

Benjamin Grenier-Boley ◽

Maria Carolina Dalmasso ◽

...

Keyword(s):

Alzheimer's Disease ◽

Rare Variant ◽

Late Onset ◽

Sequence Data ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association Studies ◽

Genome Wide ◽

The UK

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF)>0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations of P<10⁻⁵ were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n=20,301 cases; 21,839 controls) (stage 2: combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n=35,214 cases; 180,791 controls) (stage 3: combined IGAP, EADB, and UKBB). Common variant (MAF≥0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P=2.33×10⁻¹²), SHARPIN (P=1.56×10⁻⁹), and ATF5/SIGLEC11 (P=1.03×10⁻⁸), and newly significant associations without using AD proxy cases in MTSS1L/IL34 (P=1.80×10⁻⁸), APH1B (P=2.10×10⁻¹³), and CLNK (P=2.24×10⁻¹⁰). Rare variant (MAF<0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF=0.0054; P=2.69×10⁻⁹) in NCK2, further strengthened with the inclusion of UKBB data in stage 3 (P=7.17×10⁻¹³). Single-nucleus sequence data show that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.
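The abstract does not state which model combined the IGAP, EADB, and UKBB stages. A common default for pooling per-study SNP effects is the fixed-effect, inverse-variance weighted (IVW) meta-analysis. The sketch below assumes that model and is meant only to show the arithmetic, not the study's actual pipeline; the inputs in the usage line are made up.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


# Hypothetical per-stage estimates for a single SNP.
print(ivw_meta([0.10, 0.12], [0.03, 0.04]))
```

The pooled p-value would then be compared against the conventional genome-wide significance threshold of 5×10⁻⁸. A random-effects model would be the usual alternative if effects are expected to differ across studies.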

A meta-analysis of the genome-wide association studies on two genetically correlated phenotypes (self-reported headache and self-reported migraine) identifies four new risk loci for headaches (N=397,385)

10.1101/2021.09.15.21263668 ◽

2021 ◽

Author(s):

Weihua Meng ◽

Parminder Reel ◽

Charvi Nangia ◽

Aravind Rajendrakumar ◽

Harry Hebert ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

The Self ◽

Genome Wide Association ◽

P Value ◽

Clinical Settings ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Genetic Mechanisms ◽

The UK

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource using metaUSAT for genetically correlated phenotypes (N=397,385). We identified 38 loci for headaches, of which 34 loci have been reported before and 4 loci were newly identified. The LRP1-STAT6-SDR9C7 region on chromosome 12 was the most significantly associated locus, with a leading P value of 1.24×10⁻⁶² at rs11172113. The ONECUT2 gene locus on chromosome 18 was the strongest signal among the 4 new loci, with a P value of 1.29×10⁻⁹ at rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together, in theory and in practice, to boost study power to identify more new variants for headaches. This study has paved the way for a large GWAS meta-analysis involving cohorts of different, though genetically correlated, headache phenotypes.

Fine-scale population structure in the UK Biobank: implications for genome-wide association studies

Human Molecular Genetics ◽

10.1093/hmg/ddaa157 ◽

2020 ◽

Vol 29 (16) ◽

pp. 2803-2811

Author(s):

James P Cook ◽

Anubha Mahajan ◽

Andrew P Morris

Keyword(s):

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Fine Scale ◽

UK Biobank ◽

Genome Wide ◽

Scale Population ◽

The UK ◽

The Impact

The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors that are confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.
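The principal-component adjustment evaluated here amounts to regressing the trait on genotype with PC scores as fixed covariates. The sketch below is a minimal pure-Python version under stated assumptions: covariate vectors (for example PC scores) are supplied directly, computing the PCs is out of scope, and the mixed-model adjustment with a genetic relationship matrix is not shown. It uses the Frisch-Waugh-Lovell theorem: residualizing genotype and trait on an orthogonalized covariate basis gives the same genotype slope and residuals as the full multiple regression.

```python
import math


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(v, basis):
    """Remove the projection of v onto each vector of an orthogonal basis."""
    out = list(v)
    for q in basis:
        qq = _dot(q, q)
        if qq > 0:
            c = _dot(out, q) / qq
            out = [o - c * qi for o, qi in zip(out, q)]
    return out


def adjusted_assoc(genotype, phenotype, covariates):
    """Genotype slope and t-statistic for phenotype ~ 1 + genotype + covariates."""
    n = len(genotype)
    # Gram-Schmidt: centered covariates become an orthogonal basis (also orthogonal to the intercept).
    basis = []
    for c in covariates:
        r = _residualize(_center(c), basis)
        if _dot(r, r) > 1e-12:
            basis.append(r)
    g = _residualize(_center(genotype), basis)
    y = _residualize(_center(phenotype), basis)
    beta = _dot(g, y) / _dot(g, g)
    resid = [yi - beta * gi for yi, gi in zip(y, g)]
    df = n - 2 - len(basis)  # intercept + genotype + retained covariates
    se = math.sqrt(_dot(resid, resid) / df / _dot(g, g))
    return beta, beta / se


# Hypothetical toy data: the trait depends on a location coordinate, not on genotype.
loc = [0, 1, 2, 3, 4, 5, 6, 7]
geno = [0, 0, 1, 0, 1, 2, 1, 2]
trait = [2 * l + e for l, e in zip(loc, [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05])]
print(adjusted_assoc(geno, trait, []))     # unadjusted: large spurious slope
print(adjusted_assoc(geno, trait, [loc]))  # adjusted for location: slope near zero
```

In this toy case the covariate captures the confounder exactly, so adjustment removes almost all of the spurious slope. Real PCs capture geography only imperfectly, which leaves room for the residual confounding the abstract reports.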

Polygenic Adaptation has Impacted Multiple Anthropometric Traits

10.1101/167551 ◽

2017 ◽

Cited By ~ 30

Author(s):

Jeremy J. Berg ◽

Xinjun Zhang ◽

Graham Coop

Keyword(s):

Complex Traits ◽

Association Studies ◽

GWAS Data ◽

Human Populations ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Link Type ◽

Polygenic Scores ◽

Polygenic Adaptation ◽

The UK

Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome-wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter-gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation.

Author’s Note on Failure to Replicate

After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data, as well as another dataset that was used for replication, were impacted by stratification issues that created or at a minimum substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702).
A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.

A simple test identifies selection on complex traits in breeding and experimentally-evolved populations

10.1101/238295 ◽

2017 ◽

Author(s):

Tim Beissinger ◽

Jochen Kruppa ◽

David Cavero ◽

Ngoc-Thuy Ha ◽

Malena Erbe ◽

...

Keyword(s):

Complex Traits ◽

Association Studies ◽

Human Populations ◽

Frequency Change ◽

Genome Wide Association Studies ◽

Breeding Populations ◽

Large Numbers ◽

Genome Wide ◽

Single Time Point ◽

Time Point

Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome-wide association studies and selection mapping protocols, are designed to identify individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique utilizes additive effect estimates from all available markers, and relates these estimates to allele frequency change over time. Using this information, we generate a composite statistic, denoted Ĝ, which can be used to test for significant evidence of selection on a trait. Our test requires pre- and post-selection genotypic data but only a single time point with phenotypic information. Simulations demonstrate that Ĝ is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection.
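The abstract describes relating additive effect estimates to allele-frequency change but does not give the formula for Ĝ. As a hedged sketch in that spirit, and not the paper's exact statistic, the code below uses the implied change in mean additive genetic value, 2 Σ α_i (p_post,i − p_pre,i). It approximates a neutral null by randomly flipping the sign of each locus's frequency change. That null is an assumption of this sketch: it treats drift as symmetric and loci as independent, and ignores linkage disequilibrium and estimation noise in the α_i.

```python
import random


def shift_from_deltas(alphas, deltas):
    """2 * sum(alpha_i * delta_p_i): implied change in mean additive genetic value (diploid)."""
    return 2.0 * sum(a * d for a, d in zip(alphas, deltas))


def genetic_value_shift(alphas, p_pre, p_post):
    """Shift computed from pre- and post-selection allele frequencies."""
    return shift_from_deltas(alphas, [q - p for p, q in zip(p_pre, p_post)])


def sign_flip_pvalue(alphas, p_pre, p_post, n_perm=10_000, seed=0):
    """Two-sided p-value against a hypothetical null in which each locus's
    frequency change is equally likely to have either sign."""
    rng = random.Random(seed)
    deltas = [q - p for p, q in zip(p_pre, p_post)]
    observed = abs(shift_from_deltas(alphas, deltas))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in deltas]
        if abs(shift_from_deltas(alphas, flipped)) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)
```

When all trait-increasing alleles rise together, almost no sign-flipped replicate matches the observed shift and the p-value is small. When frequency changes are unrelated to effect direction, the observed shift looks typical of the null.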

Summary statistics knockoff inference empowers identification of putative causal variants in genome-wide association studies

10.1101/2021.12.06.471440 ◽

2021 ◽

Author(s):

Zihuai He ◽

Linxi Liu ◽

Michael E. Belloy ◽

Yann Le Guen ◽

Aaron Sossin ◽

...

Keyword(s):

Genome Sequencing ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Superior Performance ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Genome Wide ◽

Causal Variants ◽

The UK

Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) analysis of 1,403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry, and (2) a meta-analysis for Alzheimer’s disease (AD) comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies. The UK Biobank analysis demonstrates superior performance of the proposed method compared to conventional GWAS in both statistical power (2.05-fold more discoveries) and localization of putative causal variants at each locus (46% fewer proxy variants due to linkage disequilibrium). The AD meta-analysis identified 55 risk loci (including 31 new loci) with ~70% of the proximal genes at these loci showing suggestive signal in downstream single-cell transcriptomic analyses. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.

Medical data and machine learning improve power of stroke genome-wide association studies

10.1101/2020.01.22.915397 ◽

2020 ◽

Author(s):

Phyllis M. Thangaraj ◽

Undina Gisladottir ◽

Nicholas P. Tatonetti

Keyword(s):

Machine Learning ◽

Large Scale ◽

Association Studies ◽

Meta Analysis ◽

Clinical Care ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Care Systems ◽

The UK

Genome-wide association studies (GWAS) may require enrollment of up to millions of participants to power variant discovery. This requires manual curation of cases and controls with large-scale collaborations. Biobanks connected to electronic health records (EHR) can facilitate these studies by using data from clinical care systems, like billing diagnosis codes, as phenotypes. These systems, however, do not define adjudicated cases and controls. We developed QTPhenProxy, a machine learning model that adds nuance to cohort classification by assigning everyone in a cohort a probability of having the study disease. We then ran a GWAS using the probabilities as a quantitative trait. With an order of magnitude fewer cases than the largest stroke GWAS, our method outperformed previous methods at replicating known variants in stroke and discovered a novel variant in ABCG8 associated with intracerebral hemorrhage in the UK Biobank that replicated in the MEGASTROKE GWA meta-analysis. QTPhenProxy expands traditional phenotyping to improve the power of GWAS.
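Using the probabilities as a quantitative trait means that, for each SNP, the association test is a regression of the probability on allele dosage rather than a case/control comparison. The sketch below assumes the probabilities have already been produced by some classifier (the QTPhenProxy model itself is not reproduced). It omits covariates such as age, sex, or principal components that a real analysis would include, and uses a normal approximation for the p-value, which is reasonable at biobank sample sizes.

```python
import math


def quantitative_gwas_snp(dosages, trait):
    """Per-SNP least-squares test of a quantitative trait on allele dosage (0/1/2).

    Returns (slope, standard error, two-sided p-value via normal approximation).
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    return beta, se, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical disease probabilities for six individuals and their dosages at one SNP.
print(quantitative_gwas_snp([0, 1, 2, 0, 1, 2], [0.1, 0.2, 0.35, 0.05, 0.25, 0.3]))
```

Keeping the probability rather than thresholding it into case/control retains information about uncertain individuals, which is the intuition behind treating the classifier output as a quantitative phenotype.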