Is it reasonable to account for population structure in genome-wide association studies?

Mapping Intimacies ◽

10.1101/647768 ◽

2019 ◽

Author(s):

Bongsong Kim

Keyword(s):

Population Structure ◽

Linear Model ◽

Association Studies ◽

Significant SNPs ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Continuous Variables ◽

Genome Wide ◽

Manhattan Plot

Abstract

Population structure is widely perceived as a noise factor that undermines the quality of the association between an SNP variable and a phenotypic variable in genome-wide association studies (GWAS). The linear model for GWAS generally accounts for population-structure variables to obtain an adjusted phenotype with less noise, and this adjustment is known to amplify the contrast between significant and insignificant SNPs in the resulting Manhattan plot. In practice, however, conventional GWAS implements the linear model in an unusual way: the population-structure variables are incorporated as continuous variables rather than as factor variables. If the coefficients for the population-structure variables change from SNP to SNP, then each SNP variable is regressed against a differently adjusted phenotypic variable, which makes the GWAS process unreliable. Motivated by this concern, this study investigated whether accounting for population-structure variables in the linear model for GWAS ensures that the adjusted phenotypes are consistent across all SNPs. The results showed that the adjusted phenotypes were not consistent across SNPs, which is alarming given that conventional GWAS practice routinely accounts for population structure.
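
The concern in this abstract can be illustrated with a small simulation. The following is a hypothetical numpy sketch, not the study's own code: for each SNP it fits y ~ 1 + Q + SNP by ordinary least squares, where Q holds continuous population-structure covariates (random stand-ins for principal-component scores), and records the phenotype adjusted by that model's own Q coefficients. The simulated data, the function name `adjusted_phenotypes`, and the choice to let SNP allele frequencies track Q are all assumptions.

```python
import numpy as np

def adjusted_phenotypes(y, Q, snps):
    """Fit y ~ 1 + Q + snp_j by OLS for every SNP j and return y - Q @ b_Q(j).

    Q holds population-structure covariates entered as continuous variables.
    Row j of the result is the phenotype as adjusted inside SNP j's model.
    """
    n = y.shape[0]
    rows = []
    for j in range(snps.shape[1]):
        # Design matrix: intercept, structure covariates, then the SNP under test
        X = np.column_stack([np.ones(n), Q, snps[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_Q = beta[1:1 + Q.shape[1]]  # coefficients on the structure covariates
        rows.append(y - Q @ b_Q)
    return np.array(rows)

# Simulated data (assumption): SNP allele frequencies track the first
# structure covariate to different degrees, as under stratification.
rng = np.random.default_rng(0)
n, m, k = 200, 5, 2
Q = rng.normal(size=(n, k))
coupling = np.linspace(0.0, 2.0, m)
freq = 1.0 / (1.0 + np.exp(-np.outer(Q[:, 0], coupling)))
snps = rng.binomial(2, freq).astype(float)   # 0/1/2 allele counts
y = Q @ np.array([1.0, -0.5]) + 0.3 * snps[:, 0] + rng.normal(size=n)

adj = adjusted_phenotypes(y, Q, snps)
# Largest disagreement, for any individual, between SNP-specific adjusted phenotypes
spread = np.max(adj.max(axis=0) - adj.min(axis=0))
print(f"max spread of adjusted phenotype across SNPs: {spread:.3f}")
```

By contrast, regressing y on Q once, before the scan, would give a single adjusted phenotype shared by every SNP; the sketch only makes the per-SNP variation visible and does not settle which approach is preferable.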

Genome-wide association studies of yield-related traits in high-latitude japonica rice

BMC Genomic Data ◽

10.1186/s12863-021-00995-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Guomin Zhang ◽

Rongsheng Wang ◽

Juntao Ma ◽

Hongru Gao ◽

Lingwei Deng ◽

...

Keyword(s):

Linear Model ◽

High Latitude ◽

Association Studies ◽

Genome Wide Association ◽

Japonica Rice ◽

Genome Wide Association Studies ◽

Heilongjiang Province ◽

Nucleotide Polymorphisms ◽

Coding Region ◽

Genome Wide

Abstract

Background: Heilongjiang Province is a high-quality japonica rice cultivation area in China; one in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decade ago.

Results: After resequencing the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies with two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Of these, 22 SNPs were selected to identify candidate genes; 2, 8, 1 and 11 of them were located in the 3′ UTR, intronic, coding and intergenic regions, respectively.

Conclusions: All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.
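
A GLM scan of the kind named in this abstract tests each SNP in turn with an ordinary linear model that may also include population-structure covariates. The sketch below is a hypothetical numpy illustration on simulated data: the sample size of 450 echoes the study, but the genotypes, covariates and effect sizes are invented; p-values use a normal approximation to the t distribution (reasonable only when n is much larger than the number of model terms); and the Bonferroni threshold is one common choice, not necessarily the one the authors used.

```python
import math
import numpy as np

def glm_scan(y, covariates, snps):
    """Per-SNP OLS test of y ~ 1 + covariates + snp.

    Returns the t statistic and an approximate two-sided p-value per SNP.
    Assumes no monomorphic SNPs (which would make X'X singular).
    """
    n = y.shape[0]
    base = np.column_stack([np.ones(n), covariates])
    tstats, pvals = [], []
    for j in range(snps.shape[1]):
        X = np.column_stack([base, snps[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t = beta[-1] / math.sqrt(cov[-1, -1])      # SNP is the last column
        tstats.append(t)
        pvals.append(math.erfc(abs(t) / math.sqrt(2)))  # two-sided normal p
    return np.array(tstats), np.array(pvals)

# Simulated data (assumption): SNP 0 is causal; pcs stand in for structure covariates
rng = np.random.default_rng(1)
n, m = 450, 100
snps = rng.binomial(2, 0.3, size=(n, m)).astype(float)
pcs = rng.normal(size=(n, 3))
y = 0.5 * snps[:, 0] + pcs @ np.array([0.4, 0.0, -0.2]) + rng.normal(size=n)

t, p = glm_scan(y, pcs, snps)
neg_log10_p = -np.log10(p)            # y-axis values of a Manhattan plot
hits = np.flatnonzero(p < 0.05 / m)   # Bonferroni-corrected 5% threshold
print("SNPs passing the Bonferroni threshold:", hits)
```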

Effects of Population Structure in Genome-wide Association Studies

Analysis of Complex Disease Association Studies ◽

10.1016/b978-0-12-375142-3.10009-4 ◽

2011 ◽

pp. 123-156 ◽

Cited By ~ 1

Author(s):

Yurii S. Aulchenko

Keyword(s):

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Impacts of Population Structure and Analytical Models in Genome-Wide Association Studies of Complex Traits in Forest Trees: A Case Study in Eucalyptus globulus

PLoS ONE ◽

10.1371/journal.pone.0081267 ◽

2013 ◽

Vol 8 (11) ◽

pp. e81267 ◽

Cited By ~ 40

Author(s):

Eduardo P. Cappa ◽

Yousry A. El-Kassaby ◽

Martín N. Garcia ◽

Cintia Acuña ◽

Nuno M. G. Borralho ◽

...

Keyword(s):

Population Structure ◽

Complex Traits ◽

Eucalyptus Globulus ◽

Association Studies ◽

Genome Wide Association ◽

Analytical Models ◽

Genome Wide Association Studies ◽

Forest Trees ◽

Genome Wide

Mixed linear model approach adapted for genome-wide association studies

Nature Genetics ◽

10.1038/ng.546 ◽

2010 ◽

Vol 42 (4) ◽

pp. 355-360 ◽

Cited By ~ 1007

Author(s):

Zhiwu Zhang ◽

Elhan Ersoz ◽

Chao-Qiang Lai ◽

Rory J Todhunter ◽

Hemant K Tiwari ◽

...

Keyword(s):

Linear Model ◽

Association Studies ◽

Mixed Linear Model ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Model Approach

The use of common mitochondrial variants to detect and characterise population structure in the Australian population: implications for genome-wide association studies

European Journal of Human Genetics ◽

10.1038/ejhg.2008.117 ◽

2008 ◽

Vol 16 (11) ◽

pp. 1396-1403 ◽

Cited By ~ 6

Author(s):

Enda M Byrne ◽

Allan F McRae ◽

Zhen-Zhen Zhao ◽

Nicholas G Martin ◽

Grant W Montgomery ◽

...

Keyword(s):

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Australian Population ◽

Genome Wide

A multi-marker association method for genome-wide association studies without the need for population structure correction

Nature Communications ◽

10.1038/ncomms13299 ◽

2016 ◽

Vol 7 (1) ◽

Cited By ~ 20

Author(s):

Jonas R. Klasen ◽

Elke Barbez ◽

Lukas Meier ◽

Nicolai Meinshausen ◽

Peter Bühlmann ◽

...

Keyword(s):

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Association Method

Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure

Current Opinion in Lipidology ◽

10.1097/mol.0b013e3282f5dd77 ◽

2008 ◽

Vol 19 (2) ◽

pp. 133-143 ◽

Cited By ~ 57

Author(s):

Yik Y Teo

Keyword(s):

Quality Control ◽

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Data Quality Control ◽

Genotype Calling ◽

Genome Wide ◽

Statistical Issues ◽

Control Genotype

Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies

10.1101/228106 ◽

2017 ◽

Cited By ~ 2

Author(s):

Haohan Wang ◽

Bryon Aragam ◽

Eric P. Xing

Keyword(s):

Population Structure ◽

Variable Selection ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Low Rank ◽

Genome Wide Association Studies ◽

Unified Framework ◽

Genome Wide

Abstract

A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has attracted renewed interest with the growth of biological and medical datasets that have complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.
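
The point that naïvely applying the Lasso to structured data invites false discoveries can be explored with a small sketch. This is not the authors' truncated-rank sparse LMM: as a deliberately simpler, swapped-in correction, it projects the top principal component(s) out of both genotypes and phenotype before running a plain coordinate-descent Lasso, and compares how many SNPs are selected with and without that step. The two-subpopulation simulation, the penalty `lam=0.05`, and the helper names `lasso_cd` and `remove_top_pcs` are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 for centered X and y
    (no intercept is fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                 # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)        # keep the residual in sync with b
            b[j] = new
    return b

def remove_top_pcs(G, y, k):
    """Hypothetical structure correction: center, then project the top-k
    principal components of G out of both the genotypes and the phenotype."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    U, _, _ = np.linalg.svd(Gc, full_matrices=False)
    Uk = U[:, :k]
    return Gc - Uk @ (Uk.T @ Gc), yc - Uk @ (Uk.T @ yc)

# Simulated data (assumption): two hidden subpopulations, SNPs 1-25 differentiated
rng = np.random.default_rng(2)
n, p = 300, 50
group = np.repeat([0, 1], n // 2)
freq = np.full((n, p), 0.3)
freq[:, 1:26] = np.where(group[:, None] == 0, 0.1, 0.6)
G = rng.binomial(2, freq).astype(float)
y = 1.0 * G[:, 0] + 1.5 * group + rng.normal(size=n)   # SNP 0 is the only causal SNP

b_naive = lasso_cd(G - G.mean(axis=0), y - y.mean(), lam=0.05)
Gr, yr = remove_top_pcs(G, y, k=1)
b_corr = lasso_cd(Gr, yr, lam=0.05)
print("non-zero (naive):", np.count_nonzero(b_naive),
      "non-zero (top PC removed):", np.count_nonzero(b_corr))
```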

Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model

10.1101/228114 ◽

2017 ◽

Author(s):

Haohan Wang ◽

Xiang Liu ◽

Yunpeng Xiao ◽

Ming Xu ◽

Eric P. Xing

Keyword(s):

Population Structure ◽

Association Mapping ◽

Complex Traits ◽

Association Studies ◽

Phenotypic Variability ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Confounding Factors ◽

Genetic Loci ◽

Genome Wide

Abstract

Genome-wide association studies (GWAS) offer a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, explaining complex traits associated with multifactorial genetic loci remains nontrivial, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM2) model that aims to jointly correct for population and genetic confounding factors. We offer two strategies for using LMM2 in association mapping: 1) it serves as an extension of the univariate LMM, which can effectively correct for population structure but considers each SNP in isolation; 2) it is integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM2). Further, we extend LMM2/sLMM2 by raising the power of our squared model to obtain the LMMn/sLMMn model. We demonstrate the practical use of our model with synthetic phenotypes generated from genetic loci of Arabidopsis thaliana. The experiments show that our method predicts the associations between traits and loci more accurately and with greater significance. We also evaluate our models on collected phenotype and genotype data by the number of candidate genes the models can discover. The results suggest that our method is promising for genome-wide association studies.
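
The univariate LMM mentioned as the starting point of the first strategy can be sketched in numpy. This is a generic illustration, not LMM2 or sLMM2: it follows the familiar eigendecomposition trick (rotating by the eigenvectors of a kinship matrix K so the covariance becomes diagonal) and estimates the variance ratio delta = sigma_e^2 / sigma_g^2 once under the null model, similar in spirit to the P3D idea from the mixed-linear-model paper listed above. The kinship definition, the grid over log(delta) in [-5, 5], the normal-approximation p-values and the simulated data are all assumptions.

```python
import math
import numpy as np

def kinship(G):
    """Realized relationship matrix from an n x m matrix of 0/1/2 genotypes."""
    Z = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)
    return Z @ Z.T / G.shape[1]

def null_delta(yr, Xr, S, grid=np.linspace(-5.0, 5.0, 101)):
    """Grid-search delta by maximum likelihood for the null model
    (covariates only), working in the eigenbasis of K."""
    n = yr.shape[0]
    best_d, best_ll = None, -np.inf
    for log_d in grid:
        d = math.exp(log_d)
        w = 1.0 / (S + d)                      # inverse variances of rotated data
        XtW = Xr.T * w
        beta = np.linalg.solve(XtW @ Xr, XtW @ yr)
        r = yr - Xr @ beta
        sg2 = (w * r * r).sum() / n           # ML estimate of sigma_g^2
        ll = -0.5 * (n * math.log(2 * math.pi * sg2) + np.log(S + d).sum() + n)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d

def lmm_scan(y, X, G, K):
    """Test each SNP in y = X b + g_j a + u + e, u ~ N(0, s_g^2 K), e ~ N(0, s_e^2 I)."""
    S, U = np.linalg.eigh(K)
    S = np.clip(S, 0.0, None)                  # drop tiny negative round-off eigenvalues
    yr, Xr, Gr = U.T @ y, U.T @ X, U.T @ G     # rotation makes the covariance diagonal
    d = null_delta(yr, Xr, S)                  # variance ratio fixed once, under the null
    w = 1.0 / (S + d)
    n = y.shape[0]
    t = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        A = np.column_stack([Xr, Gr[:, j]])
        AtW = A.T * w
        M = AtW @ A
        beta = np.linalg.solve(M, AtW @ yr)    # generalized least squares estimate
        r = yr - A @ beta
        sg2 = (w * r * r).sum() / (n - A.shape[1])
        t[j] = beta[-1] / math.sqrt(sg2 * np.linalg.inv(M)[-1, -1])
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return t, p, d

# Simulated data (assumption): two hidden subpopulations, SNP 0 causal
rng = np.random.default_rng(3)
n, m = 200, 300
group = np.repeat([0, 1], n // 2)
freq = np.full((n, m), 0.3)
freq[:, 1:101] = np.where(group[:, None] == 0, 0.1, 0.6)   # differentiated SNPs
G = rng.binomial(2, freq).astype(float)
y = 0.8 * G[:, 0] + 1.0 * group + rng.normal(size=n)
X = np.ones((n, 1))                            # intercept only
t, p, d = lmm_scan(y, X, G, kinship(G))
print(f"delta = {d:.3g}, p-value of causal SNP 0: {p[0]:.2e}")
```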

The power of a multivariate approach to genome-wide association studies: an example with Drosophila melanogaster wing shape

10.1101/108308 ◽

2017 ◽

Author(s):

William Pitchers ◽

Jessica Nye ◽

Eladio J. Márquez ◽

Alycia Kowalski ◽

Ian Dworkin ◽

...

Keyword(s):

Drosophila Melanogaster ◽

Multivariate Analyses ◽

Association Studies ◽

Significant SNPs ◽

Wing Shape ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Phenotypic Effect ◽

Genome Wide ◽

Univariate Analyses

Abstract

Due to the complexity of genotype-phenotype relationships, simultaneous analyses of genomic associations with multiple traits will be more powerful and more informative than a series of univariate analyses. In most cases, however, studies of genotype-phenotype relationships have analyzed only one trait at a time, even as the rapid advances in molecular tools have expanded our view of the genotype to include whole genomes. Here, we report the results of a fully integrated multivariate genome-wide association analysis of the shape of the Drosophila melanogaster wing in the Drosophila Genetic Reference Panel. Genotypic effects on wing shape were highly correlated between two different labs. We found 2,396 significant SNPs using a 5% FDR cutoff in the multivariate analyses, but just 4 significant SNPs in univariate analyses of scores on the first 20 principal component axes. A key advantage of multivariate analysis is that the direction of the estimated phenotypic effect is much more informative than a univariate one. Exploiting this feature, we show that the directions of effects were on average replicable in an unrelated panel of inbred lines. Effects of knockdowns of genes implicated in the initial screen were on average more similar than expected under a null model. Association studies that take a phenomic approach in considering many traits simultaneously are an important complement to the power of genomics. Multivariate analyses of such data are more powerful, more informative, and allow the unbiased study of pleiotropy.
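
A 5% FDR cutoff like the one reported here is most often implemented with the Benjamini-Hochberg step-up procedure. The abstract does not say which procedure was used, so the sketch below is an assumption for illustration; the function name and the toy p-values are hypothetical. It returns a boolean mask of discoveries for a vector of p-values.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with i * q / m (i = 1..m)
    below = p[order] <= np.arange(1, m + 1) * q / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()        # last rank that passes (step-up)
        mask[order[:k + 1]] = True             # reject every p-value up to that rank
    return mask

toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(np.flatnonzero(benjamini_hochberg(toy)))   # prints [0 1]
```

Because the rule is step-up, a p-value can be rejected even if some smaller-ranked p-value fails its own threshold: for four p-values all equal to 0.04, the largest rank passes (0.04 <= 4 * 0.05 / 4), so all four are declared discoveries.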
