Computationally efficient familywise error rate control in genome‐wide association studies using score tests for generalized linear models

Kari Krizak Halle; Øyvind Bakke; Srdjan Djurovic; Anja Bye; Einar Ryeng; Ulrik Wisløff; Ole A. Andreassen; Mette Langaas

doi:10.1111/sjos.12451

Mixed logistic regression in genome-wide association studies

BMC Bioinformatics ◽

10.1186/s12859-020-03862-2 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jacqueline Milet ◽

David Courtin ◽

André Garcia ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Cox Model ◽

Association Studies ◽

Scale Up ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Background: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results: We show that, when the disease prevalence differs between population strata, MLM is inappropriate to analyze binary traits. MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. mixed Cox model for survival analysis).


Bayesian meta-analysis across genome-wide association studies of diverse phenotypes

10.1101/477828 ◽

2018 ◽

Author(s):

Holly Trochet ◽

Matti Pirinen ◽

Gavin Band ◽

Luke Jostins ◽

Gilean McVean ◽

...

Keyword(s):

Genetic Basis ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Computationally Efficient ◽

Genome Wide ◽

Wide Range ◽

Study Designs

Abstract Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared to standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS which were part of the Wellcome Trust Case-Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for, a range of possible true patterns of association across studies in a computationally efficient framework.


Mixed Logistic Regression in Genome-Wide Association Studies

10.1101/2020.01.17.910109 ◽

2020 ◽

Author(s):

Jacqueline Milet ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Association Studies ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide

Abstract Motivation: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow an estimation of the variants’ effects. Results: We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability: All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Methodological implementation of mixed linear models in multi-locus genome-wide association studies

Briefings in Bioinformatics ◽

10.1093/bib/bbx028 ◽

2017 ◽

Vol 18 (5) ◽

pp. 906-906 ◽

Cited By ~ 12

Author(s):

Yang-Jun Wen ◽

Hanwen Zhang ◽

Yuan-Li Ni ◽

Bo Huang ◽

Jin Zhang ◽

...

Keyword(s):

Linear Models ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide


Methodological implementation of mixed linear models in multi-locus genome-wide association studies

Briefings in Bioinformatics ◽

10.1093/bib/bbw145 ◽

2017 ◽

Vol 19 (4) ◽

pp. 700-712 ◽

Cited By ~ 71

Author(s):

Yang-Jun Wen ◽

Hanwen Zhang ◽

Yuan-Li Ni ◽

Bo Huang ◽

Jin Zhang ◽

...

Keyword(s):

Linear Models ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide


MI-GWAS: a SAS platform for the analysis of inherited and maternal genetic effects in genome-wide association studies using log-linear models

BMC Bioinformatics ◽

10.1186/1471-2105-12-117 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 5

Author(s):

AJ Agopian ◽

Laura E Mitchell

Keyword(s):

Linear Models ◽

Association Studies ◽

Genetic Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Log Linear


Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity

GigaScience ◽

10.1093/gigascience/giaa044 ◽

2020 ◽

Vol 9 (6) ◽

Author(s):

Benjamin B Chu ◽

Kevin L Keys ◽

Christopher A German ◽

Hua Zhou ◽

Jin J Zhou ◽

...

Keyword(s):

Genetic Variants ◽

Prior Information ◽

Linear Models ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Lasso Regression ◽

Hard Thresholding ◽

Genome Wide ◽

Iterative Hard Thresholding

Abstract Background: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. Results: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2–3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. Conclusions: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
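
As a rough illustration of the core idea behind iterative hard thresholding, the sketch below applies the textbook IHT update to an ordinary least-squares problem: take a gradient step, then keep only the k largest-magnitude coefficients. It is a minimal sketch, not the authors' implementation. The function names, the fixed step size, and the toy data are all assumptions made for illustration, and the extensions the abstract mentions (generalized linear models, prior weights, group sparsity) are not covered.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht_least_squares(X, y, k, step=0.5, n_iter=100):
    """Textbook IHT for min ||y - X beta||^2 subject to at most k nonzeros.

    X is a list of rows, y a list of responses. The fixed step size is an
    illustrative assumption; it must be small enough for the gradient step
    to be stable on the given X.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X beta
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction X^T r (negative gradient of 0.5 * ||y - X beta||^2)
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto k-sparse vectors
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


# Toy data (hypothetical): four "SNPs" with an orthonormal design,
# two sizeable effects and two small ones.
X_toy = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
y_toy = [3.0, 0.1, -2.0, 0.05]
beta_hat = iht_least_squares(X_toy, y_toy, k=2)
```

On this toy problem the two small effects are zeroed at every iteration, so only the first and third coefficients remain in the model. Real GWAS designs are far from orthonormal, which is where step-size selection and the paper's extensions matter.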


Covariate Adaptive Family-wise Error Rate Control for Genome-Wide Association Studies

Biometrika ◽

10.1093/biomet/asaa098 ◽

2020 ◽

Author(s):

Huijuan Zhou ◽

Xianyang Zhang ◽

Jun Chen

Keyword(s):

Error Rate ◽

Rate Control ◽

Multiple Testing ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Family Wise Error Rate ◽

Genome Wide

Abstract The family-wise error rate (FWER) has been widely used in genome-wide association studies. With the increasing availability of functional genomics data, it is possible to increase the detection power by leveraging these genomic functional annotations. Previous efforts to accommodate covariates in multiple testing focus on the false discovery rate control while covariate-adaptive FWER-controlling procedures remain under-developed. Here we propose a novel covariate-adaptive FWER-controlling procedure that incorporates external covariates which are potentially informative of either the statistical power or the prior null probability. An efficient algorithm is developed to implement the proposed method. We prove its asymptotic validity and obtain the rate of convergence through a perturbation-type argument. Our numerical studies show that the new procedure is more powerful than competing methods and maintains robustness across different settings. We apply the proposed approach to the UK Biobank data and analyze 27 traits with 9 million single-nucleotide polymorphisms tested for associations. Seventy-five genomic annotations are used as covariates. Our approach detects more genome-wide significant loci than other methods in 21 out of the 27 traits.
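
To make the idea of using external covariates in FWER control concrete, the sketch below shows the classical weighted Bonferroni rule, in which hypothesis i is rejected when p_i <= alpha * w_i / m and the weights average to one. This is a simpler, well-known technique swapped in for illustration; it is not the covariate-adaptive procedure proposed in the abstract. The function name and the example weights are hypothetical, and the weights are assumed to be fixed in advance from annotations rather than derived from the p-values themselves.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m.

    Weights are rescaled to average 1, so the per-test thresholds sum to
    alpha. Weights are assumed to be chosen without looking at the p-values.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = [w * m / total for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, scaled)]


# Hypothetical example: the second variant sits in an annotation that is
# believed a priori to be enriched for signal, so it gets a larger weight.
p_toy = [0.001, 0.02, 0.04, 0.3]
rejected_weighted = weighted_bonferroni(p_toy, [0.5, 2.0, 1.0, 0.5])
rejected_plain = weighted_bonferroni(p_toy, [1.0, 1.0, 1.0, 1.0])
```

With equal weights the rule reduces to ordinary Bonferroni. In the toy example, up-weighting the second variant raises its threshold from 0.0125 to 0.025, so it is rejected under the weighted rule but not under the plain one.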


Improving Power of Genome-Wide Association Studies with Weighted False Discovery Rate Control and Prioritized Subset Analysis

PLoS ONE ◽

10.1371/journal.pone.0033716 ◽

2012 ◽

Vol 7 (4) ◽

pp. e33716 ◽

Cited By ~ 9

Author(s):

Wan-Yu Lin ◽

Wen-Chung Lee

Keyword(s):

False Discovery Rate ◽

Rate Control ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

False Discovery Rate Control ◽

Subset Analysis ◽

False Discovery ◽

Genome Wide


A new approach of dissecting genetic effects for complex traits

10.1101/2020.10.16.336180 ◽

2020 ◽

Cited By ~ 1

Author(s):

Meng Luo ◽

Shiliang Gu

Keyword(s):

Population Structure ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Computationally Efficient ◽

New Approach ◽

Genome Wide ◽

Outbred Mice

Abstract During the past decades, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits in humans, animals, and plants. All common genome-wide association (GWA) methods rely on population structure correction to avoid false genotype and phenotype associations. However, population structure correction is a stringent penalization, which also impedes the identification of real associations. Here, we use recent statistical advances and propose iterative screen regression (ISR), which enables simultaneous multiple marker associations and is shown to appropriately correct for population stratification and cryptic relatedness in GWAS. Results from analyses of simulated data suggest that the proposed ISR method performs well in terms of power (sensitivity) versus FDR (false discovery rate) and specificity, and shows less bias (higher accuracy) in effect (PVE) estimation than the existing multi-locus (mixed) model and the single-locus (mixed) model. We also show the practicality of our approach by applying it to rice, outbred mice, and A. thaliana datasets. It identified several new causal loci that other methods did not detect. Our ISR provides an alternative for multi-locus GWAS, and the implementation is computationally efficient, making the analysis of large datasets practicable (n>100,000).
