The more the merrier? Multivariate approaches to genome-wide association analysis

Mapping Intimacies ◽

10.1101/610287 ◽

2019 ◽

Cited By ~ 1

Author(s):

César-Reyer Vroom ◽

Christiaan de Leeuw ◽

Danielle Posthuma ◽

Conor V. Dolan ◽

Sophie van der Sluis

Keyword(s):

Complex Traits ◽

Large Scale ◽

Type I Error ◽

Error Rates ◽

Genome Wide Association ◽

Data Sets ◽

Type I ◽

Genome Wide ◽

Statistical Background ◽

GWA Studies

Abstract: The vast majority of genome-wide association (GWA) studies analyze a single trait even though large-scale multivariate data sets are available. As complex traits are highly polygenic, and pleiotropy seems ubiquitous, it is essential to determine when multivariate association tests (MATs) outperform univariate approaches in terms of power. We discuss the statistical background of 19 MATs and give an overview of their statistical properties. We address the Type I error rates of these MATs and demonstrate which factors can cause bias. Finally, we examine, compare, and discuss the power of these MATs, varying the number of traits, the correlational pattern between the traits, the number of affected traits, and the sign of the genetic effects. Our results demonstrate under which circumstances specific MATs perform best. By sharing flexible simulation scripts, we facilitate a standard framework for comparing the Type I error rate and power of new MATs to those of existing ones.
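As a rough illustration of what an empirical Type I error rate means for a multi-trait test, the sketch below simulates null data and counts false rejections. It is not one of the 19 MATs or the authors' simulation scripts. It assumes independent traits with standard-normal test statistics under the null, and the function name `empirical_type1` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))

def empirical_type1(n_traits, n_sims=20000, alpha=0.05, bonferroni=False, seed=1):
    """Fraction of null simulations in which the minimum p-value across
    traits falls below the threshold (assumption: independent traits)."""
    rng = random.Random(seed)
    threshold = alpha / n_traits if bonferroni else alpha
    rejections = 0
    for _ in range(n_sims):
        # No true genetic effect: every trait's statistic is pure noise.
        min_p = min(two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_traits))
        if min_p < threshold:
            rejections += 1
    return rejections / n_sims

naive = empirical_type1(5)                       # uncorrected minimum p
corrected = empirical_type1(5, bonferroni=True)  # Bonferroni-corrected
```

With five independent traits the uncorrected minimum-p rule is expected to reject in roughly 1 - 0.95^5 ≈ 23% of null simulations, while the Bonferroni version stays near the nominal 5%. Correlated traits would change both numbers.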

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.080030 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant ◽

Alexander F. Wilson

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.10000871 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Alexander F. Wilson ◽

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data

Genes ◽

10.3390/genes12050736 ◽

2021 ◽

Vol 12 (5) ◽

pp. 736

Author(s):

Xiaotian Dai ◽

Guifang Fu ◽

Shaofei Zhao ◽

Yifei Zeng

Keyword(s):

Type I Error ◽

Association Studies ◽

Case Control ◽

Error Rates ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Learning Approaches ◽

Genome Wide ◽

Control Disease

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. This imbalance is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by the unbalanced case-control ratio, and comment on the advantages and limitations of each approach. In addition, we explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to the GWAS field. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.

Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

10.1101/2020.10.09.333146 ◽

2020 ◽

Author(s):

Wenjian Bi ◽

Wei Zhou ◽

Rounak Dey ◽

Bhramar Mukherjee ◽

Joshua N Sampson ◽

...

Keyword(s):

Mixed Model ◽

Type I Error ◽

Association Studies ◽

Error Rates ◽

Genome Wide Association ◽

Alternative Methods ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Abstract: In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which produces inflated type I error rates or is less powerful. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, to control type I error rates at a stringent significance level regardless of the phenotypic distribution, and to be more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.

GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

GxE Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Abstract: Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and infeasible for biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than that of existing GxE methods.

A comparative analysis of cell-type adjustment methods for epigenome-wide association studies based on simulated and real data sets

Briefings in Bioinformatics ◽

10.1093/bib/bby068 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2055-2065 ◽

Cited By ~ 1

Author(s):

Johannes Brägelmann ◽

Justo Lorenzo Bermejo

Keyword(s):

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Real Data ◽

Error Rates ◽

Data Sets ◽

Type I ◽

Cell Type ◽

Type I Error Rates

Abstract: Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed for adjustment of cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed simulation findings with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best comparable statistical power, quality of estimated methylation differences and runtime.

An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research

Statistical Methods in Medical Research ◽

10.1177/0962280216647422 ◽

2016 ◽

Vol 27 (3) ◽

pp. 905-919

Author(s):

Anne Buu ◽

L Keoki Williams ◽

James J Yang

Keyword(s):

Type I Error ◽

Empirical Distribution ◽

Genetic Effect ◽

Association Test ◽

Genome Wide Association ◽

Type I ◽

Multiple Phenotypes ◽

Power Of The Test ◽

Genome Wide ◽

The One

We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of Fisher’s combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis of the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method, by combining multiple phenotypes, can increase the power to identify markers that might not otherwise be chosen using marginal tests.
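For orientation, the textbook form of Fisher's combination statistic can be sketched as follows. This is not the method proposed in the paper, which estimates the null distribution empirically because the phenotypes are correlated. The sketch assumes independent p-values, in which case the statistic follows a chi-square distribution with 2k degrees of freedom. All function names are hypothetical.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combination statistic T = -2 * sum(ln p_i); needs 0 < p_i <= 1."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(x, k):
    """Survival function P(X > x) for a chi-square variable with 2k degrees
    of freedom, using the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Combined p-value, valid only under the independence assumption."""
    return chi2_sf_even_df(fisher_statistic(p_values), len(p_values))

combined = fisher_combined_p([0.04, 0.20, 0.01])
```

When the per-phenotype p-values are positively correlated, this chi-square reference distribution is too optimistic, which is the reason a permutation-based or otherwise empirical null distribution is used instead.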

Transcriptome-wide association supplements genome-wide association in Zea mays

10.1101/363242 ◽

2018 ◽

Cited By ~ 4

Author(s):

Karl A. G. Kremling ◽

Christine H. Diepenbrock ◽

Michael A. Gore ◽

Edward S. Buckler ◽

Nonoy B. Bandillo

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

New Technologies ◽

Seed Quality ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Biological Organization ◽

Genome Wide

Abstract: Modern improvement of complex traits in agricultural species relies on successful associations of heritable molecular variation with observable phenotypes. Historically, this pursuit has primarily been based on easily measurable genetic markers. The recent advent of new technologies allows assaying and quantifying biological intermediates (hereafter endophenotypes) which are now readily measurable at a large scale across diverse individuals. The potential of using endophenotypes for dissecting traits of interest remains underexplored in plants. The work presented here illustrated the utility of a large-scale (299 genotype and 7 tissue) gene expression resource to dissect traits across multiple levels of biological organization. Using single-tissue- and multi-tissue-based transcriptome-wide association studies (TWAS), we revealed that about half of the functional variation for agronomic and seed quality (carotenoid, tocochromanol) traits is regulatory. Comparing the efficacy of TWAS with genome-wide association studies (GWAS) and an ensemble approach that combines both GWAS and TWAS, we demonstrated that results of TWAS in combination with GWAS increase the power to detect known genes and aid in prioritizing likely causal genes. Using a variance partitioning approach in the independent maize Nested Association Mapping (NAM) population, we also showed that the most strongly associated genes identified by combining GWAS and TWAS explain more heritable variance for a majority of traits, beating the heritability captured by the random genes and the genes identified by GWAS or TWAS alone. This not only improves the ability to link genes to phenotypes, but also highlights the phenotypic consequences of regulatory variation in plants.

Author summary: We examined the ability to associate variability in gene expression directly with terminal phenotypes of interest, as a supplement linking genotype to phenotype. We found that transcriptome-wide association studies (TWAS) are a useful accessory to genome-wide association studies (GWAS). In a combined test with GWAS results, TWAS improves the capacity to re-detect genes known to underlie quantitative trait loci for kernel and agronomic phenotypes. This not only improves the capacity to link genes to phenotypes, but also illustrates the widespread importance of regulation for phenotype.

Multi-trait genome-wide analyses of the brain imaging phenotypes in UK Biobank

10.1101/758326 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chong Wu

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Type I ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Type I Error Rates ◽

Trait Association ◽

Genome Wide ◽

Inflation Factor

Abstract: Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type I error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating that the Type I error rates were well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and yielded similar conclusions.

An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations

BMC Genetics ◽

10.1186/1471-2156-12-10 ◽

2011 ◽

Vol 12 (1) ◽

pp. 10 ◽

Cited By ~ 5

Author(s):

Marcio AA Almeida ◽

Paulo SL Oliveira ◽

Tiago V Pereira ◽

José E Krieger ◽

Alexandre C Pereira

Keyword(s):

Type I Error ◽

Imputation Accuracy ◽

Empirical Evaluation ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Genome Wide
