On multiple-testing correction in genome-wide association studies

Valentina Moskvina; Karl Michael Schmidt

doi:10.1002/gepi.20331

networkGWAS: A network-based approach for genome-wide association studies in structured populations

doi:10.1101/2021.11.11.468206 (2021)

Author(s): Giulia Muzio, Leslie O'Bray, Laetitia Meng-Papaxanthos, Juliane Klatt, Karsten Borgwardt

Keyword(s): Genetic Markers; Complex Traits; Multiple Testing; Association Studies; Search Space; Structured Populations; Genome Wide Association; Genome Wide Association Studies; Multiple Testing Correction; Genome Wide

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these explain only a tiny fraction of the observed phenotypic variation. One possible strategy for detecting stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association with a phenotype. The latter approach, network-based genome-wide association studies, suffers in particular from a huge search space and an inherent multiple testing problem. As a consequence, current approaches either rely on greedy feature selection, thereby risking missing relevant associations, or neglect multiple testing correction, which can lead to an abundance of false positive findings, or both. To address these shortcomings of current network-based genome-wide association approaches, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and yields well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data as well as on semi-simulated and real data with common variants from A. thaliana, and it enables the systematic combination of gene-based genome-wide association studies with biological network information.
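
The abstract names two ingredients: neighborhood aggregation and a block permutation scheme for calibrated p-values. The toy sketch below illustrates both under loudly stated assumptions and is not the networkGWAS implementation: each gene is given a hypothetical precomputed association score (networkGWAS instead fits a mixed model on the SNPs of a neighborhood), the neighborhood statistic is simply the sum of a gene's score and its neighbors' scores, and the "blocks" follow a hypothetical rule that groups genes of equal degree, within which scores are shuffled to build the null distribution.

```python
import random
from collections import defaultdict


def neighborhood_stat(scores, adjacency):
    """Sum each gene's score with the scores of its direct neighbours.

    Assumption: a plain sum stands in for the mixed-model test that
    networkGWAS fits on the SNPs pooled over a gene's neighbourhood.
    """
    return {g: scores[g] + sum(scores[n] for n in adjacency[g]) for g in adjacency}


def block_permutation_pvalues(scores, adjacency, n_perm=999, seed=0):
    """Per-gene empirical p-values from a within-block permutation null.

    Hypothetical blocking rule: genes with equal degree form one block, and
    scores are shuffled only inside a block, so the network topology is kept.
    """
    rng = random.Random(seed)
    observed = neighborhood_stat(scores, adjacency)

    blocks = defaultdict(list)
    for gene, neighbours in adjacency.items():
        blocks[len(neighbours)].append(gene)

    exceed = {g: 0 for g in adjacency}
    for _ in range(n_perm):
        permuted = {}
        for genes in blocks.values():
            values = [scores[g] for g in genes]
            rng.shuffle(values)
            permuted.update(zip(genes, values))
        null = neighborhood_stat(permuted, adjacency)
        for g in adjacency:
            if null[g] >= observed[g]:
                exceed[g] += 1

    # Adding 1 to numerator and denominator counts the observed labelling as
    # one permutation and keeps every p-value strictly above zero.
    return {g: (exceed[g] + 1) / (n_perm + 1) for g in adjacency}


if __name__ == "__main__":
    # Hypothetical six-gene ring: every gene has degree 2, so one block.
    ring = ["G1", "G2", "G3", "G4", "G5", "G6"]
    adjacency = {g: [ring[i - 1], ring[(i + 1) % 6]] for i, g in enumerate(ring)}
    scores = {"G1": 0.1, "G2": 3.0, "G3": 2.5, "G4": 0.2, "G5": 0.1, "G6": 0.3}
    print(block_permutation_pvalues(scores, adjacency, n_perm=199))
```

Shuffling within blocks rather than across the whole network is what keeps the null comparable to the observed data: each permuted score is pooled over the same number of neighbors as the original.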


Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions

doi:10.1101/433367 (2018)

Cited by ~3

Author(s): David M. Howard, Mark J. Adams, Toni-Kim Clarke, Jonathan D. Hafferty, Jude Gibson, ...

Keyword(s): Multiple Testing; Drug Repositioning; Association Studies; Meta Analysis; Enrichment Analysis; Brain Regions; Genome Wide Association Studies; Multiple Testing Correction; Synaptic Structure; Genome Wide

Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene sets associated with depression, including both genes and gene pathways associated with synaptic structure and neurotransmission. An enrichment analysis provided further evidence of the importance of prefrontal brain regions in depression. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression, this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.
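
The abstract does not state how the three studies were combined. A common choice for GWAS meta-analysis is an inverse-variance-weighted fixed-effect model, sketched below for a single variant; the effect sizes and standard errors are hypothetical inputs, and this should not be read as the authors' pipeline. The replication threshold assumes, again hypothetically, a Bonferroni correction over the 102 variants, i.e. 0.05 / 102 ≈ 4.9 × 10⁻⁴.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, p


# Conventional genome-wide significance level for the discovery meta-analysis.
GENOME_WIDE_ALPHA = 5e-8
# Assumption: Bonferroni correction over the 102 variants taken forward for
# replication (the abstract does not name the correction that was used).
REPLICATION_ALPHA = 0.05 / 102

if __name__ == "__main__":
    # Hypothetical log odds ratios and SEs for one variant in three cohorts.
    beta, se, p = ivw_fixed_effect([0.030, 0.025, 0.035], [0.008, 0.006, 0.010])
    print(f"beta={beta:.4f} se={se:.4f} p={p:.2e} significant={p < GENOME_WIDE_ALPHA}")
```

Weighting by 1/SE² lets the most precise (usually largest) cohort dominate the pooled estimate, which is why pooling roughly 800,000 individuals sharpens the estimate beyond what any single study provides.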


Multiple testing in genome-wide association studies via hidden Markov models

Bioinformatics, Vol 25 (21), pp. 2802-2808 (2009); doi:10.1093/bioinformatics/btp476

Cited by ~31

Author(s): Zhi Wei, Wenguang Sun, Kai Wang, Hakon Hakonarson

Keyword(s): Hidden Markov Models; Multiple Testing; Markov Models; Hidden Markov; Association Studies; Genome Wide Association; Genome Wide Association Studies; Genome Wide


Mixture model-based association analysis with case-control data in genome wide association studies

Statistical Applications in Genetics and Molecular Biology, Vol 16 (3) (2017); doi:10.1515/sagmb-2016-0022

Author(s): Fadhaa Ali, Jian Zhang

Keyword(s): Mixture Model; Multiple Testing; Hypothesis Test; Association Studies; Real Data; Genome Wide Association; Genome Wide Association Studies; Model Based; Genome Wide; The Individual

Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. A number of approaches have been put forward in recent years to meet this challenge. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances, and a theoretical justification of the model is provided. Furthermore, we introduce a hypothesis test for the haplotype inheritance patterns that underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis, and the results show that it outperforms an existing multiple testing method.
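
To make "clustering haplotypes by estimated disease penetrance" concrete, here is a minimal sketch that is not the authors' model. Each haplotype i is summarized by hypothetical counts, x_i case chromosomes out of n_i chromosomes carrying it, and an EM algorithm fits a mixture of binomials whose success probabilities stand in for a "baseline" and a "risk" class. In case-control data the case fraction among carriers is only a proxy for penetrance, since it also depends on the case-to-control sampling ratio.

```python
import math


def binomial_mixture_em(cases, totals, k=2, n_iter=200, tol=1e-10):
    """Fit a k-component binomial mixture to haplotype case counts via EM.

    cases[i]  : case chromosomes carrying haplotype i (hypothetical counts)
    totals[i] : all chromosomes carrying haplotype i
    Returns (mixing weights, component probabilities, responsibilities).
    """
    m = len(cases)
    # Initialise component probabilities spread across the observed fractions.
    fracs = sorted(x / n for x, n in zip(cases, totals))
    probs = [fracs[int(j * (m - 1) / max(k - 1, 1))] for j in range(k)]
    probs = [min(max(p, 1e-3), 1 - 1e-3) for p in probs]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step in log space; the binomial coefficient cancels and is omitted.
        resp, ll = [], 0.0
        for x, n in zip(cases, totals):
            logs = [math.log(w) + x * math.log(p) + (n - x) * math.log(1 - p)
                    for w, p in zip(weights, probs)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update mixing weights and per-component probabilities.
        for j in range(k):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / m, 1e-12)
            num = sum(r * x for r, x in zip(rj, cases))
            den = sum(r * n for r, n in zip(rj, totals))
            if den > 0:
                probs[j] = min(max(num / den, 1e-6), 1 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, probs, resp
```

Haplotypes whose responsibility for the higher-probability component is close to 1 would be the candidate risk haplotypes; the paper's own hypothesis test for inheritance patterns is not reproduced here.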


Novel bioinformatics approach to investigate quantitative phenotype-genotype associations in neuroimaging studies

doi:10.1101/015065 (2015)

Cited by ~1

Author(s): Sejal Patel, Min Tae M Park, Mallar M Chakravarty, Jo Knight

Keyword(s): Multiple Testing; Association Studies; Genetic Research; Hippocampal Volume; Imaging Genetics; Genome Wide Association Studies; Multiple Testing Correction; Novel Approach; Genome Wide; Novel Method

Imaging genetics is an emerging field in which associations between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology, in the context of both healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of both the imaging phenotypes and the commonly typed genetic variants. In this article, we develop a novel method that uses Gene Ontology, an online database, to select and prioritize certain genes, and employs a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS) and is quickly gaining traction as a method for multiple testing correction. Our approach addresses the pressing need in genetic research to move beyond candidate gene studies without incurring a heavy loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using the Alzheimer's Disease Neuroimaging Initiative sample.
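
Stratified FDR, as it is usually described, splits the tests into strata (here, hypothetically, Gene Ontology-prioritized genes versus all remaining variants) and runs the Benjamini-Hochberg step-up procedure separately within each stratum at the same level. The minimal sketch below assumes the stratum labels and p-values are already available; the abstract does not spell out the exact procedure the authors used.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where BH rejects at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply BH separately within each stratum (sFDR-style)."""
    groups = defaultdict(list)
    for i, label in enumerate(strata):
        groups[label].append(i)
    rejected = [False] * len(pvalues)
    for idx in groups.values():
        decisions = benjamini_hochberg([pvalues[i] for i in idx], alpha)
        for i, decision in zip(idx, decisions):
            rejected[i] = decision
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values: two prioritized genes, eight other variants.
    p = [0.012, 0.02, 0.003, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    strata = ["GO"] * 2 + ["other"] * 8
    print("pooled BH:", benjamini_hochberg(p))
    print("stratified:", stratified_fdr(p, strata))
```

In this toy input pooled BH rejects only the smallest p-value, while the stratified version also rejects both prioritized tests, because their stratum carries a much smaller multiple testing burden. That gain is the source of the power increase, and it only pays off when the prioritized stratum really is enriched for true associations.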


The paltry power of priors versus populations

doi:10.1101/737676 (2019)

Author(s): Jianan Zhan, Dan E. Arking, Joel S. Bader

Keyword(s): Population Size; Multiple Testing; Association Studies; Genome Wide Association; P Value; Genome Wide Association Studies; Significant Finding; RNA-Seq; Test Power; Genome Wide

Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods either explicitly or implicitly make use of prior probabilities that bias significance toward favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods.

Author summary

Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000 tests; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. This approach implies a single-test p-value threshold of 2.5 × 10⁻⁶ for RNA-Seq experiments and 5 × 10⁻⁸ for GWAS, to control the false-positive rate at a conventional value of 0.05. Many methods have been proposed to alleviate the multiple testing burden by incorporating a prior probability that boosts the significance of a subset of candidate genes or variants. In the extreme case, only the candidate set is tested, which corresponds to a reduced multiple testing burden. Despite decades of method development, prior-based tests have not been widely adopted.
Here we compare the power increase possible with a prior with the increase possible with the much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
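
The Bonferroni thresholds in the author summary are the family-wise level divided by the number of tests: 0.05 / 20,000 = 2.5 × 10⁻⁶ and 0.05 / 1,000,000 = 5 × 10⁻⁸. The sketch below reproduces that arithmetic and, purely as an illustration, computes the approximate power of a two-sided z-test whose noncentrality grows with the square root of the sample size. The per-sample effect size is a hypothetical parameter, and this simple model is an assumption, not the paper's power calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the FWER at alpha."""
    return alpha / n_tests


def z_test_power(alpha_per_test, effect, n):
    """Approximate power of a two-sided z-test.

    Assumption: under the alternative the statistic is N(effect * sqrt(n), 1);
    the negligible rejection probability in the opposite tail is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha_per_test / 2)
    noncentrality = effect * n ** 0.5
    return STD_NORMAL.cdf(noncentrality - z_crit)


if __name__ == "__main__":
    for label, m in [("RNA-Seq genes", 20_000), ("GWAS effective tests", 1_000_000)]:
        print(f"{label}: threshold = {bonferroni_threshold(0.05, m):.1e}")
    # Hypothetical effect size: shrink the test burden 1000-fold versus
    # doubling the sample size, and compare the resulting power.
    effect = 0.02
    print(z_test_power(bonferroni_threshold(0.05, 1_000_000), effect, 100_000))
    print(z_test_power(bonferroni_threshold(0.05, 1_000), effect, 100_000))
    print(z_test_power(bonferroni_threshold(0.05, 1_000_000), effect, 200_000))
```

Shrinking the number of tests only lowers the critical value logarithmically, whereas the noncentrality grows with √n. That asymmetry is the intuition behind the paper's conclusion, although the paper's own analysis models an explicit prior rather than a simple candidate-set restriction.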


Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models

Test (2021); doi:10.1007/s11749-020-00746-8

Author(s): Tingting Cui, Pengfei Wang, Wensheng Zhu

Keyword(s): Hidden Markov Models; Multiple Testing; Markov Models; Hidden Markov; Association Studies; Genome Wide Association; Genome Wide Association Studies; Genome Wide; Factorial Hidden Markov Models


Re-assessment of multiple testing strategies for more efficient genome-wide association studies

European Journal of Human Genetics, Vol 26 (7), pp. 1038-1048 (2018); doi:10.1038/s41431-018-0125-3

Cited by ~3

Author(s): Takahiro Otani, Hisashi Noma, Jo Nishino, Shigeyuki Matsui

Keyword(s): Multiple Testing; Association Studies; Genome Wide Association; Genome Wide Association Studies; Testing Strategies; Genome Wide


Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models

BMC Bioinformatics, Vol 14 (1), pp. 282 (2013); doi:10.1186/1471-2105-14-282

Cited by ~4

Author(s): Jian Xiao, Wensheng Zhu, Jianhua Guo

Keyword(s): Hidden Markov Models; Multiple Testing; Large Scale; Markov Models; Hidden Markov; Association Studies; Genome Wide Association; Genome Wide Association Studies; Genome Wide


Genome-wide associations for udder and teat conformational risk factors for mastitis in Holstein cows

doi:10.21203/rs.2.16763/v1 (2019)

Author(s): Asha M. Miles, Christian Posbergh, Heather Jay Huson

Keyword(s): Cell Proliferation; Multiple Testing; Association Studies; Principal Component; Special Focus; Genome Wide Association Studies; Multiple Testing Correction; Genome Wide; Length Width; Mastitis Susceptibility

BACKGROUND: The objective of our study was to conduct high-density genome-wide association studies of dairy cow udder and teat conformation with direct phenotyping. We identified and compared quantitative trait loci (QTL) for a novel composite mastitis risk trait and considered the environmental impact of milking by comparing primiparous cows only. Cows (N = 471) were genotyped on the Illumina BovineHD 777K BeadChip and scored for front and rear teat length, width, end shape, and placement, fore udder attachment, udder cleft, udder depth, rear udder height, and rear udder width. Principal component analysis was performed on fore udder attachment, rear teat end shape, rear teat width, and rear udder height to create a single new phenotype describing mastitis susceptibility based on these high-risk traits.

RESULTS: Over all 14 traits of interest, a total of 56 genome-wide associations were performed, and 28 significantly associated QTL (Bonferroni-corrected p < 0.05) were identified. The linkage disequilibrium (LD) block surrounding each associated QTL, or a 1 Mb window in the absence of LD, was interrogated for candidate genes, resulting in the identification of genes with functions related to both cell proliferation and immune signaling, including ZNF683, DHX9, CUX1, TNNT1, and SPRY1. We assessed a primiparous-only subset of cows (n = 144) to account for the possibility that the genetic variance component of the phenotype is greater for cows that have had less exposure to the environment, and observed different associated QTL and inheritance patterns for udder depth in primiparous cows compared with the total cohort.

CONCLUSION: Special focus was given to the aforementioned mastitis risk traits, and candidate gene investigation revealed both immune function and cell proliferation related genes in the regions surrounding significantly associated QTL, suggesting that selecting mastitis-resistant cows based on these traits would be an effective method for increasing mastitis resiliency in a herd.
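
The composite mastitis-risk phenotype was built by principal component analysis of four conformation traits. The minimal pure-Python sketch below shows that step under assumptions the abstract does not confirm: each trait is standardized, the composite is the first principal-component score, and PC1 is found by power iteration on the correlation matrix. The trait values in the test are hypothetical.

```python
import math
import random


def standardize(column):
    """Center a trait to mean 0 and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_pc_scores(traits, n_iter=500, seed=0):
    """Return (loadings, scores) of the first principal component.

    traits: list of columns, one per trait, all of the same length.
    """
    cols = [standardize(c) for c in traits]
    p, n = len(cols), len(cols[0])
    # Correlation matrix of the standardized traits.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    # Power iteration converges to the leading eigenvector (PC1 loadings).
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvector sign is arbitrary; fix it so the loadings sum to >= 0.
    if sum(v) < 0:
        v = [-x for x in v]
    # Composite phenotype: projection of each cow onto PC1.
    scores = [sum(v[j] * cols[j][k] for j in range(p)) for k in range(n)]
    return v, scores
```

The resulting scores would then be tested in a genome-wide association scan like any other quantitative phenotype. Because the sign of a principal component is arbitrary, the direction that corresponds to higher mastitis risk has to be read off the loadings.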
