Case-Base-Control designs

Optimal Genomic Control in Large-scale Genetic Associations for Binary Diseases

10.21203/rs.3.rs-318017/v2 ◽

2021 ◽

Author(s):

Runqing Yang ◽

Yuxin Song ◽

Li Jiang ◽

Zhiyu Hao ◽

Runqing Yang

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Large Scale ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genomic Heritability ◽

Large Scale Data ◽

Genome Wide

Abstract Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMMs) in genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs), estimated in advance, as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.
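
The abstract refers to genomic controls without giving a formula. As a point of reference only, the classical genomic-control correction can be sketched as below. This is not the authors' optimized GRAMMAR procedure, which the abstract does not specify. The function names and example statistics are hypothetical; the constant 0.4549364 is the median of a chi-square distribution with one degree of freedom.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Return lambda: the observed median test statistic over its null median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda, never inflating (lambda floored at 1)."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [s / lam for s in chi2_stats]


# Hypothetical 1-df association statistics, chosen so that lambda is about 2.
example_stats = [0.2, 0.9098728, 3.0]
lam = genomic_inflation_factor(example_stats)
adjusted = genomic_control_adjust(example_stats)
```

A lambda close to 1 suggests little inflation from unmodelled structure, while larger values mean the statistics are scaled down before p-values are computed.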

An atlas of genetic associations in UK Biobank

10.1101/176834 ◽

2017 ◽

Cited By ~ 18

Author(s):

Oriol Canela-Xandri ◽

Konrad Rawlik ◽

Albert Tenesa

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genetic Associations ◽

Genome Wide ◽

Related Individuals ◽

Sufficient Statistical Power

Abstract Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size; however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive, which requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without incurring high computational costs.

Optimal Genomic Control in Large-scale Genetic Associations for Binary Diseases

10.21203/rs.3.rs-318017/v1 ◽

2021 ◽

Author(s):

Runqing Yang ◽

Yuxin Song ◽

Li Jiang ◽

Zhiyu Hao ◽

Runqing Yang

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Large Scale ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genomic Heritability ◽

Large Scale Data ◽

Genome Wide

Abstract Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMMs) in genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs), estimated in advance, as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understanding how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, and it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.

Statistical power and utility of meta-analysis methods for cross-phenotype genome-wide association studies

PLoS ONE ◽

10.1371/journal.pone.0193256 ◽

2018 ◽

Vol 13 (3) ◽

pp. e0193256 ◽

Cited By ~ 13

Author(s):

Zhaozhong Zhu ◽

Verneri Anttila ◽

Jordan W. Smoller ◽

Phil H. Lee

Keyword(s):

Statistical Power ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Analysis Methods ◽

Genome Wide

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies

PLoS Genetics ◽

10.1371/journal.pgen.1000582 ◽

2009 ◽

Vol 5 (7) ◽

pp. e1000582 ◽

Cited By ~ 14

Author(s):

Zheyang Wu ◽

Hongyu Zhao

Keyword(s):

Model Selection ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Selection Strategies ◽

Genome Wide

Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

Scientific Reports ◽

10.1038/srep36671 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 20

Author(s):

Bettina Mieth ◽

Marius Kloft ◽

Juan Antonio Rodríguez ◽

Sören Sonnenburg ◽

Robin Vobruba ◽

...

Keyword(s):

Machine Learning ◽

Hypothesis Testing ◽

Statistical Power ◽

Association Studies ◽

Multiple Hypothesis Testing ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Hypothesis ◽

Genome Wide

The harmonic mean p-value for combining dependent tests

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1814092116 ◽

2019 ◽

Vol 116 (4) ◽

pp. 1195-1200 ◽

Cited By ~ 43

Author(s):

Daniel J. Wilson

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Scientific Discovery ◽

Association Studies ◽

Harmonic Mean ◽

P Value ◽

Genome Wide Association Studies ◽

Familywise Error Rate ◽

Significance Threshold ◽

Genome Wide

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
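
To make the statistic named in this abstract concrete, here is a minimal sketch of a weighted harmonic mean of p-values. It computes only the raw statistic and does not attempt the significance calibration developed in the paper. The function name, the equal-weight default, and the example p-values are assumptions for illustration.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values (raw statistic, uncalibrated).

    With weights w_i, this returns sum(w_i) / sum(w_i / p_i).
    Weights default to equal values 1/n.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Hypothetical p-values for a group of four tests.
example_p = [0.01, 0.02, 0.5, 0.8]
hmp = harmonic_mean_p(example_p)
```

Because the harmonic mean is dominated by its smallest terms, the raw value always lies between the smallest p-value and n times the smallest p-value for equal weights, which is one way to see why a few strong signals in a group are not drowned out by many weak ones.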

Insights into the genetic basis of retinal detachment

Human Molecular Genetics ◽

10.1093/hmg/ddz294 ◽

2019 ◽

Vol 29 (4) ◽

pp. 689-702 ◽

Cited By ~ 2

Author(s):

Thibaud S Boutin ◽

David G Charteris ◽

Aman Chandra ◽

Susan Campbell ◽

Caroline Hayward ◽

...

Keyword(s):

Retinal Detachment ◽

Association Studies ◽

Genetic Correlations ◽

Self Report ◽

Cataract Operation ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genetic Associations ◽

Data Set ◽

Genome Wide

Abstract Retinal detachment (RD) is a serious and common condition, but genetic studies to date have been hampered by the small size of the assembled cohorts. In the UK Biobank data set, where RD was ascertained by self-report or hospital records, genetic correlations between RD and high myopia or cataract operation were, respectively, 0.46 (SE = 0.08) and 0.44 (SE = 0.07). These correlations are consistent with known epidemiological associations. Through meta-analysis of genome-wide association studies using UK Biobank RD cases (N = 3 977) and two cohorts, each comprising ~1 000 clinically ascertained rhegmatogenous RD patients, we uncovered 11 genome-wide significant association signals. These are near or within ZC3H11B, BMP3, COL22A1, DLG5, PLCE1, EFEMP2, TYR, FAT3, TRIM29, COL2A1 and LOXL1. Replication in the 23andMe data set, where RD is self-reported by participants, firmly establishes six RD risk loci: FAT3, COL22A1, TYR, BMP3, ZC3H11B and PLCE1. Based on the genetic associations with eye traits described to date, the first two specifically impact risk of a RD, whereas the last four point to shared aetiologies with macular condition, myopia and glaucoma. Fine-mapping prioritized the lead common missense variant (TYR S192Y) as causal variant at the TYR locus and a small set of credible causal variants at the FAT3 locus. The larger study size presented here, enabled by resources linked to health records or self-report, provides novel insights into RD aetiology and underlying pathological pathways.

The Performance of Conditional Tests for Family Data in Associated Regions Derived from GWAS

Methods of Information in Medicine ◽

10.3414/me09-02-0056 ◽

2010 ◽

Vol 49 (06) ◽

pp. 625-631

Author(s):

H. Schäfer ◽

B. H. Greene

Keyword(s):

Statistical Power ◽

Association Studies ◽

Univariate Analysis ◽

Family Data ◽

Independent Effect ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Single Marker ◽

Conditional Tests ◽

Additional Locus

Summary Background: Genome-wide association studies (GWAS) have been used successfully to identify genetic loci associated with complex diseases and phenotypes. Often this association takes the form of several significant signals (such as small p-values) in a univariate analysis at various markers within a single genetic region. Once confirmed, these associations raise the question of whether a single marker tags the association signal of another, functionally relevant variant or whether the single marker tags a functionally relevant haplotype. To deal with this question, methods for family data based on logistic regression, adaptations of the transmission/disequilibrium test (TDT) or weighted haplotype likelihood (WHL) methods have been proposed in the literature. Objectives: To examine the effect of parameters such as sample size, inheritance model, and linkage disequilibrium (LD) in the region on the ability of a selection of methods to detect an independent effect from an additional locus. Methods: All methods tested were applied to simulated genetic data of trios comprising a single affected offspring and two parents. Results: While regression-based methods have advantages such as model flexibility, potentially increasing power, the WHL method was more robust against increasing LD in the scenarios analyzed. Conclusions: Simulation results suggest that the regression and WHL methods have greater statistical power than the adaptation of the TDT analyzed here to detect genetic effects at an additional locus while controlling for confounding at another locus.
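
For readers unfamiliar with the TDT mentioned in this summary, the basic single-marker test can be sketched as follows. This is the standard McNemar-type statistic for trio data, not the conditional adaptation evaluated in the paper. The function names and the transmission counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Basic TDT chi-square statistic (1 degree of freedom).

    transmitted:   times heterozygous parents passed the test allele
                   to the affected offspring
    untransmitted: times heterozygous parents passed the other allele
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_p_value(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat = tdt_statistic(60, 40)
p_value = chi2_1df_p_value(stat)
```

Only heterozygous parents contribute to the counts, which is what makes the test robust to population stratification in trio designs.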
