Population genetic simulation study of power in association testing across genetic architectures and study designs

Mapping Intimacies ◽

10.1101/632786 ◽

2019 ◽

Author(s):

Dominic Ming Hay Tong ◽

Ryan D. Hernandez

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Rare Variants ◽

Association Studies ◽

Case Control ◽

Population Variation ◽

Control Panel ◽

Evolutionary Forces ◽

Association Testing ◽

Genetic Simulation

AbstractWhile it is well established that genetics can be a major contributor to population variation of complex traits, the relative contributions of rare and common variants to phenotypic variation remains a matter of considerable debate. Here, we simulate rare variant association studies across different case/control panel sampling strategies, sequencing methods, and genetic architecture models based on evolutionary forces to determine the statistical performance of RVATs widely in use. We find that the highest statistical power of RVATs is achieved by sampling case/control individuals from the extremes of an underlying quantitative trait distribution. We also demonstrate that the use of genotyping arrays, in conjunction with imputation from a whole genome sequenced (WGS) reference panel, recovers the vast majority (90%) of the power that could be achieved by sequencing the case/control panel using current tools. Finally, we show that for dichotomous traits, the statistical performance of RVATs decreases as rare variants become more important in the trait architecture. Our results extend previous work to show that RVATs are insufficiently powered to make generalizable conclusions about the role of rare variants in dichotomous complex traits.

Download Full-text

Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies

10.1101/374983 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lorin Crawford ◽

Xiang Zhou

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Computational Cost ◽

Case Control ◽

Type I ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Control Data ◽

Genome Wide

AbstractEpistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact — thus, potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.

Download Full-text

A nonparametric test for association with multiple loci in the retrospective case-control study

Statistical Methods in Medical Research ◽

10.1177/0962280219842892 ◽

2019 ◽

Vol 29 (2) ◽

pp. 589-602

Author(s):

Chan Wang ◽

Shufang Deng ◽

Leiming Sun ◽

Liming Li ◽

Yue-Qing Hu

Keyword(s):

Rare Variants ◽

Association Studies ◽

Nonparametric Test ◽

Case Control ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Retrospective Case ◽

Multiple Loci ◽

Common Diseases ◽

The Difference

The genome-wide association studies aim at identifying common or rare variants associated with common diseases and explaining more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. In order to powerfully detect association signals, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs in testing for association. In this article, we propose a test SLIDE that depicts the difference of the average multi-locus genotypes between cases and controls and derive its variance–covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow the strength from an external database in the population of interest with a few thousands to hundreds of thousands individuals to improve the power for detecting association. Extensive simulations show that SLIDE has apparent superiority over the existing methods, especially in the situation involving both common and rare variants, both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in the application to the data from the Wellcome Trust Case Control Consortium.

Download Full-text

Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Cancer Informatics ◽

10.4137/cin.s17305 ◽

2015 ◽

Vol 14s2 ◽

pp. CIN.S17305 ◽

Cited By ~ 1

Author(s):

Yaping Wang ◽

Donghui Li ◽

Peng Wei

Keyword(s):

Statistical Power ◽

Association Studies ◽

Score Test ◽

Principal Component ◽

Case Control ◽

Degree Of Freedom ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Missing Heritability ◽

Gene Environment

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of G WAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.

Download Full-text

Gamete simulation improves polygenic transmission disequilibrium analysis

10.1101/2020.10.26.355602 ◽

2020 ◽

Author(s):

Jiawen Chen ◽

Jing You ◽

Zijie Zhao ◽

Zheng Ni ◽

Kunling Huang ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Autism Spectrum ◽

Genetic Maps ◽

Risk Scores ◽

Parental Genotype ◽

Genome Wide Association Studies ◽

Transmission Disequilibrium ◽

Polygenic Risk

AbstractPolygenic risk scores (PRS) derived from summary statistics of genome-wide association studies (GWAS) have enjoyed great popularity in human genetics research. Applied to population cohorts, PRS can effectively stratify individuals by risk group and has promising applications in early diagnosis and clinical intervention. However, our understanding of within-family polygenic risk is incomplete, in part because the small samples per family significantly limits power. Here, to address this challenge, we introduce ORIGAMI, a computational framework that uses parental genotype data to simulate offspring genomes. ORIGAMI uses state-of-the-art genetic maps to simulate realistic recombination events on phased parental genomes and allows quantifying the prospective PRS variability within each family. We quantify and showcase the substantially reduced yet highly heterogeneous PRS variation within families for numerous complex traits. Further, we incorporate within-family PRS variability to improve polygenic transmission disequilibrium test (pTDT). Through simulations, we demonstrate that modeling within-family risk substantially improves the statistical power of pTDT. Applied to 7,805 trios of autism spectrum disorder (ASD) probands and healthy parents, we successfully replicated previously reported over-transmission of ASD, educational attainment, and schizophrenia risk, and identified multiple novel traits with significant transmission disequilibrium. These results provided novel etiologic insights into the shared genetic basis of various complex traits and ASD.

Download Full-text

Across-cohort QC analyses of genome-wide association study summary statistics from complex traits

10.1101/033787 ◽

2015 ◽

Author(s):

Guo-Bo Chen ◽

Sang Hong Lee ◽

Matthew R Robinson ◽

Maciej Trzaskowski ◽

Zhi-Xiang Zhu ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

False Negative ◽

Genome Wide Association ◽

Effect Sizes ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Unknown Sample ◽

Genome Wide

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We proposed a pair of methods in examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fststatistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and is able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

Download Full-text

Case-control association mapping without cases

10.1101/045831 ◽

2016 ◽

Cited By ~ 2

Author(s):

Jimmy Z Liu ◽

Yaniv Erlich ◽

Joseph K Pickrell

Keyword(s):

Association Mapping ◽

Complex Traits ◽

Disease Risk ◽

Association Studies ◽

Meta Analysis ◽

Large Population ◽

Case Control ◽

Genome Wide Association Studies ◽

The Uk ◽

Control Association

AbstractThe case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; if a disease occurs late in life or is rapidly lethal, it may be more practical to identify family members of cases. Here, we show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly-ascertained cohorts, this approach enables previously infeasible studies of diseases that are absent (or nearly absent) in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer’s disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.

Download Full-text

The Multivariate Normal Distribution Framework for Analyzing Association Studies

10.1101/208199 ◽

2017 ◽

Author(s):

Jose A. Lozano ◽

Farhad Hormozdiari ◽

Jong Wha (Joanne) Joo ◽

Buhm Han ◽

Eleazar Eskin

Keyword(s):

Genetic Variants ◽

Statistical Power ◽

Association Studies ◽

Multiple Hypothesis Testing ◽

Population History ◽

Genome Wide Association Studies ◽

Multivariate Normal ◽

Association Testing ◽

Genome Wide ◽

Comprehensive Framework

AbstractGenome-wide association studies (GWAS) have discovered thousands of variants involved in common human diseases. In these studies, frequencies of genetic variants are compared between a cohort of individuals with a disease (cases) and a cohort of healthy individuals (controls). Any variant that has a significantly different frequency between the two cohorts is considered an associated variant. A challenge in the analysis of GWAS studies is the fact that human population history causes nearby genetic variants in the genome to be correlated with each other. In this review, we demonstrate how to utilize the multivariate normal (MVN) distribution to explicitly take into account the correlation between genetic variants in a comprehensive framework for analysis of GWAS. We show how the MVN framework can be applied to perform association testing, correct for multiple hypothesis testing, estimate statistical power, and perform fine mapping and imputation.

Download Full-text

An atlas of genetic associations in UK Biobank

10.1101/176834 ◽

2017 ◽

Cited By ~ 18

Author(s):

Oriol Canela-Xandri ◽

Konrad Rawlik ◽

Albert Tenesa

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genetic Associations ◽

Genome Wide ◽

Related Individuals ◽

Sufficient Statistical Power

ABSTRACTGenome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size, however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive, which requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS,http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without the need to incur in high computational costs.

Download Full-text

A Novel Approach for the Simultaneous Analysis of Common and Rare Variants in Complex Traits

Bioinformatics and Biology Insights ◽

10.4137/bbi.s8852 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S8852 ◽

Cited By ~ 4

Author(s):

Ao Yuan ◽

Guanjie Chen ◽

Yanxun Zhou ◽

Amy Bentley ◽

Charles Rotimi

Keyword(s):

Complex Traits ◽

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Simultaneous Analysis ◽

Genome Wide Association Studies ◽

Common Variants ◽

Disease Etiology ◽

Novel Approach ◽

Common Genetic Variants

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite the GWAS success stories, the percent trait variance explained by GWAS signals, the so called “missing heritability” has been, at best, modest. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations along with the fact that the effects of rare variants are often, by design, unaccounted for by GWAS and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association between rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.

Download Full-text

Rare variants regulate expression of nearby individual genes in multiple tissues

PLoS Genetics ◽

10.1371/journal.pgen.1009596 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009596

Author(s):

Jiajin Li ◽

Nahyun Kong ◽

Buhm Han ◽

Jae Hoon Sul

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Statistical Power ◽

Rare Variants ◽

Small Sample Size ◽

Tissue Expression ◽

Small Sample ◽

Ratio Test ◽

Common Variants ◽

Regulatory Effects

The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once this association is identified, the next step is to understand the genetic mechanism of rare variants on how the variants influence diseases. Similar to the hypothesis of common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, and some of them are only regulated by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results suggest the regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.

Download Full-text