scholarly journals A gene based approach to test genetic association based on an Optimally Weighted Combination of Multiple Traits

2019 ◽  
Author(s):  
Jianjun Zhang ◽  
Qiuying Sha ◽  
Guanfu Liu ◽  
Xuexia Wang

AbstractThere is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases for which multiple correlated traits are often measured. Joint analysis of multiple traits could increase statistical power by aggregating multiple weak effects. Existing methods for multiple trait association tests usually study each of the multiple traits separately and then combine the univariate test statistics or combine p-values of the univariate tests for identifying disease associated genetic variants. However, ignoring correlation between phenotypes may cause power loss. Additionally, the genetic variants in one gene (including common and rare variants) are often viewed as a whole that affects the underlying disease since the basic functional unit of inheritance is a gene rather than a genetic variant. Thus, results from gene level association test can be more readily integrated with downstream functional and pathogenic investigation, whereas many existing methods for multiple trait association tests only focus on testing a single common variant rather than a gene. In this article, we propose a statistical method by Testing an Optimally Weighted Combination of Multiple traits (TOW-CM) to test the association between multiple traits and multiple variants in a genomic region (a gene or pathway). We investigate the performance of the proposed method through extensive simulation studies. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. In addition, we illustrate the usefulness of TOW-CM by analyzing a whole-genome genotyping data from a COPDGene study.

2018 ◽  
Author(s):  
Zhenchuan Wang ◽  
Qiuying Sha ◽  
Kui Zhang ◽  
Shuanglin Zhang

AbstractJoint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, most of existing methods test the association between multiple traits and a single common variant. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. In this article, we developed a statistical method by testing an optimally weighted combination of variants with multiple traits (TOWmuT) to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing a whole-genome genotyping data from a lung function study.


2019 ◽  
Vol 21 (3) ◽  
pp. 753-761 ◽  
Author(s):  
Regina Brinster ◽  
Dominique Scherer ◽  
Justo Lorenzo Bermejo

Abstract Population stratification is usually corrected relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations—so-called ancestry-informative markers (AIMs)—instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($IN$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although inflated) type I error, followed at some distance by the first eight $IN$-AIMs.


2018 ◽  
Vol 100 ◽  
Author(s):  
LILI CHEN ◽  
YONG WANG ◽  
YAJING ZHOU

SummaryPleiotropy, the effect of one variant on multiple traits, is widespread in complex diseases. Joint analysis of multiple traits can improve statistical power to detect genetic variants and uncover the underlying genetic mechanism. Currently, a large number of existing methods target one common variant or only rare variants. Increasing evidence shows that complex diseases are caused by common and rare variants. Here we propose a region-based method to test both rare and common variant associated multiple traits based on variable reduction method (abbreviated as MULVR). However, in the presence of noise traits, the MULVR method may lose power, so we propose the MULVR-O method, which jointly analyses the optimal number of traits associated with genetic variants by the MULVR method, to guard against the effect of noise traits. Extensive simulation studies show that our proposed method (MULVR-O) is applied to not only multiple quantitative traits but also qualitative traits, and is more powerful than several other comparison methods in most scenarios. An application to the two genes (SHBG and CHRM3) and two phenotypes (systolic blood pressure and diastolic blood pressure) from the GAW19 dataset illustrates that our proposed methods (MULVR and MULVR-O) are feasible and efficient as a region-based method.


2019 ◽  
Author(s):  
Jianjun Zhang ◽  
Qiuying Sha ◽  
Han Hao ◽  
Shuanglin Zhang ◽  
Xiaoyi Raymond Gao ◽  
...  

AbstractThe risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Data on multiple traits is often collected for many complex diseases in order to obtain a better understanding of the diseases. Examination of gene-environment interactions (GxEs) for multiple traits can yield valuable insights about the etiology of the disease and increase power in detecting disease associated genes. Most existing methods focus on testing gene-environment interaction (GxE) for a single trait. In this study, we develop novel approaches to test GxEs for multiple traits in sequencing association studies. We first perform transformation of multiple traits by using either principle component analysis or standardization analysis. Then, we detect the effect of GxE for each transferred phenotypic trait using novel proposed tests: testing the effect of an optimallyweighted combination of GxE (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ the Fisher’s combination test to combine the p-values of TOW-GE and/or VW-TOW-GE. Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared to the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants; VW-TOW-GE is more powerful when there are both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to directions of effects of causal GxEs. Application to the COPDGene Study demonstrates that our proposed methods are very powerful.


2021 ◽  
pp. 096228022110028
Author(s):  
Zhen Meng ◽  
Qinglong Yang ◽  
Qizhai Li ◽  
Baoxue Zhang

For a nonparametric Behrens-Fisher problem, a directional-sum test is proposed based on division-combination strategy. A one-layer wild bootstrap procedure is given to calculate its statistical significance. We conduct simulation studies with data generated from lognormal, t and Laplace distributions to show that the proposed test can control the type I error rates properly and is more powerful than the existing rank-sum and maximum-type tests under most of the considered scenarios. Applications to the dietary intervention trial further show the performance of the proposed test.


BMC Genetics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Guoqing Diao ◽  
Dan-yu Lin

Abstract Background Associations between haplotypes and quantitative traits provide valuable information about the genetic basis of complex human diseases. Haplotypes also provide an effective way to deal with untyped SNPs. Two major challenges arise in haplotype-based association analysis of family data. First, haplotypes may not be inferred with certainty from genotype data. Second, the trait values within a family tend to be correlated because of common genetic and environmental factors. Results To address these challenges, we present an efficient likelihood-based approach to analyzing associations of quantitative traits with haplotypes or untyped SNPs. This approach properly accounts for within-family trait correlations and can handle general pedigrees with arbitrary patterns of missing genotypes. We characterize the genetic effects on the quantitative trait by a linear regression model with random effects and develop efficient likelihood-based inference procedures. Extensive simulation studies are conducted to examine the performance of the proposed methods. An application to family data from the Childhood Asthma Management Program Ancillary Genetic Study is provided. A computer program is freely available. Conclusions Results from extensive simulation studies show that the proposed methods for testing the haplotype effects on quantitative traits have correct type I error rates and are more powerful than some existing methods.


2021 ◽  
Author(s):  
Xuewei Cao ◽  
Xuexia Wang ◽  
Shuanglin Zhang ◽  
Qiuying Sha

Abstract Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and identified many genetic variants underlying complex diseases, there is still a considerable heritability of complex diseases that could not be explained by GWAS. One alternative approach to overcome the missing heritability caused by the genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability in biological mechanisms of identified trait associated genes. In this study, we propose two powerful and computationally efficient gene-based association tests, Overall and Copula. These two tests aggregate information from three traditional types of gene-based association tests and also incorporate expression quantitative trait locus (eQTL) data into GWAS using GWAS summary statistics. Overall utilizes the extended Simes procedure and Copula utilizes the Gaussian copula approximation-based method. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the P values of these two methods can be calculated analytically. Simulation studies show that these two tests can control type I error rate very well and have higher power than the tests that we compared. We also apply these two methods to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that these two newly developed methods can identify more significant genes than other methods we compared with.


2019 ◽  
Author(s):  
Chong Wu

AbstractMany genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type I error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are becoming rapidly available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type I error rates were well controlled. More important, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified much fewer risk loci. Finally, four additional sets of traits have been analyzed and provided similar conclusions.


2018 ◽  
Vol 35 (13) ◽  
pp. 2251-2257 ◽  
Author(s):  
Bin Guo ◽  
Baolin Wu

Abstract Motivation Genetics hold great promise to precision medicine by tailoring treatment to the individual patient based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. They have successfully identified tens of thousands of disease-related variants. However they have explained only a small proportion of the overall trait heritability for most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants, and the common practice of testing association between one trait and one genetic variant at a time in most GWAS, even when multiple related traits are often measured for each individual. Increasing evidence suggests that many genetic variants can influence multiple traits simultaneously, and we can gain more power by testing association of multiple traits simultaneously. It is appealing to develop novel multi-trait association test methods that need only GWAS summary data, since it is generally very hard to access the individual-level GWAS phenotype and genotype data. Results Many existing GWAS summary data-based association test methods have relied on ad hoc approach or crude Monte Carlo approximation. In this article, we develop rigorous statistical methods for efficient and powerful multi-trait association test. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC)-based association test from the summary statistics. PC-based test has optimal power when the underlying multi-trait signal can be captured by the first PC, and otherwise it will have suboptimal performance. We develop an adaptive test by optimally weighting the PC-based test and the omnibus chi-square test to achieve robust performance under various scenarios. We develop efficient numerical algorithms to compute the analytical P-values for all the proposed tests without the need of Monte Carlo sampling. We illustrate the utility of proposed methods through application to the GWAS meta-analysis summary data for multiple lipids and glycemic traits. We identify multiple novel loci that were missed by individual trait-based association test. Availability and implementation All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than 2 min to compute the list of genome-wide significant single nucleotide polymorphisms (SNPs) for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 ◽  
Author(s):  
Liwan Fu ◽  
Yuquan Wang ◽  
Tingting Li ◽  
Yue-Qing Hu

As a pivotal research tool, genome-wide association study has successfully identified numerous genetic variants underlying distinct diseases. However, these identified genetic variants only explain a small proportion of the phenotypic variation for certain diseases, suggesting that there are still more genetic signals to be detected. One of the reasons may be that one-phenotype one-variant association study is not so efficient in detecting variants of weak effects. Nowadays, it is increasingly worth noting that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues for their underlying biology mechanisms. So a Weighted Combination of multiple phenotypes following Hierarchical Clustering method (WCHC) is proposed for simultaneously analyzing multiple phenotypes in association studies. A series of simulations are conducted, and the results show that WCHC is either the most powerful method or comparable with the most powerful competitor in most of the simulation scenarios. Additionally, we evaluated the performance of WCHC in its application to the obesity-related phenotypes from Atherosclerosis Risk in Communities, and several associated variants are reported.


Sign in / Sign up

Export Citation Format

Share Document