Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model

2017
Author(s): Haohan Wang, Xiang Liu, Yunpeng Xiao, Ming Xu, Eric P. Xing

Abstract: Genome-wide association studies have presented a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, explaining complex traits associated with multifactorial genetic loci remains challenging, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM2) model that aims to jointly correct population and genetic confounding factors. We offer two strategies for using LMM2 in association mapping: 1) It serves as an extension of the univariate LMM, which can effectively correct for population structure but considers each SNP in isolation. 2) It is integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM2). Further, we extend LMM2/sLMM2 by raising the power of our squared model, yielding the LMMn/sLMMn model. We demonstrate the practical use of our model with synthetic phenotypic variants generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method achieves a more accurate and significant prediction of the associations between traits and loci. We also evaluate our models on collected phenotypes and genotypes by the number of candidate genes that the models can discover. The results suggest that our method is promising for use in genome-wide association studies.

2018
Author(s): Ping Zeng, Xinjie Hao, Xiang Zhou

Abstract

Motivation: Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci are associated with multiple traits, a phenomenon known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits together with genome-wide SNPs.

Results: We develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association patterns using mixture modeling, and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power and, with a sparsity-inducing penalty, is capable of selecting informative annotations from a large, potentially noninformative set. To enable scalable inference of iMAP in association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP both in terms of high association mapping power and in terms of accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project. iMAP is freely available at www.xzlab.org/software.html.


2020
Vol 15 (11), pp. 1643-1656
Author(s): Adrienne Tin, Anna Köttgen

The past few years have seen major advances in genome-wide association studies (GWAS) of CKD and kidney function–related traits in several areas: increases in sample size from >100,000 to >1 million, enabling the discovery of >250 associated genetic loci that are highly reproducible; the inclusion of participants not only of European but also of non-European ancestries; and the use of advanced computational methods to integrate additional genomic and other unbiased, high-dimensional data to characterize the underlying genetic architecture and prioritize potentially causal genes and variants. Together with other large-scale biobank and genetic association studies of complex traits, these GWAS of kidney function–related traits have also provided novel insight into the relationship of kidney function to other diseases with respect to their genetic associations, genetic correlation, and directional relationships. A number of studies also included functional experiments using model organisms or cell lines to validate prioritized potentially causal genes and/or variants. In this review article, we will summarize these recent GWAS of CKD and kidney function–related traits, explain approaches for downstream characterization of associated genetic loci and the value of such computational follow-up analyses, and discuss related challenges along with potential solutions to ultimately enable improved treatment and prevention of kidney diseases through genetics.


2019
Author(s): Antoine R. Baldassari, Colleen M. Sitlani, Heather M. Highland, Dan E. Arking, Steve Buyske, ...

Abstract

Background: Published genome-wide association studies (GWAS) are mainly European-centric, examine a narrow view of phenotypic variation, and infrequently interrogate genetic effects shared across traits. We therefore examined the extent to which a multi-ethnic, combined trait GWAS of phenotypes that map to well-defined biology can enable detection and characterization of complex trait loci.

Methods: With 1000 Genomes Phase 3 imputed data in 34,668 participants (15% African American; 3% Chinese American; 51% European American; 30% Hispanic/Latino), we performed covariate-adjusted univariate GWAS of six contiguous electrocardiogram (ECG) traits that decomposed an average heartbeat and two commonly reported composite ECG traits that summed contiguous traits. Combined phenotype testing was performed using the adaptive sum of powered scores test (aSPU).

Results: We identified six novel and 87 known ECG trait loci (aSPU p-value < 5E-9). Lead SNP rs3211938 at novel locus CD36 was common in African Americans (minor allele frequency = 10%) and near-monomorphic in European Americans, with effect sizes for the composite trait, QT interval, among the largest reported. Only one novel locus was detected for the composite traits, due to opposite directions of effects across contiguous traits that summed to near-zero. Combined phenotype testing did not detect novel loci unapparent by univariate testing. However, this approach aided locus characterization, particularly when loci harbored multiple independent signals that differed by trait.

Conclusions: Despite including one-third as few participants as the largest published GWAS of ECG traits, our study identifies multiple novel ECG genetic loci, emphasizing the importance of ancestral diversity and phenotype measurement in this era of ever-growing GWAS.

Author Summary: We leveraged a multiethnic cohort with precise measures of cardioelectric function to identify novel genetic loci affecting this complex, multifaceted phenotype. The success of our approach stresses the importance of phenotypic precision and participant diversity for future locus discovery and characterization efforts, and cautions against compromises made in genome-wide association studies to pursue ever-growing sample sizes.


Author(s): Meng Luo, Shiliang Gu

Abstract: During the past decades, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits in humans, animals, and plants. All common genome-wide association (GWA) methods rely on population structure correction to avoid false genotype-phenotype associations. However, population structure correction is a stringent penalization, which also impedes the identification of real associations. Here, we used recent statistical advances to propose iterative screen regression (ISR), which enables simultaneous multiple-marker associations and is shown to appropriately correct for population stratification and cryptic relatedness in GWAS. Results from analyses of simulated data suggest that the proposed ISR method performed well in terms of power (sensitivity) versus false discovery rate (FDR) and specificity, and showed less bias (higher accuracy) in effect (PVE) estimation than the existing multi-loci (mixed) model and the single-locus (mixed) model. We also show the practicality of our approach by applying it to rice, outbred mice, and A. thaliana datasets. It identified several new causal loci that other methods did not detect. Our ISR provides an alternative for multi-loci GWAS, and the implementation is computationally efficient, making the analysis of large datasets (n > 100,000) practicable.


2021
Vol 22 (1)
Author(s): Daniel L. McCartney, Josine L. Min, Rebecca C. Richmond, Ake T. Lu, Maria K. Sobczyk, ...

Abstract

Background: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field.

Results: Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels.

Conclusion: This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.


2021
Vol 42 (1)
Author(s): Dinesh K. Saini, Yuvraj Chopra, Jagmohan Singh, Karansher S. Sandhu, Anand Kumar, ...
