Penalized regression and risk prediction in genome-wide association studies

Erin Austin; Wei Pan; Xiaotong Shen

doi:10.1002/sam.11183

Genetic markers of type 2 diabetes: Progress in genome-wide association studies and clinical application for risk prediction

Journal of Diabetes ◽

10.1111/1753-0407.12323 ◽

2015 ◽

Vol 8 (1) ◽

pp. 24-35 ◽

Cited By ~ 44

Author(s):

Xueyin Wang ◽

Garrett Strizich ◽

Yonghua Hu ◽

Tao Wang ◽

Robert C. Kaplan ◽

...

Keyword(s):

Type 2 Diabetes ◽

Risk Prediction ◽

Genetic Markers ◽

Clinical Application ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Risk prediction using genome-wide association studies

Genetic Epidemiology ◽

10.1002/gepi.20509 ◽

2010 ◽

Vol 34 (7) ◽

pp. 643-652 ◽

Cited By ~ 85

Author(s):

Charles Kooperberg ◽

Michael LeBlanc ◽

Valerie Obenchain

Keyword(s):

Risk Prediction ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Pleiotropic Mapping and Annotation Selection in Genome-wide Association Studies with Penalized Gaussian Mixture Models

10.1101/256461 ◽

2018 ◽

Author(s):

Ping Zeng ◽

Xinjie Hao ◽

Xiang Zhou

Keyword(s):

Association Mapping ◽

Complex Traits ◽

Association Studies ◽

Penalized Regression ◽

Genome Wide Association ◽

Accurate Estimation ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

Snp Association ◽

Genome Wide

Abstract. Motivation: Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci are associated with multiple traits, a phenomenon known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits together with genome-wide SNPs. Results: We develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association patterns using mixture modeling, and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power and, with a sparsity-inducing penalty, is capable of selecting informative annotations from a large, potentially noninformative set. To scale inference in iMAP to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP in terms of both high association mapping power and accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project. iMAP is freely available at www.xzlab.org/software.html.

Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities

Human Mutation ◽

10.1002/humu.22161 ◽

2012 ◽

Vol 33 (12) ◽

pp. 1708-1718 ◽

Cited By ~ 24

Author(s):

Florian Mittag ◽

Finja Büchel ◽

Mohamad Saad ◽

Andreas Jahn ◽

Claudia Schulte ◽

...

Keyword(s):

Support Vector Machines ◽

Risk Prediction ◽

Disease Risk ◽

Association Studies ◽

Genome Wide Association ◽

Support Vector ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Vector Machines

Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis

Cancer Informatics ◽

10.4137/cin.s16350 ◽

2014 ◽

Vol 13s7 ◽

pp. CIN.S16350 ◽

Cited By ~ 2

Author(s):

Sungyeon Hong ◽

Yongkang Kim ◽

Taesung Park

Keyword(s):

Variable Selection ◽

Association Studies ◽

Screening Method ◽

Computational Cost ◽

Penalized Regression ◽

Adaptive Lasso ◽

Genome Wide Association ◽

Snp Analysis ◽

Genome Wide Association Studies ◽

Genome Wide

Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have produced ultrahigh-dimensional data. Numerous methods have been proposed to handle GWAS data. Most statistical methods adopt a two-stage approach: pre-screening to reduce the dimension, followed by variable selection to identify causal SNPs. The pre-screening step ranks SNPs by their P-values or by the absolute values of their regression coefficients in single-SNP analysis. Penalized regressions, such as the ridge, lasso, adaptive lasso, and elastic-net regressions, are commonly used for the variable selection step. In this paper, we investigate which combination of pre-screening method and penalized regression performs best on a quantitative phenotype using two real GWAS datasets.
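The two-stage approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the simulated genotypes, the effect sizes, the penalty value `lam`, and all function names (`standardize`, `marginal_screen`, `lasso_cd`) are assumptions made for illustration. Stage 1 ranks SNPs by the absolute single-SNP regression coefficient; stage 2 fits a lasso by cyclic coordinate descent on the retained SNPs.

```python
import random

def standardize(col):
    """Center a column and scale it to unit variance (constant columns become zeros)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]

def marginal_screen(cols, y, keep):
    """Stage 1: keep the `keep` SNPs with the largest |single-SNP coefficient|.
    For standardized columns and centered y, that coefficient is x'y / n."""
    n = len(y)
    score = [abs(sum(x * t for x, t in zip(col, y)) / n) for col in cols]
    ranked = sorted(range(len(cols)), key=lambda j: score[j], reverse=True)
    return sorted(ranked[:keep])

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(cols, y, lam, n_iter=200):
    """Stage 2: minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic
    coordinate descent, assuming standardized columns."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Coefficient of column j against the partial residual.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Hypothetical simulated data: 200 individuals, 50 SNPs coded 0/1/2,
# two causal SNPs (indices 3 and 17) and a quantitative phenotype.
rng = random.Random(0)
n, p, maf = 200, 50, 0.3
causal = {3: 1.0, 17: -0.8}
geno = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        for _ in range(p)]
y_raw = [sum(b * geno[j][i] for j, b in causal.items()) + rng.gauss(0, 0.5)
         for i in range(n)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

cols = [standardize(c) for c in geno]
kept = marginal_screen(cols, y, keep=10)               # stage 1
beta = lasso_cd([cols[j] for j in kept], y, lam=0.1)   # stage 2
selected = {j: round(b, 3) for j, b in zip(kept, beta) if b != 0.0}
print("kept after screening:", kept)
print("lasso-selected SNPs:", selected)
```

In practice the penalty would be tuned, for example by cross-validation, and the other penalties named in the abstract (ridge, adaptive lasso, elastic net) would replace the soft-thresholding update with their own.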

Detecting Epistasis by LASSO-Penalized-Model Search Algorithm in Human Genome-Wide Association Studies

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.2426 ◽

2014 ◽

Vol 989-994 ◽

pp. 2426-2430

Author(s):

Zhi Hui Zhou ◽

Gui Xia Liu ◽

Ling Tao Su ◽

Liang Han ◽

Lun Yan

Keyword(s):

Human Genome ◽

Search Algorithm ◽

Association Studies ◽

Simulated Data ◽

Penalized Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Model Search ◽

Number Of Factors ◽

Genome Wide

Extensive studies have shown that many complex diseases are influenced by interactions among certain genes. Because logistic regression (LR) has limitations and drawbacks for detecting epistasis in human genome-wide association studies (GWAS), we propose a new method named the LASSO-penalized-model search algorithm (LPMA), which restricts the model to a tuning constant, combines it with a penalization of the L1-norm of the complexity parameter, and is implemented using a multi-step strategy. LASSO penalized regression shows particularly advantageous properties when the number of factors far exceeds the number of samples. We compare the performance of LPMA with that of its competitors. In simulated data experiments, LPMA performs better with regard to both the identification of epistasis and prediction accuracy.

Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

Genomics & Informatics ◽

10.5808/gi.2016.14.4.138 ◽

2016 ◽

Vol 14 (4) ◽

pp. 138 ◽

Cited By ~ 5

Author(s):

Sungkyoung Choi ◽

Sunghwan Bae ◽

Taesung Park

Keyword(s):

Type 2 Diabetes ◽

Risk Prediction ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Meta-Analysis for Penalized Regression Methods with Multi-Cohort Genome-Wide Association Studies

Human Heredity ◽

10.1159/000447969 ◽

2016 ◽

Vol 81 (3) ◽

pp. 142-149 ◽

Cited By ~ 1

Author(s):

Chen Lu ◽

George T. O'Connor ◽

Josée Dupuis ◽

Eric D. Kolaczyk

Keyword(s):

Association Studies ◽

Meta Analysis ◽

Penalized Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Regression Methods ◽

Genome Wide

False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1663 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 4

Author(s):

Ismaïl Ahmed ◽

Anna-Liisa Hartikainen ◽

Marjo-Riitta Järvelin ◽

Sylvia Richardson

Keyword(s):

False Discovery Rate ◽

Decision Rule ◽

Upper Bound ◽

Association Studies ◽

Penalized Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Stability Selection ◽

False Discovery ◽

Genome Wide

Stability Selection, which combines penalized regression with subsampling, is a promising algorithm to perform variable selection in ultra high dimension. This work is motivated by its evaluation in the context of genome-wide association studies (GWAS). One critical aspect for its use lies in the choice of a decision rule that accounts for the massive number of comparisons realised. The current decision rule relies on the control of the Family Wise Error Rate (FWER) by means of an upper bound derived theoretically. Alternatively, we propose to set the detection threshold according to the more liberal false discovery rate (FDR) criterion. The procedure we propose for its estimation relies on permutations. This procedure is evaluated by simulations according to several scenarios mimicking various correlation structures of genetic data and is compared to the original FWER upper bound. The proposed procedure is shown to be less conservative, and able to pick up more true signals than the FWER upper bound. Finally, the proposed methodology is illustrated on a GWAS analysis of a lipid phenotype (high-density lipoproteins, HDL) in the Northern Finland Birth Cohort.
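The core idea of stability selection, penalized regression combined with subsampling, can be sketched as follows. This is a simplified illustration, not the procedure evaluated in the abstract: it uses a single fixed lasso penalty rather than a grid, a hand-picked frequency threshold, and no FWER bound or permutation-based FDR estimate. The simulated data and all names (`lasso_support`, `stability_selection`, `lam`, `threshold`) are assumptions.

```python
import random

def standardize(col):
    """Center a column and scale it to unit variance (constant columns become zeros)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]

def lasso_support(cols, y, lam, n_iter=30):
    """Run a fixed number of coordinate-descent sweeps for the lasso
    (1/2n)||y - Xb||^2 + lam*||b||_1 and return the indices with b != 0."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            # Soft-thresholding update for standardized columns.
            new = rho - lam if rho > lam else rho + lam if rho < -lam else 0.0
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}

def stability_selection(geno, y, lam, n_sub=30, threshold=0.6, seed=1):
    """Fit the lasso on random half-samples and record how often each
    variable is selected; variables at or above `threshold` are 'stable'."""
    n, p = len(y), len(geno)
    rng = random.Random(seed)
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), n // 2)   # subsample without replacement
        y_sub = [y[i] for i in idx]
        m = sum(y_sub) / len(y_sub)
        y_sub = [v - m for v in y_sub]
        cols = [standardize([g[i] for i in idx]) for g in geno]
        for j in lasso_support(cols, y_sub, lam):
            counts[j] += 1
    freq = [c / n_sub for c in counts]
    stable = [j for j, f in enumerate(freq) if f >= threshold]
    return freq, stable

# Hypothetical simulated data: 120 individuals, 20 SNPs coded 0/1/2,
# two causal SNPs (indices 2 and 11).
rng = random.Random(0)
n, p, maf = 120, 20, 0.3
causal = {2: 1.5, 11: -1.2}
geno = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        for _ in range(p)]
y = [sum(b * geno[j][i] for j, b in causal.items()) + rng.gauss(0, 0.5)
     for i in range(n)]

freq, stable = stability_selection(geno, y, lam=0.2)
print("selection frequencies:", [round(f, 2) for f in freq])
print("stable set:", stable)
```

The decision rule is the `threshold` on the selection frequency. The abstract's contribution concerns how to set that threshold: either from a theoretical FWER upper bound or, as the authors propose, from an FDR estimate obtained by permutations.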

Abstract P3-10-19: Breast Cancer Risk Prediction Model in Korean Women Using Five Polymorphisms Identified in Genome Wide Association Studies

10.1158/0008-5472.sabcs10-p3-10-19 ◽

2010 ◽

Author(s):

W Han ◽

JH Woo ◽

J-H Yu ◽

SK Ahn ◽

HS Kim ◽

...

Keyword(s):

Breast Cancer ◽

Breast Cancer Risk ◽

Cancer Risk ◽

Prediction Model ◽

Risk Prediction ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Korean Women ◽

Genome Wide
