Efficient implementation of penalized regression for genetic risk prediction

2018 ◽  
Author(s):  
Florian Privé ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract: Polygenic Risk Scores (PRS) combine information across many single-nucleotide polymorphisms (SNPs) into a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify individuals at high genetic risk for a given disease. The “Clumping+Thresholding” (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method to jointly estimate SNP effects, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. The choice of hyper-parameters for a predictive model is very important since it can dramatically impact its predictive performance. As an example, AUC values range from less than 60% to 90% in a model with 30 causal SNPs, depending on the p-value threshold in C+T. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. PLR consistently achieves higher predictive performance than the two other methods while being as fast as C+T. We find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUCs of 89% and 82.5%, respectively. In conclusion, our study demonstrates that penalized logistic regression can achieve more discriminative polygenic risk scores, while being applicable to large-scale individual-level data thanks to the implementation we provide in the R package bigstatsr.
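The C+T procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (which lives in the R package bigstatsr); the function names, the toy p-values, effect sizes, and the pairwise r² matrix are all hypothetical, and real pipelines also restrict clumping to a genomic window.

```python
from typing import List, Sequence


def clump_and_threshold(pvals: Sequence[float],
                        r2: Sequence[Sequence[float]],
                        p_max: float,
                        r2_max: float) -> List[int]:
    """Greedy C+T sketch: visit SNPs from most to least significant and keep a
    SNP only if its p-value passes p_max and it is not correlated
    (r2 > r2_max) with any SNP already kept."""
    order = sorted(range(len(pvals)), key=lambda j: pvals[j])
    kept: List[int] = []
    for j in order:
        if pvals[j] > p_max:
            break  # all remaining SNPs are even less significant
        if all(r2[j][k] <= r2_max for k in kept):
            kept.append(j)
    return kept


def polygenic_score(dosages: Sequence[float],
                    betas: Sequence[float],
                    kept: Sequence[int]) -> float:
    """PRS for one individual: sum of allele dosage times GWAS effect size
    over the retained SNPs."""
    return sum(betas[j] * dosages[j] for j in kept)


# Hypothetical toy data: SNP 0 and SNP 1 are in strong LD (r2 = 0.9).
pvals = [1e-8, 1e-6, 0.2, 0.03]
betas = [0.5, 0.4, 0.1, -0.2]
r2 = [[1.0, 0.9, 0.0, 0.0],
      [0.9, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]

kept = clump_and_threshold(pvals, r2, p_max=0.05, r2_max=0.2)
score = polygenic_score([2, 1, 0, 1], betas, kept)
```

The p-value threshold `p_max` is the hyper-parameter whose choice the abstract reports as having a large effect on AUC; in practice it would be tuned on held-out data rather than fixed as here.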

Author(s):  
Lars G. Fritsche ◽  
Snehal Patil ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Maxwell Salvatore ◽  
...  

Abstract: To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.


2019 ◽  
Author(s):  
Zijie Zhao ◽  
Yanyao Yi ◽  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yupei Lin ◽  
...  

Abstract: Polygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R² by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.


Neurology ◽  
2018 ◽  
Vol 90 (18) ◽  
pp. e1605-e1612 ◽  
Author(s):  
Tian Ge ◽  
Mert R. Sabuncu ◽  
Jordan W. Smoller ◽  
Reisa A. Sperling ◽  
Elizabeth C. Mormino ◽  
...  

Objective: To investigate the effects of genetic risk of Alzheimer disease (AD) dementia in the context of β-amyloid (Aβ) accumulation. Methods: We analyzed data from 702 participants (221 clinically normal, 367 with mild cognitive impairment, and 114 with AD dementia) with genetic data and florbetapir PET available. A subset of 669 participants additionally had longitudinal MRI scans to assess hippocampal volume. Polygenic risk scores (PRSs) were estimated with summary statistics from previous large-scale genome-wide association studies of AD dementia. We examined relationships between APOE ε4 status and PRS with longitudinal Aβ and cognitive and hippocampal volume measurements. Results: APOE ε4 was strongly related to baseline Aβ, whereas only weak associations between PRS and baseline Aβ were present. APOE ε4 was additionally related to greater memory decline and hippocampal atrophy in Aβ+ participants. When APOE ε4 was controlled for, PRS was related to cognitive decline in Aβ+ participants. Finally, PRSs were associated with hippocampal atrophy in Aβ− participants and weakly associated with baseline hippocampal volume in Aβ+ participants. Conclusions: Genetic risk factors of AD dementia demonstrate effects related to Aβ, as well as synergistic interactions with Aβ. The specific effect of faster cognitive decline in Aβ+ individuals with higher genetic risk may explain the large degree of heterogeneity in cognitive trajectories among Aβ+ individuals. Consideration of genetic variants in conjunction with baseline Aβ may improve enrichment strategies for clinical trials targeting Aβ+ individuals most at risk for imminent cognitive decline.


Thorax ◽  
2021 ◽  
pp. thoraxjnl-2020-215624
Author(s):  
Sinjini Sikdar ◽  
Annah B Wyss ◽  
Mi Kyeong Lee ◽  
Thanh T Hoang ◽  
Marie Richards ◽  
...  

Rationale: Genome-wide association studies (GWASs) have identified numerous loci associated with lower pulmonary function. Pulmonary function is strongly related to smoking and has also been associated with asthma and dust endotoxin. At the individual SNP level, genome-wide analyses of pulmonary function have not identified appreciable evidence for gene by environment interactions. Genetic Risk Scores (GRSs) may enhance power to identify gene–environment interactions, but studies are few. Methods: We analysed 2844 individuals of European ancestry with 1000 Genomes imputed GWAS data from a case–control study of adult asthma nested within a US agricultural cohort. Pulmonary function traits were FEV1, FVC and FEV1/FVC. Using data from a recent large meta-analysis of GWAS, we constructed a weighted GRS for each trait by combining the top (p value<5×10⁻⁹) genetic variants, after clumping based on distance (±250 kb) and linkage disequilibrium (r²=0.5). We used linear regression, adjusting for relevant covariates, to estimate associations of each trait with its GRS and to assess interactions. Results: Each trait was highly significantly associated with its GRS (all three p values<8.9×10⁻⁸). The inverse association of the GRS with FEV1/FVC was stronger for current smokers (p_interaction=0.017) or former smokers (p_interaction=0.064) when compared with never smokers and among asthmatics compared with non-asthmatics (p_interaction=0.053). No significant interactions were observed between any GRS and house dust endotoxin. Conclusions: Evaluation of interactions using GRSs supports a greater impact of increased genetic susceptibility on reduced pulmonary function in the presence of smoking or asthma.
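The interaction analysis described in this abstract (linear regression of a trait on its GRS, an exposure, and their product) can be sketched as follows. This is an illustrative, unadjusted model rather than the study's analysis: the function names and toy data are hypothetical, covariates are omitted, and only point estimates are computed (no standard errors or interaction p-values).

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]],
                        b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction(grs: Sequence[float],
                    exposure: Sequence[float],
                    trait: Sequence[float]) -> List[float]:
    """Ordinary least squares for
    trait = b0 + b1*GRS + b2*exposure + b3*(GRS*exposure),
    solved through the normal equations. Returns [b0, b1, b2, b3];
    b3 is the gene-by-environment interaction coefficient."""
    rows = [[1.0, g, e, g * e] for g, e in zip(grs, exposure)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, trait)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noiseless toy data: exposure is 1 for smokers, 0 otherwise.
grs = [0.0, 1.0, 2.0, 0.5, 1.5, 3.0]
smoker = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
trait = [1.0 + 2.0 * g + 3.0 * s - 1.5 * g * s for g, s in zip(grs, smoker)]
coefs = fit_interaction(grs, smoker, trait)
```

A negative interaction coefficient in this toy setup means the slope of the trait on the GRS is lower among the exposed group, which is the kind of effect modification the abstract reports for smoking and FEV1/FVC.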


2019 ◽  
Author(s):  
R.L. Kember ◽  
A. Verma ◽  
S. Verma ◽  
A. Lucas ◽  
R. Judy ◽  
...  

Abstract: Cardio-renal-metabolic (CaReMe) conditions are common and the leading cause of mortality around the world. Genome-wide association studies have shown that these diseases are polygenic and share many genetic risk factors. Identifying individuals at high genetic risk will allow us to target prevention and treatment strategies. Polygenic risk scores (PRS) are aggregate weighted counts that can demonstrate an individual’s genetic liability for disease. However, current PRS are often based on European ancestry individuals, limiting the implementation of precision medicine efforts in diverse populations. In this study, we develop PRS for six diseases and traits related to cardio-renal-metabolic disease in the Penn Medicine Biobank. We investigate their performance in both European and African ancestry individuals, and identify genetic and phenotypic overlap within these conditions. We find that genetic risk is associated with the primary phenotype in both ancestries, but this does not translate into a model of predictive value in African ancestry individuals. We conclude that future research should prioritize genetic studies in diverse ancestries in order to address this disparity.


2021 ◽  
Author(s):  
Sophia Gunn ◽  
Michael Wainberg ◽  
Zeyuan Song ◽  
Stacy Andersen ◽  
Robert Boudreau ◽  
...  

Background: A surprising and well-replicated result in genetic studies of human longevity is that centenarians appear to carry disease-associated variants in numbers similar to the general population. With the proliferation of large genome-wide association studies (GWAS) in recent years, investigators have turned to polygenic scores to leverage GWAS results into a measure of genetic risk that can better predict risk of disease than individual significant variants alone. Methods: We selected 54 polygenic risk scores (PRSs) developed for a variety of outcomes and we calculated their values in individuals from the New England Centenarian Study (NECS, N = 4886) and the Long Life Family Study (LLFS, N = 4577). We compared the distribution of these PRSs among exceptionally long-lived individuals (ELLI), their offspring and controls and we also examined their predictive values, using t-tests and regression models adjusting for sex and principal components reflecting ancestral background of the individuals (PCs). In our analyses we controlled for multiple testing using a Bonferroni-adjusted threshold for 54 traits. Results: We found that only 4 of the 54 PRSs differed between ELLIs and controls in both cohorts. ELLIs had significantly lower mean PRSs for Alzheimer's disease (AD), coronary artery disease (CAD) and systemic lupus than controls, suggesting genetic predisposition to extreme longevity may be mediated by reduced susceptibility to these traits. ELLIs also had significantly higher mean PRSs for improved cognitive function. In addition, the PRS for AD was associated with higher risk of dementia among controls but not ELLIs (p = 0.0004, 0.3 in NECS, p = 0.03, 0.93 in LLFS respectively). Interestingly, ELLIs did not have a larger number of homozygous risk genotypes for AD (T_NECS = -1.72, T_LLFS = 0.83) and CAD (T_NECS = -5.08, T_LLFS = -0.31) in both cohorts, but did have a significantly larger number of homozygous protective genotypes than controls for the two traits (AD: T_NECS = 3.10, T_LLFS = 2.2, CAD: T_NECS = 6.57, T_LLFS = 2.36, respectively). Conclusions: ELLIs have a similar burden of genetic disease risk as the general population for most traits, but have significantly lower genetic risk of AD, CAD, and lupus. The lack of association between AD PRS and dementia among ELLIs suggests that their genetic risk for AD is somehow buffered by protective genetic or environmental factors.
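The Bonferroni adjustment for 54 traits mentioned in this abstract amounts to dividing the family-wise significance level by the number of tests. A minimal sketch follows; the family-wise level of 0.05 is an assumption (the abstract does not state it), and the function name and p-values are hypothetical, not results from the study.

```python
from typing import Dict


def bonferroni_flags(pvalues: Dict[str, float],
                     n_tests: int,
                     alpha: float = 0.05) -> Dict[str, bool]:
    """Flag each test as significant if its p-value is at or below
    alpha / n_tests. alpha = 0.05 is an assumed family-wise level."""
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in pvalues.items()}


# Hypothetical p-values for three of 54 tested scores.
flags = bonferroni_flags({"AD": 0.0001, "CAD": 0.002, "lupus": 0.04}, n_tests=54)
```

With 54 tests and an assumed level of 0.05, the per-test threshold is roughly 0.00093, so a p-value that looks nominally significant (such as 0.04) would not survive correction.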


2021 ◽  
Author(s):  
Giuseppe Fanelli ◽  
Katharina Domschke ◽  
Alessandra Minelli ◽  
Massimo Gennarelli ◽  
Paolo Martini ◽  
...  

About two-thirds of patients with major depressive disorder (MDD) fail to achieve symptom remission after the initial antidepressant treatment. Although a role for genetic factors has been established, the specific underpinnings are not yet fully understood. Polygenic risk scores (PRSs), which summarise the additive effect of multiple risk variants across the genome, might provide insights into the underlying genetics. This study aims to investigate the possible association of PRSs for bipolar disorder, MDD, neuroticism, and schizophrenia (SCZ) with antidepressant non-response or non-remission in patients with MDD. PRSs were calculated at eight genome-wide P-thresholds based on publicly available summary statistics of the largest genome-wide association studies. Logistic regressions were performed between PRSs and non-response or non-remission in six European clinical samples, adjusting for age, sex, baseline symptom severity, recruitment sites, and population stratification. Results were meta-analysed across samples, including up to 3,637 individuals. Bonferroni correction was applied. In the meta-analysis, no result was significant after Bonferroni correction. The top result was found for MDD-PRS and non-remission (p=0.004), with patients in the highest vs. lowest PRS quintile being more likely not to achieve remission (OR=1.5, 95% CI=1.11-1.98, p=0.007). Nominal associations were also found between MDD-PRS and non-response (p=0.013), as well as between SCZ-PRS and non-remission (p=0.035). Although PRSs are still not able to predict non-response or non-remission, our results are in line with previous work; methodological improvements in PRS calculation may improve their predictive performance and have a meaningful role in precision psychiatry.
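The highest-versus-lowest quintile comparison reported in this abstract is expressed as an odds ratio with a 95% confidence interval. A minimal sketch of how such a quantity can be computed from a 2×2 table is shown below, using the standard Wald interval on the log odds ratio. This is an unadjusted illustration: the study's estimates came from covariate-adjusted logistic regressions that were meta-analysed, and the function name and counts here are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: highest-quintile patients without remission
    b: highest-quintile patients with remission
    c: lowest-quintile patients without remission
    d: lowest-quintile patients with remission
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_est, ci_low, ci_high = odds_ratio_wald_ci(30, 70, 20, 80)
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level; whether it survives multiple-testing correction is a separate question, as the abstract notes.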


Author(s):  
Niccolo’ Tesi ◽  
Sven J van der Lee ◽  
Marc Hulsman ◽  
Iris E Jansen ◽  
Najada Stringa ◽  
...  

Abstract Studying the genome of centenarians may give insights into the molecular mechanisms underlying extreme human longevity and the escape of age-related diseases. Here, we set out to construct polygenic risk scores (PRSs) for longevity and to investigate the functions of longevity-associated variants. Using a cohort of centenarians with maintained cognitive health (N = 343), a population-matched cohort of older adults from 5 cohorts (N = 2905), and summary statistics data from genome-wide association studies on parental longevity, we constructed a PRS including 330 variants that significantly discriminated between centenarians and older adults. This PRS was also associated with longer survival in an independent sample of younger individuals (p = .02), leading up to a 4-year difference in survival based on common genetic factors only. We show that this PRS was, in part, able to compensate for the deleterious effect of the APOE-ε4 allele. Using an integrative framework, we annotated the 330 variants included in this PRS by the genes they associate with. We find that they are enriched with genes associated with cellular differentiation, developmental processes, and cellular response to stress. Together, our results indicate that an extended human life span is, in part, the result of a constellation of variants each exerting small advantageous effects on aging-related biological mechanisms that maintain overall health and decrease the risk of age-related diseases.


2018 ◽  
Author(s):  
Tom G. Richardson ◽  
Sean Harrison ◽  
Gibran Hemani ◽  
George Davey Smith

Abstract: The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). In this study, we have analysed 162 PRS (P<5×10⁻⁰⁵) derived from GWAS and 551 heritable traits from the UK Biobank study (N=334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst the findings were inverse associations with measures of cognitive function, for which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.


2018 ◽  
Author(s):  
Roman Teo Oliynyk

Background: Genome-wide association studies and other computational biology techniques are gradually discovering the causal gene variants that contribute to late-onset human diseases. After more than a decade of genome-wide association study efforts, these can account for only a fraction of the heritability implied by familial studies, the so-called “missing heritability” problem. Methods: Computer simulations of polygenic late-onset diseases in an aging population have quantified the risk allele frequency decrease at older ages caused by individuals with higher polygenic risk scores becoming ill proportionately earlier. This effect is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer’s disease, coronary artery disease, cerebral stroke, and type 2 diabetes. Results: The incidence rate for late-onset diseases grows exponentially for decades after early onset ages, guaranteeing that the cohorts used for genome-wide association studies overrepresent older individuals with lower polygenic risk scores, whose disease cases are disproportionately due to environmental causes such as old age itself. This mechanism explains the decline in clinical predictive power with age and the lower discovery power of familial studies of heritability and genome-wide association studies. It also explains the relatively constant-with-age heritability found for late-onset diseases of lower prevalence, exemplified by cancers. Conclusions: For late-onset polygenic diseases showing high cumulative incidence together with high initial heritability, rather than using relatively old age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.

