PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores

2017 ◽  
Author(s):  
Lawrence M. Chen ◽  
Nelson Yao ◽  
Elika Garg ◽  
Yuecai Zhu ◽  
Thao T. T. Nguyen ◽  
...  

Abstract Motivation: Polygenic risk scores describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of the variance than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating polygenic risk scores, and existing approaches largely preclude the use of imputed posterior probabilities and strand-ambiguous SNPs. Results: We developed PRS-on-Spark (PRSoS), polygenic risk score software implemented in Apache Spark and Python that accommodates a variety of data inputs (e.g., observed genotypes, imputed genotypes, or imputed posterior probabilities) and strand-ambiguous SNPs. We show that PRSoS is flexible and efficient and computes polygenic risk scores at a range of p-value thresholds more quickly than existing software (PRSice). We also show that the use of imputed posterior probabilities and the inclusion of strand-ambiguous SNPs increase the proportion of variance explained by polygenic risk scores for major depression. Availability and Implementation: PRSoS is written in Apache Spark and Python and is freely available (see https://github.com/MeaneyLab/PRSoS).
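The idea of scoring from imputed posterior probabilities at several p-value thresholds can be sketched in a few lines of Python. This is a minimal illustration and not the PRSoS implementation: the function names, the array layout (samples × SNPs × three genotype probabilities), and the choice to sum rather than average the weighted dosages are all assumptions made for this example.

```python
import numpy as np

def expected_dosage(probs):
    """Expected effect-allele count from posterior genotype probabilities.

    probs has shape (n_samples, n_snps, 3), holding P(0), P(1), P(2)
    copies of the effect allele (layout assumed for this sketch).
    """
    probs = np.asarray(probs, dtype=float)
    return probs[..., 1] + 2.0 * probs[..., 2]

def prs_by_threshold(dosage, betas, pvalues, thresholds):
    """Sum beta-weighted dosages over SNPs with GWAS p-value <= each threshold."""
    dosage = np.asarray(dosage, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    scores = {}
    for t in thresholds:
        keep = pvalues <= t  # SNPs passing this threshold
        scores[t] = dosage[:, keep] @ betas[keep]
    return scores
```

Working from expected dosages keeps the uncertainty of imputation in the score instead of rounding each genotype to its most likely call.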

2021 ◽  
Author(s):  
Nuzulul Kurniansyah ◽  
Matthew O Goodman ◽  
Tanika Kelly ◽  
Tali Elfassi ◽  
Kerri Wiggins ◽  
...  

Background: We used summary statistics from previously published GWAS of systolic and diastolic BP and of hypertension to construct Polygenic Risk Scores (PRS) to predict hypertension across diverse populations. Methods: We used 10,314 participants of diverse ancestry from BioMe to train trait-specific PRS. We implemented a novel approach to select one of multiple potential PRS based on the same GWAS, by optimizing the coefficient of variation across estimated PRS effect sizes in independent subsets of the training dataset. We combined the 3 selected trait-specific PRS as their unweighted sum, called "PRSsum". We evaluated PRS associations in an independent dataset of 39,035 individuals from eight cohort studies, to select the final, multi-ethnic, HTN-PRS. We estimated its association with prevalent and incident hypertension 4-6 years later. We studied hypertension development within HTN-PRS strata in a longitudinal, six-visit dataset of 3,087 self-identified Black and White participants from the CARDIA study. Finally, we evaluated the HTN-PRS association with clinical outcomes in 40,201 individuals from the MGB Biobank. Results: Compared to other race/ethnic backgrounds, African Americans had higher average values of the HTN-PRS. The HTN-PRS was associated with prevalent hypertension (OR=2.10, 95% CI [1.99, 2.21], per one standard deviation (SD) of the PRS) across all participants, and in each race/ethnic background, with heterogeneity by background (p-value < 1.0 × 10⁻⁴). The lowest estimated effect size was in African Americans (OR=1.53, 95% CI [1.38, 1.69]). The HTN-PRS was associated with new-onset hypertension among individuals with normal (respectively, elevated) BP at baseline: OR=1.71, 95% CI [1.55, 1.91] (OR=1.48, 95% CI [1.27, 1.71]). Association was further observed in age-stratified analysis. In CARDIA, Black participants with high HTN-PRS percentiles developed hypertension earlier than White participants with high HTN-PRS percentiles.
The HTN-PRS was significantly associated with increased risk of coronary artery disease (OR=1.12), ischemic stroke (OR=1.15), type 2 diabetes (OR=1.19), and chronic kidney disease (OR=1.12), in the MGB Biobank. Conclusions: The multi-ethnic HTN-PRS is associated with both prevalent and incident hypertension at 4-6 years of follow-up across adulthood and is associated with clinical outcomes.
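The two steps described in the Methods (picking one candidate PRS per trait by the coefficient of variation of its effect sizes across training subsets, then summing the selected scores) might look roughly like the sketch below. The abstract does not say whether "optimizing" means minimizing, nor whether scores are standardized before summing; both are assumptions here, as are all function and variable names.

```python
import numpy as np

def coefficient_of_variation(effects):
    """Sample SD of effect sizes divided by the absolute value of their mean."""
    effects = np.asarray(effects, dtype=float)
    return effects.std(ddof=1) / abs(effects.mean())

def select_prs(candidate_effects):
    """Return the candidate whose subset effect sizes vary least (assumed criterion).

    candidate_effects maps a candidate PRS name to the effect sizes
    estimated in independent subsets of the training data.
    """
    return min(candidate_effects,
               key=lambda name: coefficient_of_variation(candidate_effects[name]))

def prs_sum(*trait_scores):
    """Unweighted sum of trait-specific scores, each z-scored first (assumption)."""
    total = np.zeros(len(trait_scores[0]))
    for s in trait_scores:
        s = np.asarray(s, dtype=float)
        total += (s - s.mean()) / s.std()
    return total
```

Standardizing before summing keeps a trait whose score happens to have a wide numeric range from dominating the combined score.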


2021 ◽  
Author(s):  
Sam Hodgson ◽  
Qin Qin Huang ◽  
Neneh Sallah ◽  
Chris J Griffiths ◽  
William Newman ◽  
...  

Background: Type 2 diabetes is a heterogeneous condition highly prevalent in British Pakistanis and Bangladeshis (BPB). The Genes & Health (G&H) cohort offers a means to explore genetic determinants of disease in BPB, combining genetic and lifelong health record data. Methods: We assessed whether common genetic loci associated with type 2 diabetes in European-ancestry individuals (EUR) replicate in G&H. We constructed a type 2 diabetes polygenic risk score (PRS) and combined it with a clinical risk instrument (QDiabetes) to build a novel, integrated risk tool (IRT). We compared IRT performance with that of QDiabetes alone using the net reclassification index (NRI). We assessed the ability of the PRS to predict type 2 diabetes following gestational diabetes (GDM). We compared PRS distribution between type 2 diabetes subgroups identified by clinical features at diagnosis. Findings: Accounting for power, we replicated fewer loci associated with type 2 diabetes in G&H (n = 76/338, 22%) than would be expected if all EUR-ascertained loci were transferable (n = 95, 28%) (binomial p value = 0.01). In 13,648 patients free from type 2 diabetes followed up for 10 years, NRI was 3.2% for IRT versus QDiabetes (95% confidence interval 2.0 - 4.4%). IRT performance was best in reclassification of young adults deemed low risk by QDiabetes as high risk. PRS was independently associated with progression to type 2 diabetes after GDM (p = 0.028). Mean type 2 diabetes PRS differed between phenotypically defined type 2 diabetes subgroups (p = 0.002). Interpretation: The type 2 diabetes PRS has broad potential clinical application in BPB, improving identification of type 2 diabetes risk (especially in the young), and characterisation of type 2 diabetes subgroups at diagnosis. Funding: Wellcome Trust, MRC, NIHR, and others. Full funding disclosed within.
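For readers unfamiliar with the NRI, the standard categorical definition can be written out as a short function. This is a generic sketch, not the study's code: it assumes each person has an ordinal risk category under the old model (for example QDiabetes alone) and the new model (for example the IRT), and the function name is hypothetical.

```python
def net_reclassification_index(old_cat, new_cat, event):
    """Categorical NRI: net upward moves among events plus net downward moves among non-events."""
    up_e = down_e = n_e = 0  # people who developed the outcome
    up_n = down_n = n_n = 0  # people who did not
    for old, new, ev in zip(old_cat, new_cat, event):
        if ev:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

A positive value means the new model moves people in the right direction more often than in the wrong one: cases into higher categories and non-cases into lower ones.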


Author(s):  
Nguyễn Trần Thế Hùng ◽  
Lê Đức Hậu

Recent technological advancements and the availability of genetic databases have facilitated the integration of genetic factors into risk prediction models. A Polygenic Risk Score (PRS) combines the effects of many Single Nucleotide Polymorphisms (SNPs) into a single score. This score has recently been shown to have clinically predictive value in various common diseases. Some clinical interpretations of PRS are summarized in this review for coronary artery disease, breast cancer, prostate cancer, diabetes mellitus, and Alzheimer’s disease. While these findings support the implementation of PRS in clinical settings, the populations of interest were mainly of European ancestry. Therefore, applying these findings to populations of non-European ancestry (Vietnamese in this context) requires considerable effort and caution. This review aims to articulate the evidence supporting the clinical use of PRS, the concepts behind the validity of PRS, approaches to implementing PRS in the Vietnamese population, and cautions in selecting methods and thresholds to develop an appropriate PRS.


Author(s):  
George Hindy ◽  
Frans Wiberg ◽  
Peter Almgren ◽  
Olle Melander ◽  
Marju Orho-Melander

Background: Coronary heart disease (CHD) is a multifactorial disease with both genetic and environmental components. Smoking is the most important modifiable risk factor for CHD. Our aim was to test whether the increased CHD incidence by smoking is modified by genetic predisposition to CHD. Methods and Results: Our study included 24 443 individuals from the MDCS (Malmö Diet and Cancer Study). A weighted polygenic risk score (PRS) was created by summing the number of risk alleles for 50 single-nucleotide polymorphisms associated with CHD. Individuals were classified as current, former, or never smokers. Interactions were primarily tested between smoking status and PRS and secondarily with individual single-nucleotide polymorphisms. Then, the predictive use of PRS for CHD incidence was tested among different smoking categories. During a median follow-up time of 19.4 years, 3217 incident CHD cases were recorded. The association between smoking and CHD was modified by the PRS (P interaction = 0.005). The magnitude of increased incidence of CHD by smoking was highest among individuals in the lowest tertile of PRS (odds ratio, 1.42; 95% confidence interval, 1.29–1.56 per smoking risk category) compared with the highest tertile (odds ratio, 1.20; 95% confidence interval, 1.11–1.30 per smoking risk category). This interaction was stronger among men (P interaction = 0.001) compared with women (P interaction = 0.44). The PRS provided a significantly better net reclassification and discrimination on top of traditional risk factors among never smokers compared with current smokers (P < 0.001). Conclusions: Genetic predisposition to CHD modifies the associated increased CHD risk by smoking. The PRS has a better predictive use among never smokers compared with smokers.


2020 ◽  
Vol 117 (11) ◽  
pp. 5997-6002 ◽  
Author(s):  
Sandya Liyanarachchi ◽  
Julius Gudmundsson ◽  
Egil Ferkingstad ◽  
Huiling He ◽  
Jon G. Jonasson ◽  
...  

Genome-wide association studies (GWASs) have identified at least 10 single-nucleotide polymorphisms (SNPs) associated with papillary thyroid cancer (PTC) risk. Most of these SNPs are common variants with small to moderate effect sizes. Here we assessed the combined genetic effects of these variants on PTC risk by using summarized GWAS results to build polygenic risk score (PRS) models in three PTC study groups from Ohio (1,544 patients and 1,593 controls), Iceland (723 patients and 129,556 controls), and the United Kingdom (534 patients and 407,945 controls). A PRS based on the 10 established PTC SNPs showed a stronger predictive power compared with the clinical factors model, with a minimum increase of area under the receiver-operating curve of 5.4 percentage points (P ≤ 1.0 × 10⁻⁹). Adding an extended PRS based on 592,475 common variants did not significantly improve the prediction power compared with the 10-SNP model, suggesting that most of the remaining undiscovered genetic risk in thyroid cancer is due to rare, moderate- to high-penetrance variants rather than to common low-penetrance variants. Based on the 10-SNP PRS, individuals in the top decile group of PRSs have a close to sevenfold greater risk (95% CI, 5.4–8.8) compared with the bottom decile group. In conclusion, PRSs based on a small number of common germline variants emphasize the importance of heritable low-penetrance markers in PTC.
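A top-versus-bottom decile comparison of this kind can be illustrated with a crude odds ratio computed directly from a 2×2 table. The published estimate almost certainly comes from a fitted model with covariates, so the sketch below is only an unadjusted approximation; the function name and the quantile-based cut-offs are assumptions.

```python
import numpy as np

def decile_odds_ratio(scores, is_case):
    """Crude odds ratio of being a case in the top vs the bottom PRS decile."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    low_cut, high_cut = np.quantile(scores, [0.1, 0.9])
    bottom = scores <= low_cut
    top = scores >= high_cut
    a = np.sum(top & is_case)      # cases in the top decile
    b = np.sum(top & ~is_case)     # controls in the top decile
    c = np.sum(bottom & is_case)   # cases in the bottom decile
    d = np.sum(bottom & ~is_case)  # controls in the bottom decile
    return float(a * d) / float(b * c)
```

The function divides by zero if either decile has no controls or the bottom decile has no cases, so small samples would need a continuity correction or a model-based estimate instead.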


BMC Medicine ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Mireia Obón-Santacana ◽  
Anna Díez-Villanueva ◽  
Maria Henar Alonso ◽  
Gemma Ibáñez-Sanz ◽  
Elisabet Guinó ◽  
...  

Abstract Background Different risk-based colorectal cancer (CRC) screening strategies, such as the use of polygenic risk scores (PRS), have been evaluated to improve the effectiveness of these programs. However, few studies have assessed their usefulness in a fecal immunochemical test (FIT)-based screening study. Methods A PRS of 133 single nucleotide polymorphisms was assessed for 3619 participants: population controls, screening controls, low-risk lesions (LRL), intermediate-risk lesions (IRL), high-risk lesions (HRL), CRC screening program cases, and clinically diagnosed CRC cases. The PRS was compared between the subset of cases (n = 648; IRL+HRL+CRC) and controls (n = 956; controls+LRL) recruited within a FIT-based screening program. Positive predictive values (PPV), negative predictive values (NPV), and the area under the receiver operating characteristic curve (aROC) were estimated using cross-validation. Results The overall PRS range was 110–156. PRS values increased along the CRC tumorigenesis pathway (Mann-Kendall P value 0.007). Within the screening subset, the PRS ranged from 110 to 151 and was associated with higher-risk lesions and CRC risk (OR for decile 10 vs decile 1 1.92, 95% CI 1.22–3.03). The cross-validated aROC of the PRS for cases and controls was 0.56 (95% CI 0.53–0.59). Discrimination was equal when restricted to positive FIT (aROC 0.56), but lower among negative FIT (aROC 0.55). The overall PPV among positive FIT was 0.48. PPV depended on the number of risk alleles for positive FIT (PPV from the 10th to the 90th percentile 0.48–0.57). Conclusions PRS plays an important role along the CRC tumorigenesis pathway; however, in practice, its utility to stratify the general population or as a second test after a FIT positive result is still doubtful. Currently, PRS is not able to safely stratify the general population since the improvement in PPV is scarce.
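The performance measures named in this abstract (PPV, NPV and the area under the ROC curve) have simple definitions that can be written out directly. The sketch below is generic and does not reproduce the study's cross-validation; the function names are hypothetical, and the AUC is computed by its pairwise-comparison (Mann-Whitney) definition.

```python
def predictive_values(tp, fp, tn, fn):
    """PPV and NPV from counts of true/false positives and negatives."""
    ppv = tp / (tp + fp)  # share of test-positives who are true cases
    npv = tn / (tn + fn)  # share of test-negatives who are true non-cases
    return ppv, npv

def auc(case_scores, control_scores):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Read this way, an aROC of 0.56 means the PRS ranks a randomly chosen case above a randomly chosen control only slightly more often than a coin flip would.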


2021 ◽  
Author(s):  
Can Hou ◽  
Daowen Yang ◽  
Yu Hao ◽  
Bin Xu ◽  
Huan Song ◽  
...  

Abstract Background Studies investigating breast cancer polygenic risk score (PRS) in Chinese women are scarce. The objectives of this study were to develop and validate PRSs that could be used to stratify risk for overall and subtype-specific breast cancer in Chinese women, and to evaluate the performance of a newly proposed Artificial Neural Network (ANN) based approach for PRS construction. Methods The PRSs were constructed using a GWAS dataset and validated in an independent case-control study. Three approaches, including repeated logistic regression (RLR), logistic ridge regression (LRR) and an ANN-based approach, were used to build the PRSs for overall and subtype-specific breast cancer based on 24 selected single nucleotide polymorphisms (SNPs). Predictive performance and calibration of the PRSs were evaluated unadjusted and adjusted for Gail-2 model 5-year risk or classical breast cancer risk factors. Results The primary PRS-ANN and PRS-LRR both showed good predictive ability for overall breast cancer (IQ-OR 1.76 vs 1.58; AUC 0.601 vs 0.598) and remained predictive after adjustment. Although estrogen receptor negative (ER-) breast cancer was poorly predicted by the primary PRSs, the ER- PRSs trained solely on ER- breast cancer cases saw a substantial improvement in predictions of ER- breast cancer. Conclusions The SNP-24 based PRSs can provide additional risk information to help breast cancer risk stratification in the general population of China. The newly proposed ANN approach for PRS construction has potential to replace the traditional approaches, but more studies are needed to validate and investigate its performance.


2020 ◽  
Vol 3 (Supplement_1) ◽  
pp. 95-96
Author(s):  
S Lee ◽  
K Shestopaloff ◽  
O Espin-Garcia ◽  
W Turpin ◽  
J Raygoza Garay ◽  
...  

Abstract Background Fecal calprotectin concentration (FC), a measure of gut inflammation, is reported to be significantly higher in healthy first-degree relatives (FDR) of Crohn’s disease (CD) patients compared to healthy controls. In contrast, FC in spouses of CD patients was not significantly different from controls, suggesting that a genetic predisposition rather than a shared environmental factor affects FC. Aims We investigated the genetic association with FC in healthy FDRs of CD patients. Notably, these subjects are known to be enriched with CD risk alleles. Methods We investigated 1455 healthy Caucasian FDRs of CD patients from the GEM Project. Subjects were genotyped by HumanCoreEXOME chip and ImmunoChip platforms and then imputed by the Haplotype Reference Consortium v1.1 panel (Michigan Imputation Server). SNPs with a minor allele frequency <5% were removed. FC was measured using the BUHLMANN ELISA kit. Heritability was estimated using the pedigree-based SOLAR program and the SNP-based GCTA software. Genome-wide association of FC was tested using the GEE framework that accounts for family clusters, age, sex, first 3 genetic principal components and multiplex family status (≥2 FDRs diagnosed with CD). In addition, CD-polygenic risk scores were derived based on summary statistics and imputed SNPs from a recent GWAS by pruning and thresholding (P+T) and the LDPred algorithm (PMID:31002795). Results Among 1455 subjects, 45.2% were male, median age was 19 years (IQR 13–26), 8.8% were from multiplex families, and median FC was 52 mg/kg (IQR 31–87; 20.8% had FC >100). We estimated the heritability of FC to be 27% (27.1%, standard error=9%, p<0.001 by pedigree approach; 27.9%, SE=12%, p<0.001 by SNP approach). An untargeted GWAS failed to show any significant association with FC (i.e. p < 5 × 10⁻⁸). The lowest p value was obtained for rs224631 (p = 5 × 10⁻⁷).
Strikingly, an increase in CD polygenic risk scores was significantly associated with an increase of FC (p = 5.2 × 10⁻⁵ with the P+T method). Conclusions We demonstrate that FC concentration is a heritable trait in unaffected FDRs of CD patients. Although the association between genetic variants and FC did not reach GWAS significance, the CD-polygenic risk score, which incorporates small effect size CD-associated SNPs, was significantly associated with FC concentration in this cohort. Our results suggest that FC concentration is influenced genetically with contributions from CD-associated SNPs in unaffected FDRs of CD probands. It remains to be determined whether the genetic influence on FC concentration is dependent on or independent of the future development of CD. Submitted on behalf of The CCC-GEM Project research team Funding Agencies CCC; Helmsley Charitable Trust/Mount Sinai Hospital Fellowship Award
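Pruning and thresholding (P+T) is commonly done as a greedy pass over SNPs ordered by p-value. The sketch below shows that general idea only: the cut-off values, the function name, and the use of a full pairwise r² matrix (real pipelines restrict LD comparisons to a genomic window) are assumptions and are not taken from this study.

```python
import numpy as np

def prune_and_threshold(pvalues, r2, p_max=0.05, r2_max=0.1):
    """Greedy P+T: keep the most significant SNPs that are not in LD with any SNP already kept.

    pvalues: GWAS p-value per SNP.
    r2: square matrix of pairwise squared correlations between SNPs.
    p_max and r2_max are illustrative defaults, not values from the abstract.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    kept = []
    for j in np.argsort(pvalues):  # most significant first
        if pvalues[j] > p_max:
            break  # every remaining SNP fails the p-value threshold
        if all(r2[j, k] < r2_max for k in kept):
            kept.append(int(j))
    return sorted(kept)
```

The SNPs that survive would then be combined into a score by summing their effect-size-weighted allele counts.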


Author(s):  
Léna G Dietrich ◽  
Christian W Thorball ◽  
Lene Ryom ◽  
Felix Burkhalter ◽  
Barbara Hasse ◽  
...  

Abstract Background In people with human immunodeficiency virus (PWH), it is unknown whether genetic background associates with rapid progression of kidney dysfunction (ie, estimated glomerular filtration rate [eGFR] decrease of >5 mL/min/1.73 m² per year for ≥3 consecutive years). Methods We obtained univariable and multivariable hazard ratios (HR) for rapid progression, based on the clinical D:A:D chronic kidney disease (CKD) risk score, antiretroviral exposures, and a polygenic risk score based on 14 769 genome-wide single nucleotide polymorphisms in white Swiss HIV Cohort Study participants. Results We included 225 participants with rapid progression and 3378 rapid progression-free participants. In multivariable analysis, compared to participants with low D:A:D risk, participants with high risk had rapid progression (HR = 1.82 [95% CI, 1.28–2.60]). Compared to the first (favorable) polygenic risk score quartile, participants in the second, third, and fourth (unfavorable) quartiles had rapid progression (HR = 1.39 [95% CI, 0.94–2.06], 1.52 [95% CI, 1.04–2.24], and 2.04 [95% CI, 1.41–2.94], respectively). Recent exposure to tenofovir disoproxil fumarate was associated with rapid progression (HR = 1.36 [95% CI, 1.06–1.76]). Discussion An individual polygenic risk score is associated with rapid progression in Swiss PWH, when analyzed in the context of clinical and antiretroviral risk factors.

