Genomic Prediction of Complex Disease Risk

2018 ◽  
Author(s):  
Louis Lello ◽  
Timothy G. Raben ◽  
Soke Yuen Yong ◽  
Laurent CAM Tellier ◽  
Stephen D.H. Hsu

Abstract We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Louis Lello ◽  
Timothy G. Raben ◽  
Soke Yuen Yong ◽  
Laurent C. A. M. Tellier ◽  
Stephen D. H. Hsu

Abstract We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
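As a minimal sketch of the two quantities named in this abstract, the Python below computes a polygenic score as a weighted sum of allele counts and an AUC through the rank (Mann-Whitney) formulation. The weights, genotypes, and labels are made-up toy values, not results from the paper, and the function names are illustrative. The L1-penalized fitting step that would produce the weights is not shown.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of per-SNP allele counts (0, 1, or 2)."""
    return sum(w * g for w, g in zip(weights, allele_counts))


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney U divided by n_cases * n_controls).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical effect sizes for three SNPs (a sparse L1 fit sets many weights to zero).
toy_weights = [0.30, 0.00, -0.10]
# Hypothetical genotypes: one row per individual, one allele count per SNP.
toy_genotypes = [[2, 1, 0], [1, 0, 2], [0, 2, 1], [2, 0, 1], [0, 1, 2], [1, 1, 0]]
toy_labels = [1, 0, 0, 1, 1, 0]  # 1 = case, 0 = control

toy_scores = [polygenic_score(g, toy_weights) for g in toy_genotypes]
toy_auc = auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based computation would be the usual choice for a cohort the size of UK Biobank.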


2013 ◽  
Vol 5 (2) ◽  
pp. 250-260 ◽  
Author(s):  
Suzan Gazioglu ◽  
Jiawei Wei ◽  
Elizabeth M. Jennings ◽  
Raymond J. Carroll

2009 ◽  
Vol 68 (2) ◽  
pp. 113-121
Author(s):  
John Scott

Variations in human DNA, most frequently single-nucleotide polymorphisms (SNPs), can have functional consequences ranging from severe to none. Variations in outcome (phenotype) can be compared, from cystic fibrosis through haemochromatosis to general familial risks in, for example, colo-rectal cancer (CRC). Cystic fibrosis and haemochromatosis have severe phenotypes with high penetrance, with signs and symptoms always or mostly present; thus, they have been easy to identify from family studies. However, the variants behind the familial risks that are known to contribute markedly to CRC are unknown. The sequencing of the human genome has now made possible the identification of these and other disease variants. Knowing the DNA sequence in an idealised individual adds little unless variants that increase (or decrease) disease risk from the norm can be identified. Such variants can be expected to be very common in the general population, but have low penetrance and only change risk to a limited extent. Many patients will not have the risk variant and many ‘normal’ patients will have the risk variant. Thus, very large case–control cohorts are essential. These case–control cohorts can be analysed at three different levels: (1) individual SNPs; (2) individual genes; (3) genome-wide analysis (GWA). Level 1 looks for case–control differences for specific SNPs. Alternatively, new technology can be applied to examine a range of SNPs within a gene to track differences in its regulation as well as in function. Finally, the whole genome with ≥0·5×10^6 SNPs could be marked. The first two approaches involve selecting ‘candidate’ SNPs or genes, while GWA looks for any variation in the genome that is enriched in the cases. All three approaches carry the certainty that significant associations will be found by statistical chance, for which correction must be made. This latter issue is helped by large numbers and by independent replication cohorts.
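The chance-association problem raised in this abstract can be made concrete with a Bonferroni correction, one common and conservative way to adjust for multiple tests. The abstract does not say which correction is intended, so the choice of Bonferroni is an assumption, as is the family-wise level of 0.05; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


n_snps = 500_000  # 0.5 x 10^6 SNPs, the genome-wide figure quoted in the abstract
family_wise_alpha = 0.05  # assumed conventional level, not stated in the abstract

per_snp_threshold = bonferroni_threshold(family_wise_alpha, n_snps)
# Expected count of chance "hits" if every SNP were tested at 0.05 with no correction
# and no SNP had a true effect.
expected_false_positives = family_wise_alpha * n_snps
```

Under these assumptions the per-SNP threshold falls to the order of 10^-7, which is why the abstract stresses large cohorts and independent replication.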


2014 ◽  
Vol 4 (6) ◽  
pp. 289-294
Author(s):  
Prabina Kumar Meher ◽  
Atmakuri Ramakrishna Rao ◽  
Sant Dass Wahi ◽  
B.K. Thelma

2021 ◽  
Author(s):  
William Zhu ◽  
Xiaoping Huang ◽  
Esther Yoon ◽  
Sara P Bandres Ciga ◽  
Cornelis Blauwendraat ◽  
...  

PRKN mutations are the most common recessive cause of Parkinson's disease (PD) and are a promising target for gene and cell replacement therapies. Identification of biallelic PRKN patients (PRKN-PD) at the population scale, however, remains a challenge, as roughly half of the mutations are copy number variants (CNVs) and many single nucleotide polymorphisms (SNPs) are of unclear significance. Additionally, the true prevalence and disease risk associated with heterozygous PRKN mutations are unclear, as a comprehensive assessment of PRKN SNPs and CNVs has not been performed at a population scale. To address these challenges, we evaluated PRKN mutations in 2 cohorts analyzed with both a genotyping array and exome or genome sequencing: the NIH PD cohort, a deeply phenotyped cohort of PD patients, and the UK Biobank, a population scale cohort with nearly half a million participants. Genotyping array identified the majority of PRKN mutations and at least 1 mutation in most biallelic PRKN mutation carriers in both cohorts. Additionally, in the NIH PD cohort, functional assays of patient fibroblasts resolved variants of unclear significance in biallelic carriers and ruled out cryptic loss of function variants in monoallelic carriers. In the UK Biobank, we identified 2,692 PRKN CNVs from genotyping array data from nearly half a million participants (the largest collection to date). Deletions or duplications involving exon 2 accounted for roughly half of all CNVs and the vast majority (88%) involved exons 2, 3, or 4. Combining estimates from whole exome sequencing (from ~200,000 participants) and genotyping array data, we found a pathogenic PRKN mutation in 1.8% of participants and 2 mutations in ~1/7,800 participants. Those with 1 PRKN pathogenic variant were as likely as non-carriers to have PD (OR = 0.91, CI = 0.58–1.38, p-value = 0.76) or a parent with PD (OR = 1.12, CI = 0.94–1.31, p-value = 0.19). Together our results demonstrate that heterozygous pathogenic PRKN mutations are common in the population but do not increase the risk of PD. Additionally, they suggest a cost-effective framework to screen for biallelic PRKN patients at the population scale for targeted studies.


Biometrics ◽  
2014 ◽  
Vol 71 (1) ◽  
pp. 114-121 ◽  
Author(s):  
Xiaohui Chang ◽  
Rasmus Waagepetersen ◽  
Herbert Yu ◽  
Xiaomei Ma ◽  
Theodore R. Holford ◽  
...  

2020 ◽  
Author(s):  
John E. McGeary ◽  
Chelsie Benca-Bachman ◽  
Victoria Risner ◽  
Christopher G Beevers ◽  
Brandon Gibb ◽  
...  

Twin studies indicate that 30–40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGS_BD) and clinical- (PGS_MDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. Higher PGS_BD, in adults, was associated with a higher likelihood of having suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGS_MDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yu Zhang ◽  
Li Hua ◽  
Quan-Hua Liu ◽  
Shu-Yuan Chu ◽  
Yue-Xin Gan ◽  
...  

Abstract Background A number of studies have examined the association between mold exposure and childhood asthma. However, the conclusions were inconsistent, which might be partly attributable to the lack of consideration of gene function, especially the key genes affecting the pathogenesis of childhood asthma. Research on the interactions between genes and mold exposure on childhood asthma is still very limited. We therefore examined whether there is an interaction between inflammation-related genes and mold exposure on childhood asthma. Methods A case–control study with 645 asthmatic children and 910 non-asthmatic children aged 3–12 years old was conducted. Eight single nucleotide polymorphisms (SNPs) in inflammation-related genes were genotyped using MassARRAY assay. Mold exposure was defined as self-reported visible mold on the walls. Associations between visible mold exposure, SNPs and childhood asthma were evaluated using logistic regression models. In addition, crossover analyses were used to estimate the gene-environment interactions on childhood asthma on an additive scale. Results After excluding children without information on visible mold exposure or SNPs, 608 asthmatic and 839 non-asthmatic children were included in the analyses. Visible mold exposure was reported in 151 asthmatic (24.8%) and 119 non-asthmatic children (14.2%) (aOR 2.19, 95% CI 1.62–2.97). The rs7216389 SNP in gasdermin B gene (GSDMB) increased the risk of childhood asthma with each C to T substitution in a dose-dependent pattern (additive model, aOR 1.32, 95% CI 1.11–1.57). Children carrying the rs7216389 T allele and exposed to visible mold had a dramatically increased risk of childhood asthma (aOR 3.21; 95% CI 1.77–5.99). The attributable proportion due to the interaction (AP: 0.47, 95% CI 0.03–0.90) and the relative excess risk due to the interaction (RERI: 1.49, 95% CI 0–2.99) were statistically significant. Conclusions In the present study, there was a significant additive interaction between visible mold exposure and rs7216389 SNP on childhood asthma. Future studies need to consider the gene-environment interactions when exploring the risk factors of childhood asthma.
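The exposure counts in the Results are enough to work out a crude (unadjusted) odds ratio, sketched below in Python with a Wald confidence interval on the log scale. This is an illustration only: the study reports adjusted odds ratios from logistic regression, so the crude value need not match the aOR given above. The function name and the choice of a Wald interval are assumptions, not the authors' method.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio from a 2x2 table with a Wald interval on the log scale."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Counts reported in the Results: 151 of 608 asthmatic and 119 of 839
# non-asthmatic children had visible mold exposure.
crude_or, ci_low, ci_high = odds_ratio_wald_ci(151, 608 - 151, 119, 839 - 119)
```

Any gap between the crude estimate and the reported aOR reflects the covariate adjustment in the study's regression models, which this sketch does not attempt to reproduce.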

