Genomic Prediction of Complex Disease Risk

Mapping Intimacies ◽

10.1101/506600 ◽

2018 ◽

Cited By ~ 1

Author(s):

Louis Lello ◽

Timothy G. Raben ◽

Soke Yuen Yong ◽

Laurent CAM Tellier ◽

Stephen D.H. Hsu

Keyword(s):

Genomic Prediction ◽

Complex Disease ◽

Disease Risk ◽

Penalized Regression ◽

Case Control ◽

Nucleotide Polymorphisms ◽

Uk Biobank ◽

Rapid Improvement ◽

Control Data ◽

Polygenic Scores

Abstract We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
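
The abstract names L1-penalized regression (LASSO) as the fitting method. The following is a minimal sketch of LASSO by cyclic coordinate descent, not the authors' pipeline. It assumes (our choice, for illustration) genotypes coded 0/1/2, case = 1 and control = 0 treated as a continuous response, and centred columns; the helper names `center`, `lasso_cd` and `polygenic_score` are hypothetical.

```python
# Sketch: LASSO (L1-penalized least squares) by cyclic coordinate descent.
# Assumptions (not from the paper): genotypes coded 0/1/2, case = 1 and
# control = 0 treated as a continuous response, columns and response centred.

def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|: shrink z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center(X, y):
    """Return column-centred genotypes and centred phenotype."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 (X, y already centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # r = y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP: no information, keep beta_j = 0
            # Correlation of SNP j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def polygenic_score(genotypes, beta):
    """PGS for one individual: weighted sum of allele counts."""
    return sum(b * g for b, g in zip(beta, genotypes))
```

The L1 penalty sets most SNP weights exactly to zero, so the resulting score uses a sparse subset of SNPs. In practice the penalty `lam` would be chosen by cross-validation; how the paper tuned it is not stated here.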

Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer

Scientific Reports ◽

10.1038/s41598-019-51258-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 5

Author(s):

Louis Lello ◽

Timothy G. Raben ◽

Soke Yuen Yong ◽

Laurent C. A. M. Tellier ◽

Stephen D. H. Hsu

Keyword(s):

Prostate Cancer ◽

Genomic Prediction ◽

Heart Attack ◽

Complex Disease ◽

Penalized Regression ◽

Case Control ◽

Uk Biobank ◽

Rapid Improvement ◽

Control Data ◽

Polygenic Scores

Abstract We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
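
The abstract reports AUC and the elevated risk of individuals in the top PGS percentile. A hedged sketch of both quantities follows; it assumes labels 1 = case, 0 = control, and takes "typical" risk to be the overall case fraction in the sample, which is our simplification rather than the paper's definition. Function names are hypothetical.

```python
# Sketch: AUC and top-percentile relative risk for a polygenic score.
# Assumption: labels are 1 = case, 0 = control; "typical" risk is taken to
# be the overall case fraction in the sample (our choice, not the paper's).

def auc(scores, labels):
    """Probability a random case scores above a random control (ties = 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def top_tail_risk_ratio(scores, labels, top_fraction=0.01):
    """Case fraction among the highest-scoring individuals / overall case fraction."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate
```

The pairwise AUC loop is quadratic and only suitable for small examples. Note also that a case-control sample over-represents cases, so its case fraction is not the population prevalence.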

A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data

Statistics in Biosciences ◽

10.1007/s12561-013-9094-9 ◽

2013 ◽

Vol 5 (2) ◽

pp. 250-260 ◽

Cited By ~ 1

Author(s):

Suzan Gazioglu ◽

Jiawei Wei ◽

Elizabeth M. Jennings ◽

Raymond J. Carroll

Keyword(s):

Secondary Analysis ◽

Penalized Regression ◽

Case Control ◽

Control Data ◽

Regression Spline ◽

Spline Estimation ◽

Penalized Regression Spline

Effects of Agricultural Work and Other Proxy-derived Case-Control Data on Parkinson's Disease Risk Estimates

American Journal of Epidemiology ◽

10.1093/oxfordjournals.aje.a117497 ◽

1995 ◽

Vol 141 (8) ◽

pp. 747-754 ◽

Cited By ~ 18

Author(s):

Karen M. Semchuk ◽

Edgar J. Love

Keyword(s):

Parkinson's Disease ◽

Disease Risk ◽

Case Control ◽

Control Data ◽

Agricultural Work ◽

Risk Estimates

Session 2: Personalised nutrition Genetic variation and disease risk: new advances

Proceedings of The Nutrition Society ◽

10.1017/s0029665109001037 ◽

2009 ◽

Vol 68 (2) ◽

pp. 113-121

Author(s):

John Scott

Keyword(s):

Cystic Fibrosis ◽

Disease Risk ◽

New Technology ◽

Signs And Symptoms ◽

Case Control ◽

Candidate Snps ◽

Nucleotide Polymorphisms ◽

Risk Variant ◽

Individual Snps ◽

Or Genes

Variations in human DNA, most frequently single-nucleotide polymorphisms (SNPs), can have functional consequences ranging from severe to none. Variations in outcome (phenotype) can be compared, from cystic fibrosis through haemochromatosis to general familial risks in, for example, colorectal cancer (CRC). Cystic fibrosis and haemochromatosis have severe phenotypes with high penetrance, with signs and symptoms always or mostly present; thus, they have been easy to identify from family studies. However, the variants underlying the familial risks known to contribute markedly to CRC remain unknown. The sequencing of the human genome has now made possible the identification of these and other disease variants. Knowing the DNA sequence of an idealised individual adds little unless variants that increase (or decrease) disease risk relative to the norm can be identified. Such variants can be expected to be very common in the general population, but to have low penetrance and to change risk only to a limited extent. Many patients will not carry the risk variant, and many ‘normal’ subjects will carry it. Thus, very large case–control cohorts are essential. These case–control cohorts can be analysed at three different levels: (1) individual SNPs; (2) individual genes; (3) genome-wide analysis (GWA). Level 1 looks for case–control differences at specific SNPs. Alternatively, new technology can be applied to examine a range of SNPs within a gene to track differences in its regulation as well as in its function. Finally, the whole genome could be marked with ≥0.5 × 10^6 SNPs. The first two approaches involve selecting ‘candidate’ SNPs or genes, while GWA looks for any variation in the genome that is enriched in the cases. All three approaches carry the certainty that some significant associations will arise by statistical chance, and correction must be made for this. This latter issue is mitigated by large sample sizes and by independent replication cohorts.
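
The abstract says that chance associations must be corrected for but does not name a method. One common choice is the Bonferroni correction, sketched below under that assumption: dividing alpha = 0.05 by 0.5 × 10^6 tests gives a per-SNP threshold of 1 × 10^-7, and by ~10^6 tests gives 5 × 10^-8, the widely used genome-wide significance level. Function names are hypothetical.

```python
# Sketch: Bonferroni correction, one common way to correct for chance
# associations when many SNPs are tested (the source does not name a method).

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off keeping the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_indices(p_values, alpha=0.05):
    """Indices of tests that survive Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]
```

Bonferroni is conservative when SNPs are correlated through linkage disequilibrium, which is one reason independent replication cohorts remain important.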

An approach using random forest methodology for disease risk prediction using imbalanced case–control data in GWAS

Current Medicine Research and Practice ◽

10.1016/j.cmrp.2014.11.011 ◽

2014 ◽

Vol 4 (6) ◽

pp. 289-294

Author(s):

Prabina Kumar Meher ◽

Atmakuri Ramakrishna Rao ◽

Sant Dass Wahi ◽

B.K. Thelma

Keyword(s):

Random Forest ◽

Risk Prediction ◽

Disease Risk ◽

Case Control ◽

Control Data

Heterozygous PRKN mutations are common but do not increase the risk of Parkinson's disease

10.1101/2021.08.11.21261928 ◽

2021 ◽

Author(s):

William Zhu ◽

Xiaoping Huang ◽

Esther Yoon ◽

Sara P Bandres Ciga ◽

Cornelis Blauwendraat ◽

...

Keyword(s):

Disease Risk ◽

Cost Effective ◽

P Value ◽

Nucleotide Polymorphisms ◽

Uk Biobank ◽

Loss Of Function ◽

Genotyping Array ◽

Array Data ◽

Population Scale ◽

The Uk

PRKN mutations are the most common recessive cause of Parkinson's disease (PD) and are a promising target for gene and cell replacement therapies. Identification of biallelic PRKN patients (PRKN-PD) at the population scale, however, remains a challenge, as roughly half of PRKN mutations are copy number variants (CNVs) and many single nucleotide polymorphisms (SNPs) are of unclear significance. Additionally, the true prevalence and disease risk associated with heterozygous PRKN mutations are unclear, as a comprehensive assessment of PRKN SNPs and CNVs has not been performed at a population scale. To address these challenges, we evaluated PRKN mutations in 2 cohorts analyzed with both a genotyping array and exome or genome sequencing: the NIH PD cohort, a deeply phenotyped cohort of PD patients, and the UK Biobank, a population-scale cohort with nearly half a million participants. Genotyping arrays identified the majority of PRKN mutations, and at least 1 mutation in most biallelic PRKN mutation carriers, in both cohorts. Additionally, in the NIH PD cohort, functional assays of patient fibroblasts resolved variants of unclear significance in biallelic carriers and ruled out cryptic loss-of-function variants in monoallelic carriers. In the UK Biobank, we identified 2,692 PRKN CNVs from genotyping array data from nearly half a million participants (the largest collection to date). Deletions or duplications involving exon 2 accounted for roughly half of all CNVs, and the vast majority (88%) involved exons 2, 3, or 4. Combining estimates from whole exome sequencing (from ~200,000 participants) and genotyping array data, we found a pathogenic PRKN mutation in 1.8% of participants and 2 mutations in ~1/7,800 participants. Those with 1 PRKN pathogenic variant were as likely as non-carriers to have PD (OR = 0.91, CI = 0.58–1.38, p-value = 0.76) or a parent with PD (OR = 1.12, CI = 0.94–1.31, p-value = 0.19).
Together, our results demonstrate that heterozygous pathogenic PRKN mutations are common in the population but do not increase the risk of PD. Additionally, they suggest a cost-effective framework for screening for biallelic PRKN patients at the population scale for targeted studies.
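
The "no increased risk" conclusion rests on odds ratios whose confidence intervals include 1. As a hedged illustration of how such an estimate is formed from a 2×2 carrier-by-disease table, here is the Woolf (log-scale Wald) interval. The counts in any usage are hypothetical, and the study's reported figures may come from a different method (for example an exact test or adjusted regression).

```python
import math

# Sketch: odds ratio with a Woolf (log-scale Wald) 95% CI from a 2x2 table.
# Counts are hypothetical; the study may have used a different estimator.

def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """a/b: carriers with/without PD; c/d: non-carriers with/without PD."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (or apply a continuity correction)")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))
```

An interval that straddles 1, as both reported intervals do, is consistent with no association between carrier status and PD.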

Disease risk estimation by combining case-control data with aggregated information on the population at risk

Biometrics ◽

10.1111/biom.12256 ◽

2014 ◽

Vol 71 (1) ◽

pp. 114-121 ◽

Cited By ~ 2

Author(s):

Xiaohui Chang ◽

Rasmus Waagepetersen ◽

Herbert Yu ◽

Xiaomei Ma ◽

Theodore R. Holford ◽

...

Keyword(s):

At Risk ◽

Disease Risk ◽

Risk Estimation ◽

Case Control ◽

Control Data ◽

Population At Risk

Contrasting Broad- and Clinically- defined Polygenic Indicators of Depression and Depression-related Phenotypes in Adults and Children

10.31234/osf.io/pn9vb ◽

2020 ◽

Author(s):

John E. McGeary ◽

Chelsie Benca-Bachman ◽

Victoria Risner ◽

Christopher G Beevers ◽

Brandon Gibb ◽

...

Keyword(s):

Suicidal Ideation ◽

Cognitive Reappraisal ◽

Twin Studies ◽

European Ancestry ◽

Summary Statistics ◽

Depression Severity ◽

Uk Biobank ◽

Polygenic Scores ◽

Adults And Children ◽

The Uk

Twin studies indicate that 30-40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGSBD) and clinical- (PGSMDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. Higher PGSBD, in adults, was associated with a higher likelihood of having suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGSMDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.
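
The scores above are built from UK Biobank summary statistics and applied to independent cohorts. A minimal sketch of that general idea, assuming a PGS is a weighted sum of effect-allele dosages standardized within the target cohort, is shown below; the SNP IDs and weights are hypothetical, and allele alignment, LD handling and QC, which real pipelines need, are omitted. The construction actually used in the study is not described in this abstract.

```python
import statistics

# Sketch: a polygenic score as a weighted sum of effect-allele dosages, with
# weights taken from GWAS summary statistics. SNP IDs and weights are
# hypothetical; allele alignment, LD handling and QC are omitted.

def pgs(dosages, weights):
    """dosages: SNP id -> effect-allele count (0-2); weights: SNP id -> effect size.
    SNPs missing from the individual's genotype data are skipped."""
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(scores):
    """Convert raw scores to z-scores within the target cohort."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]
```

Standardizing within each cohort lets associations be reported per standard deviation of PGS, which eases comparison between the adult and child samples.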

Household mold exposure interacts with inflammation-related genetic variants on childhood asthma: a case–control study

BMC Pulmonary Medicine ◽

10.1186/s12890-021-01484-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yu Zhang ◽

Li Hua ◽

Quan-Hua Liu ◽

Shu-Yuan Chu ◽

Yue-Xin Gan ◽

...

Keyword(s):

Childhood Asthma ◽

Case Control Study ◽

Additive Model ◽

Case Control ◽

Nucleotide Polymorphisms ◽

Additive Interaction ◽

Asthmatic Children ◽

Gene Environment ◽

Mold Exposure ◽

Control Study

Abstract Background A number of studies have examined the association between mold exposure and childhood asthma. However, their conclusions were inconsistent, which might be partly attributable to a lack of consideration of gene function, especially of the key genes affecting the pathogenesis of childhood asthma. Research on the interactions between genes and mold exposure in childhood asthma is still very limited. We therefore examined whether there is an interaction between inflammation-related genes and mold exposure in childhood asthma. Methods A case–control study of 645 asthmatic children and 910 non-asthmatic children aged 3–12 years was conducted. Eight single nucleotide polymorphisms (SNPs) in inflammation-related genes were genotyped using the MassARRAY assay. Mold exposure was defined as self-reported visible mold on the walls. Associations between visible mold exposure, SNPs and childhood asthma were evaluated using logistic regression models. In addition, crossover analyses were used to estimate gene-environment interactions on childhood asthma on an additive scale. Results After excluding children without information on visible mold exposure or SNPs, 608 asthmatic and 839 non-asthmatic children were included in the analyses. Visible mold exposure was reported for 151 asthmatic (24.8%) and 119 non-asthmatic children (14.2%) (aOR 2.19, 95% CI 1.62–2.97). The rs7216389 SNP in the gasdermin B gene (GSDMB) increased the risk of childhood asthma with each C to T substitution in a dose-dependent pattern (additive model, aOR 1.32, 95% CI 1.11–1.57). Children carrying the rs7216389 T allele and exposed to visible mold had a markedly increased risk of childhood asthma (aOR 3.21; 95% CI 1.77–5.99). The attributable proportion due to the interaction (AP: 0.47, 95% CI 0.03–0.90) and the relative excess risk due to the interaction (RERI: 1.49, 95% CI 0–2.99) were statistically significant. 
Conclusions In the present study, there was a significant additive interaction between visible mold exposure and rs7216389 SNP on childhood asthma. Future studies need to consider the gene-environment interactions when exploring the risk factors of childhood asthma.
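
Additive interaction is summarised here by RERI and AP. As a sketch of the standard definitions, assume OR10 = T allele only, OR01 = mold only and OR11 = both, each relative to children with neither; odds ratios stand in for relative risks. The abstract gives OR11 (3.21) but not the two single-exposure ORs, so any inputs below are hypothetical, and confidence intervals (delta method or bootstrap) are not computed.

```python
# Sketch: additive-interaction measures from odds ratios (RERI and AP).
# OR10 = gene only, OR01 = exposure only, OR11 = both, each relative to the
# doubly unexposed group; ORs approximate relative risks. No CIs computed.

def reri(or11, or10, or01):
    """Relative excess risk due to interaction; 0 means exactly additive effects."""
    return or11 - or10 - or01 + 1.0


def attributable_proportion(or11, or10, or01):
    """Share of the joint-exposure risk attributable to the interaction."""
    return reri(or11, or10, or01) / or11
```

A RERI above 0 (equivalently AP above 0) indicates that the joint effect exceeds the sum of the separate effects, which is the sense of "additive interaction" in the conclusions.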

COMPARISON OF ATHEROSCLEROTIC CARDIOVASCULAR DISEASE RISK PREDICTION BY LIPOPROTEIN(A) LEVELS BETWEEN PERSONS WITH AND WITHOUT PRIOR CARDIOVASCULAR DISEASE: THE UK BIOBANK

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(21)02842-4 ◽

2021 ◽

Vol 77 (18) ◽

pp. 1484

Author(s):

Nathan D. Wong ◽

Yanglu Zhao ◽

Ailin Barseghian El-Farra ◽

Michael Wilkinson

Keyword(s):

Cardiovascular Disease ◽

Risk Prediction ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Atherosclerotic Cardiovascular Disease ◽

Uk Biobank ◽

Lipoprotein A ◽

Atherosclerotic Cardiovascular Disease Risk ◽

The Uk