Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes

Mapping Intimacies · 10.1101/011569 · 2014 · Cited By ~ 1

Author(s): Daniel S Himmelstein, Sergio E Baranzini

Keyword(s): Data Integration, Protein Interactions, Heterogeneous Network, Association Studies, Genome Wide Association Studies, Genetic Associations, Protein Coding, Multiple Node, Pathogenic Variants, Disease Associated Genes

The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important next steps will be translating this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks (graphs with multiple node and edge types) for accomplishing both tasks. First, we constructed a network with 18 node types (genes, diseases, tissues, pathophysiologies, and 14 collections from MSigDB, the Molecular Signatures Database) and 19 edge types, drawn from high-throughput, publicly available resources. From this network of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model on GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as fundamental mechanisms explaining pathogenesis. Our method successfully predicted the results (AUROC = 0.79) of a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) were validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io).
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.
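Path-based topology features of the kind mentioned above can be illustrated by counting paths that follow a fixed sequence of node and edge types (a metapath). The following Python sketch counts Gene-interacts-Gene-associates-Disease paths by multiplying adjacency matrices. The toy network and the choice of metapath are hypothetical, and a raw count is only the simplest such feature; degree-weighted variants that down-weight paths through highly connected nodes are common. This is not the study's feature-extraction code.

```python
import numpy as np

# Hypothetical toy heterogeneous network: 4 genes, 2 diseases.
# gene_gene[i, j] = 1 if gene i interacts with gene j (undirected, no self-loops).
gene_gene = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])
# gene_disease[i, k] = 1 if gene i is already associated with disease k.
gene_disease = np.array([
    [1, 0],
    [0, 0],
    [0, 1],
    [1, 1],
])


def metapath_counts(*adjacencies):
    """Chain adjacency matrices along a metapath.

    Entry [s, t] of the product is the number of walks from source node s
    to target node t that follow the given sequence of edge types.
    """
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = result @ adj
    return result


# Gene-interacts-Gene-associates-Disease: for each (gene, disease) pair, the number
# of interaction partners of the gene that are already linked to the disease.
gigad = metapath_counts(gene_gene, gene_disease)
print(gigad)
```

Here gene 1 scores 2 for disease 0 because both of its partners (genes 0 and 3) are associated with that disease. For metapaths that revisit a node type, the matrix product counts walks, which may pass through the same node twice; path-based features usually exclude such walks.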


Genetic associations of protein-coding variants in human disease

10.1101/2021.10.14.21265023 · 2021

Author(s): Benjamin B Sun, Mitja I Kurki, Christopher N Foley, Asma Mechakra, Chia-Yen Chen, ...

Keyword(s): Genetic Variants, Human Disease, Drug Targets, Association Studies, Single Gene, Clinical Stage, Genome Wide Association Studies, Genetic Associations, Protein Coding, Disease Associations

Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have thus far remained largely underpowered to identify associations in the rare and low frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes. Here, we combined whole exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third of our findings not reported previously. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks allowed us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.
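The abstract does not state which meta-analysis model combined the UK Biobank and FinnGen results. As an illustration only, a standard fixed-effect inverse-variance-weighted meta-analysis pools per-cohort effect estimates as sketched below; the input effect sizes and standard errors are made up.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    Each study's estimate is weighted by 1 / SE^2. Returns the pooled effect,
    its standard error and a two-sided p-value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided tail probability
    return beta, se, p


# Hypothetical log-odds ratios and standard errors from two cohorts.
beta, se, p = ivw_fixed_effect([0.20, 0.30], [0.10, 0.10])
print(f"pooled beta={beta:.3f}, se={se:.4f}, p={p:.2e}")
```

Because each cohort contributes in proportion to its precision, the larger or better-powered sample dominates the pooled estimate, and the pooled standard error is smaller than either input.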


The Open Targets post-GWAS analysis pipeline

Bioinformatics · 10.1093/bioinformatics/btaa020 · 2020 · Vol 36 (9) · pp. 2936-2937 · Cited By ~ 4

Author(s): Gareth Peat, William Jones, Michael Nuhn, José Carlos Marugán, William Newell, ...

Keyword(s): Drug Targets, Gene Expression Regulation, Association Studies, Genome Wide Association Studies, Protein Coding, Data Resource, Coding Regions, Genome Wide, Causal Genes, Interactive Data

Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the associated variants lie in non-coding regions and presumably influence gene expression regulation. Identifying potential drug targets, i.e., causal protein-coding genes, therefore requires integrating the genetic results with functional data.

Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used to infer likely causal genes. The pipeline's output was then fed into an interactive data resource.

Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


Penalized partial least squares for pleiotropy

BMC Bioinformatics · 10.1186/s12859-021-03968-1 · 2021 · Vol 22 (1)

Author(s): Camilo Broc, Therese Truong, Benoit Liquet

Keyword(s): Least Squares, Partial Least Squares, Association Studies, A Priori, Simulated Data, Real Data, Genome Wide Association Studies, Genetic Associations, Multiple Traits, Application Fields

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene-level and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, combined with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, can convincingly detect signal at both the variable level and the group level.

Results: Our method has the advantage of providing a global, readable model while accounting for the structure of the data. It can outperform traditional methods and provides wider insight by making use of a priori information. We compared the performance of the proposed method with other benchmark methods on simulated data and illustrated its application on real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.

Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization accommodates the group structure of both variables and sets of observations. Furthermore, although the method has been applied to a genetic study, its formulation is suitable for any data with a large number of variables and an explicit a priori structure, in other application fields as well.
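The sparse group penalty described above selects whole groups (for example, the SNPs of one gene) as well as individual variables within the retained groups. A minimal NumPy sketch of the corresponding shrinkage step, element-wise soft-thresholding followed by group-wise shrinkage, is shown below. This is the generic sparse-group-lasso proximal operator, not the joint-sgPLS implementation, and the penalty values `lam` and `alpha` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding: shrink each entry toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_group_shrink(x, groups, lam, alpha):
    """Proximal step for the sparse group lasso penalty
    lam * (alpha * ||x||_1 + (1 - alpha) * sum_g ||x_g||_2).

    `groups` is a list of index arrays, one per group. Variables are first
    soft-thresholded individually; each group is then shrunk as a block and
    set exactly to zero if its norm is small enough.
    """
    out = np.zeros_like(x, dtype=float)
    for idx in groups:
        u = soft_threshold(x[idx], lam * alpha)
        norm = np.linalg.norm(u)
        if norm > 0:
            scale = max(0.0, 1.0 - lam * (1.0 - alpha) / norm)
            out[idx] = scale * u
    return out


# Hypothetical loading vector with two groups of three variables each.
x = np.array([2.0, -0.2, 0.5, 0.3, -0.1, 0.2])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(sparse_group_shrink(x, groups, lam=0.5, alpha=0.5))
```

With these settings the second group is removed entirely, while the first group survives with one of its three variables zeroed out: sparsity at both the group and the variable level.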


Genetic association studies of alterations in protein function expose recessive effects on cancer predisposition

Scientific Reports · 10.1038/s41598-021-94252-y · 2021 · Vol 11 (1)

Author(s): Nadav Brandes, Nathan Linial, Michal Linial

Keyword(s): Cancer Risk, Protein Function, Association Studies, Genetic Effects, Cancer Predisposition, Genome Wide Association Studies, Genetic Associations, Functional Interpretation, Gene Damage, Genomic Regions

The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new, functionally interpretable gene-based method for detecting protein-coding genes. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci in cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Leveraging the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusively recessive inheritance. These results highlight the importance of recessive genetic effects, identified without relying on familial studies. Finally, we show that many of the detected genes confer substantial cancer risk in the studied cohort, as determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic counseling.


Comorbidities and Susceptibility to COVID-19: A Generalized Gene Set Data Mining Approach

Journal of Clinical Medicine · 10.3390/jcm10081666 · 2021 · Vol 10 (8) · pp. 1666

Author(s): Micaela F. Beckman, Farah Bahrani Mougeot, Jean-Luc C. Mougeot

Keyword(s): Protein Interactions, Association Studies, Meta Analysis, Holistic Approach, Gene Interaction, SNP Analysis, Genome Wide Association Studies, Nucleotide Polymorphisms, Protein Protein Interactions, Data Mining Approach

The COVID-19 pandemic has led to over 2.26 million deaths among almost 104 million confirmed cases worldwide, as of 4 February 2021 (WHO). Risk factors include pre-existing conditions such as cancer, cardiovascular disease, diabetes, and obesity. Although several vaccines have been deployed, few alternative antiviral treatments are available in case vaccine protection is reduced or absent. Adopting a long-term, holistic approach to the COVID-19 pandemic appears critical given the emergence of novel and more infectious SARS-CoV-2 variants. Our objective was to use a computational meta-analysis approach to identify comorbidity-associated single nucleotide polymorphisms (SNPs) potentially conferring increased susceptibility to SARS-CoV-2 infection. SNP datasets were downloaded from a publicly available genome-wide association studies (GWAS) catalog for 141 of 258 candidate COVID-19 comorbidities. Gene-level SNP analysis was performed with the program MAGMA to identify significant pathways. An SNP annotation program was used to analyze MAGMA-identified genes. Differential gene expression of significant genes across 30 general tissue types was determined using GENE2FUNC, part of the Functional Mapping and Annotation of GWAS (FUMA) online tool. COVID-19 comorbidities (n = 22) from six disease categories were found to have significant associated pathways, validated by Q–Q plots (p < 0.05). Protein–protein interactions of significant (p < 0.05) differentially expressed genes were visualized with the STRING program. Gene interaction networks were found to be relevant to SARS and influenza pathogenesis. In conclusion, we were able to identify pathways potentially affected by, or affecting, SARS-CoV-2 infection in underlying medical conditions likely to confer susceptibility to, and/or increase the severity of, COVID-19. Our findings have implications for future COVID-19 experimental research and treatment development.


Assessing the contribution of rare-to-common protein-coding variants to circulating metabolic biomarker levels via 412,394 UK Biobank exome sequences

10.1101/2021.12.24.21268381 · 2021

Author(s): Abhishek Nag, Lawrence Middleton, Ryan S Dhindsa, Dimitrios Vitsios, Eleanor M Wigmore, ...

Keyword(s): Gene Networks, Rare Variants, Association Studies, Low Frequency, Genome Wide Association Studies, UK Biobank, Protein Coding, The UK, Metabolic Biomarkers, Coding Variants

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N = 412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1 × 10⁻⁸), encompassing 207 distinct genes. These include associations of rare non-synonymous variants in GIGYF1 with glucose and lipid biomarkers and in SYT7 with creatinine, among others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, 40% of the gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that uses the gene-biomarker association statistics from the collapsing analysis to identify genes with similar biomarker fingerprints, thereby expanding our understanding of gene networks.
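A gene-level collapsing (burden) analysis marks each participant as a carrier if they have at least one qualifying variant in the gene and then tests carrier status against the trait. The sketch below uses hypothetical genotypes and biomarker values and a simple Welch t statistic for a quantitative biomarker. The abstract notes only that the analysis evaluated a range of genetic architectures, so the qualifying-variant set and the test used here are assumptions, not the study's framework.

```python
import math
from statistics import mean, variance


def collapse(genotypes):
    """Collapse per-variant genotypes into a carrier indicator.

    `genotypes` holds one list per participant with allele counts for the
    qualifying variants in a gene; a participant is a carrier (1) if any
    count is non-zero.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def welch_t(a, b):
    """Welch's t statistic comparing the means of two samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical data: 8 participants, 3 rare qualifying variants in one gene.
genotypes = [
    [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0],
]
biomarker = [5.0, 6.8, 5.2, 7.1, 4.9, 6.5, 5.1, 5.3]

carrier = collapse(genotypes)
carriers = [y for y, c in zip(biomarker, carrier) if c]
non_carriers = [y for y, c in zip(biomarker, carrier) if not c]
print(carrier, round(welch_t(carriers, non_carriers), 2))
```

Collapsing pools individually rare variants so that a gene with many ultra-rare alleles can still be tested with reasonable power, at the cost of assuming the pooled variants act in the same direction.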


Prioritization of genes associated with the pathogenesis of leukosis in cattle

Vavilov Journal of Genetics and Breeding · 10.18699/vj18.451 · 2019 · Vol 22 (8) · pp. 1063-1069 · Cited By ~ 1

Author(s): N. S. Yudin, N. L. Podkolodnyy, T. A. Agarkova, E. V. Ignatieva

Keyword(s): Protein Interactions, Genome Wide Association Study, Association Studies, Mammalian Species, Genome Wide Association, Farm Animals, Genome Wide Association Studies, Protein Protein Interactions, Genome Wide, A Genome

Selection by means of genetic markers is a promising approach to eradicating infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes and genes of other mammalian species involved in the pathogenesis of BLV-induced infection and to prioritize these genes using bioinformatics methods. Based on manually collected information from a range of open sources, a total of 446 genes were included in the catalog. The following criteria were used to prioritize the 446 genes: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are overrepresented in the given gene list; (6) the gene participates in biological pathways from the KEGG or REACTOME databases that are overrepresented in the given gene list; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. Based on each criterion, a rank was assigned to each gene. The ranks were then summed to determine an overall rank.
Prioritization of the 446 candidate genes allowed us to identify 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2) that can affect the sensitivity/resistance of cattle to leukemia.
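Ranking genes under each criterion and then summing the ranks into an overall rank is a simple rank-sum aggregation. A Python sketch with hypothetical gene names and criterion scores follows; the abstract does not say how ties or per-criterion scores were handled, so giving tied genes their average rank is an assumption.

```python
def rank_desc(scores):
    """Rank items by score (1 = best, highest score); ties get the average rank."""
    order = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of items tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for gene in order[i:j + 1]:
            ranks[gene] = avg
        i = j + 1
    return ranks


def aggregate(criteria):
    """Sum per-criterion ranks; a lower total means a higher overall priority.

    Assumes every criterion scores the same set of genes.
    """
    rank_tables = [rank_desc(c) for c in criteria]
    totals = {g: sum(r[g] for r in rank_tables) for g in criteria[0]}
    return sorted(totals.items(), key=lambda kv: kv[1])


# Hypothetical scores for three genes under three criteria (e.g. GWAS support,
# number of protein-protein interactions within the catalog, pathway membership).
criteria = [
    {"GENE_A": 1, "GENE_B": 0, "GENE_C": 1},
    {"GENE_A": 12, "GENE_B": 30, "GENE_C": 5},
    {"GENE_A": 3, "GENE_B": 1, "GENE_C": 2},
]
print(aggregate(criteria))
```

Summing ranks rather than raw scores puts criteria with very different scales (binary flags, interaction counts) on an equal footing, at the cost of ignoring how large the differences between genes are.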


Risk of Breast Cancer Among Carriers of Pathogenic Variants in Breast Cancer Predisposition Genes Varies by Polygenic Risk Score

Journal of Clinical Oncology · 10.1200/jco.20.01992 · 2021 · pp. JCO.20.01992

Author(s): Chi Gao, Eric C. Polley, Steven N. Hart, Hongyan Huang, Chunling Hu, ...

Keyword(s): Breast Cancer, General Population, Association Studies, Polygenic Risk Score, Genome Wide Association Studies, Polygenic Risk, First Degree Relatives, Pathogenic Variants, Predisposition Genes, BRCA1 BRCA2

PURPOSE: This study assessed the joint association of pathogenic variants (PVs) in breast cancer (BC) predisposition genes and polygenic risk scores (PRS) with BC in the general population.

METHODS: A total of 26,798 non-Hispanic white BC cases and 26,127 controls from predominantly population-based studies in the Cancer Risk Estimates Related to Susceptibility consortium were evaluated for PVs in BRCA1, BRCA2, ATM, CHEK2, PALB2, BARD1, BRIP1, CDH1, and NF1. PRS based on 105 common variants were created using effect estimates from BC genome-wide association studies; the performance of an overall BC PRS and of estrogen receptor–specific PRS was evaluated. The odds of BC based on PVs and PRS were estimated using penalized logistic regression. The results were combined with age-specific incidence rates to estimate 5-year and lifetime absolute risks of BC across percentiles of PRS by PV status and first-degree family history of BC.

RESULTS: The estimated lifetime risks of BC among general-population noncarriers at the 10th and 90th percentiles of PRS were 9.1%-23.9% for women with, and 6.7%-18.2% for women without, first-degree relatives with BC. Taking PRS into account, more than 95% of BRCA1, BRCA2, and PALB2 carriers had > 20% lifetime risk of BC, whereas 52.5% of ATM and 69.7% of CHEK2 carriers without first-degree relatives with BC, and 78.8% of ATM and 89.9% of CHEK2 carriers with a first-degree relative with BC, had > 20% risk.

CONCLUSION: PRS facilitates personalization of BC risk among carriers of PVs in predisposition genes. Incorporating PRS into BC risk estimation may help identify > 30% of CHEK2 and nearly half of ATM carriers below the 20% lifetime risk threshold, suggesting that adding PRS may prevent overscreening and enable more personalized risk management approaches.
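A PRS built from GWAS effect estimates is commonly computed as a weighted sum of risk-allele counts, each weighted by that variant's effect size, here assumed to be a per-allele log odds ratio. The Python sketch below uses three hypothetical variants instead of the 105 in the study and then places an individual's score within a hypothetical reference distribution as a percentile, the scale on which the results above are reported. It does not reproduce the study's penalized regression or absolute-risk calculation.

```python
import math
from bisect import bisect_left


def polygenic_risk_score(allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts (0, 1 or 2), weighted by log(OR)."""
    return sum(c * math.log(or_) for c, or_ in zip(allele_counts, odds_ratios))


def percentile(score, reference_scores):
    """Percentage of reference scores strictly below `score`."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


# Hypothetical per-allele odds ratios for three variants.
odds_ratios = [1.10, 1.25, 0.90]

# Hypothetical reference population (allele counts per person).
reference = [[0, 0, 2], [1, 0, 1], [0, 1, 0], [2, 1, 0], [1, 1, 1], [0, 0, 0]]
reference_scores = [polygenic_risk_score(g, odds_ratios) for g in reference]

person = [1, 1, 0]
score = polygenic_risk_score(person, odds_ratios)
print(f"PRS = {score:.3f}, percentile = {percentile(score, reference_scores):.0f}")
```

Variants with an odds ratio below 1 contribute negatively, so carrying protective alleles lowers the score.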


Abstract 49: Adipose-Specific Knockout of Trib1 Reduces Plasma Lipids and Diet-Induced Insulin Resistance, and Increases Circulating Adiponectin

Arteriosclerosis Thrombosis and Vascular Biology · 10.1161/atvb.37.suppl_1.49 · 2017 · Vol 37 (suppl_1)

Author(s): Mikhaila A Smith, Jian Cui, Sumeet A Kheterpal, Daniel J Rader, Robert C Bauer

Keyword(s): Body Mass, Plasma Lipids, Association Studies, Genome Wide Association Studies, Genetic Associations, Tissue Mass, Fed State, Plasma Adiponectin, Metabolic Role

Tribbles-1 (TRIB1) was recently identified through genome-wide association studies as a novel mediator of plasma lipids and coronary artery disease in humans. While subsequent in vivo mouse work confirmed a role for hepatic TRIB1 in these associations, little is known about the metabolic roles of extra-hepatic Trib1. Interestingly, SNPs near the TRIB1 gene are significantly associated with circulating adiponectin levels in humans, suggesting a metabolic role for adipose TRIB1. To investigate this further, we generated adipose-specific Trib1 KO mice (Trib1_ASKO) by crossing Trib1 cKO mice with transgenic Adiponectin-Cre mice. Chow-fed Trib1_ASKO mice exhibited no differences in adipose tissue mass or overall body mass compared with control littermates (n = 8/group). However, Trib1_ASKO mice had reduced total (-16.9%, p < 0.01), HDL (-16.7%, p < 0.01), and non-HDL cholesterol (-17.3%, p = 0.068), as well as reduced plasma triglycerides (-28.6%, p < 0.001), compared with WT mice. Trib1_ASKO mice also had increased plasma adiponectin levels, a finding more pronounced in female (+33.3%, p < 0.001) than in male mice (+16.4%, p = 0.072). Despite this increase, transcript levels of adipoQ were moderately decreased in Trib1_ASKO mice, suggesting post-transcriptional regulation. Transcript and protein levels of C/EBPα, the best-described target of Trib1 and a key regulator of adipogenesis, remained unchanged. To further investigate the metabolic consequences of adipose-specific KO of Trib1, WT and Trib1_ASKO mice were fed a high-fat diet (HFD, 45% kcal from fat) for 12 weeks to induce obesity. HFD-fed Trib1_ASKO mice had reduced fasting plasma glucose (-22.3%, p < 0.05) and insulin (-38.2%, p < 0.05), and improved glucose tolerance (-19.8% AUC, p < 0.05), compared with control mice. Body mass and fat mass of HFD-fed Trib1_ASKO mice did not differ from WT, and the reductions in plasma lipids and the increase in plasma adiponectin persisted in the HFD-fed state.
In summary, we present the first in vivo validation of the human genetic association between TRIB1 and plasma adiponectin, and provide evidence suggesting that adipose TRIB1 contributes to the genetic associations observed in humans between TRIB1 and multiple metabolic parameters.


Timing of pubertal development and midlife blood pressure in men and women: A Mendelian randomization study

The Journal of Clinical Endocrinology & Metabolism · 10.1210/clinem/dgab561 · 2021

Author(s): Io Ieong Chan, Man Ki Kwok, C Mary Schooling

Keyword(s): Blood Pressure, Mendelian Randomization, Association Studies, Pubertal Development, Sensitivity Analyses, Age At Menarche, Genome Wide Association Studies, Genetic Associations, Pubertal Maturation, Pubertal Growth

Introduction: Observational studies suggest that earlier puberty is associated with higher adult blood pressure (BP), but these findings have not been replicated using Mendelian randomization (MR). We examined this question sex-specifically using larger genome-wide association studies (GWAS) with more extensive measures of pubertal timing.

Methods: We obtained genetic instruments proxying pubertal maturation (age at menarche (AAM) or age at voice breaking (AVB)) from the largest published GWAS. We applied them to sex-specific summary genetic associations with systolic and diastolic BP z-scores and self-reported hypertension in women (n = 194,174) and men (n = 167,020) from the UK Biobank, using inverse-variance weighted (IVW) meta-analysis. We conducted sensitivity analyses using other MR methods, including multivariable MR adjusted for childhood obesity proxied by body mass index (BMI). We used late pubertal growth as a validation outcome.

Results: AAM (beta per one-year later = -0.030 [95% confidence interval (CI) -0.055, -0.005]) and AVB (beta -0.058 [95% CI -0.100, -0.015]) were inversely associated with systolic BP independent of childhood BMI, as they were with diastolic BP (-0.035 [95% CI -0.060, -0.009] for AAM and -0.046 [95% CI -0.089, -0.004] for AVB) and self-reported hypertension (odds ratios 0.89 [95% CI 0.84, 0.95] for AAM and 0.87 [95% CI 0.79, 0.96] for AVB). As expected, AAM and AVB were positively associated with late pubertal growth. The results were robust to sensitivity analyses using other MR methods.

Conclusion: Timing of pubertal maturation was associated with adult BP independent of childhood BMI, highlighting the role of pubertal maturation timing in midlife BP.
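For summary statistics, the standard fixed-effect IVW Mendelian randomization estimate pools the per-SNP Wald ratios (SNP-outcome association divided by SNP-exposure association), weighting each ratio by the inverse of its first-order variance; this is equivalent to a weighted regression of outcome associations on exposure associations through the origin. The Python sketch below uses made-up summary statistics and does not reproduce the study's instruments, multivariable models or sensitivity analyses.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate from summary statistics.

    Each SNP's Wald ratio beta_outcome / beta_exposure has first-order standard
    error se_outcome / |beta_exposure|; pooling the ratios with inverse-variance
    weights gives the closed form below.
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    return estimate, se


# Hypothetical instruments: SNP effects on the exposure (e.g. years of age at
# menarche) and on the outcome (e.g. systolic BP z-score), with outcome SEs.
bx = [0.10, 0.05, 0.08]
by = [-0.004, -0.0015, -0.0024]
sy = [0.001, 0.001, 0.001]
est, se = ivw_mr(bx, by, sy)
print(f"IVW estimate = {est:.3f} (95% CI {est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

A negative estimate here would read as "later exposure timing, lower outcome", matching the direction of the inverse associations reported above; the fixed-effect standard error assumes no heterogeneity between instruments, which is one reason sensitivity analyses with other MR methods are run.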
