Ensemble learning for detecting gene-gene interactions in colorectal cancer

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5854 ◽  
Author(s):  
Faramarz Dorani ◽  
Ting Hu ◽  
Michael O. Woods ◽  
Guangju Zhai

Colorectal cancer (CRC) has a high incidence rate in both men and women and affects millions of people every year. Genome-wide association studies (GWAS) on CRC have successfully revealed common single-nucleotide polymorphisms (SNPs) associated with CRC risk. However, they can only explain a very limited fraction of the disease heritability. One reason may be the common uni-variable analyses in GWAS where genetic variants are examined one at a time. Given the complexity of cancers, the non-additive interaction effects among multiple genetic variants have the potential to explain the missing heritability. In this study, we employed two powerful ensemble learning algorithms, random forests and gradient boosting machine (GBM), to search for SNPs that contribute to the disease risk through non-additive gene-gene interactions. We were able to find 44 possible susceptibility SNPs that were ranked most significant by both algorithms. Out of those 44 SNPs, 29 are in coding regions. The 29 genes include ARRDC5, DCC, ALK, and ITGA1, which have previously been found to be associated with CRC, and E2F3 and NID2, which are potentially related to CRC since they have known associations with other types of cancer. We performed pairwise and three-way interaction analysis on the 44 SNPs using information theoretical techniques and found 17 pairwise (p < 0.02) and 16 three-way (p ≤ 0.001) interactions among them. Moreover, functional enrichment analysis suggested 16 functional terms or biological pathways that may help us better understand the etiology of the disease.
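
The abstract above does not say which information-theoretic measure was used for the interaction analysis. One widely used choice is information gain, which measures how much two SNPs jointly tell us about disease status beyond what each tells us alone. The sketch below assumes that measure; the function names and the toy data are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, status):
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C).

    Positive values suggest a synergistic (non-additive) effect of the
    two SNPs on the phenotype; this is an assumed measure, not
    necessarily the one used by the authors.
    """
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, status)
            - mutual_information(snp_a, status)
            - mutual_information(snp_b, status))


# Toy XOR-style data: neither SNP is informative alone, both together are.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
gain = interaction_gain(a, b, c)
```

In practice, significance would be assessed by permuting the phenotype labels and recomputing the gain, which is one way p-values such as those quoted above can be obtained.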

2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Ying Meng ◽  
Susan Groth ◽  
Jill R. Quinn ◽  
John Bisognano ◽  
Tong Tong Wu

Hypertension tends to run in families, and the heritability of hypertension is estimated to be around 20–60%. So far, most of this heritability has not been explained by single-locus genome-wide association studies. Therefore, the current study explored gene-gene interactions that have the potential to partially fill in the missing heritability. A two-stage discovery-confirmatory analysis was carried out in the Framingham Heart Study cohorts. The first stage was an exhaustive pairwise search performed in 2320 early-onset hypertensive cases with matched normotensive controls from the offspring cohort. Then, identified gene-gene interactions were assessed in an independent set of 694 subjects from the original cohort. Four unique gene-gene interactions were found to be related to hypertension. Three detected genes were recognized by previous studies, and the other 5 loci/genes (MAN1A1, LMO3, NPAP1/SNRPN, DNAL4, and RNA5SP455/KRT8P5) were novel findings, which had no strong main effect on hypertension and could not be easily identified by single-locus genome-wide studies. Also, including the identified gene-gene interactions explained more of the variance in hypertension. Overall, our study provides evidence that genome-wide gene-gene interaction analysis can identify new susceptibility genes, which can provide more insights into the genetic background of blood pressure regulation.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Seungyeoun Lee ◽  
Yongkang Kim ◽  
Min-Seok Kwon ◽  
Taesung Park

Genome-wide association studies (GWAS) have extensively analyzed single SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, there is still a large portion of the genetic variants left unexplained. This missing heritability problem might be due to the analytical strategy that limits analyses to only single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions. It is based on constructive induction: high-dimensional genotype combinations are classified into a one-dimensional variable with two attributes, high risk and low risk, for the case-control study. Many modifications of MDR have been proposed, and the method has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare the proposed extensions with earlier MDR through comprehensive simulation studies.
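
The dimensionality-reduction step described above can be sketched for the basic case-control setting. This is a minimal illustration of classic MDR pooling, not the survival extensions the abstract proposes; the function name, the default threshold (the overall case-to-control ratio), and the toy data are assumptions made for the example.

```python
from collections import defaultdict


def mdr_labels(genotypes, status, threshold=None):
    """Label each multi-locus genotype combination as high or low risk.

    genotypes: list of tuples, one tuple of genotype codes per subject.
    status:    list of 1 (case) or 0 (control), aligned with genotypes.
    threshold: case/control ratio at or above which a cell is high risk;
               defaults to the overall case/control ratio (an assumption).
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1

    if threshold is None:
        threshold = sum(cases.values()) / max(sum(controls.values()), 1)

    labels = {}
    for g in set(cases) | set(controls):
        # A cell with cases but no controls is treated as infinitely enriched.
        ratio = cases[g] / controls[g] if controls[g] else float("inf")
        labels[g] = "high" if ratio >= threshold else "low"
    return labels


# Toy two-SNP example with an XOR-like risk pattern.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
status = [0, 0, 1, 1, 1, 1, 0, 0]
labels = mdr_labels(genotypes, status)
```

The resulting high/low label is the single derived variable that MDR then evaluates, typically by cross-validated classification accuracy across candidate SNP combinations.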


2019 ◽  
Author(s):  
W. David Hill ◽  
Neil M. Davies ◽  
Stuart J. Ritchie ◽  
Nathan G. Skene ◽  
Julien Bryois ◽  
...  

Socio-economic position (SEP) is a multi-dimensional construct reflecting (and influencing) multiple socio-cultural, physical, and environmental factors. Previous genome-wide association studies (GWAS) using household income as a marker of SEP have shown that common genetic variants account for 11% of its variation. Here, in a sample of 286,301 participants from UK Biobank, we identified 30 independent genome-wide significant loci, 29 novel, that are associated with household income. Using a recently-developed method to meta-analyze data that leverages power from genetically-correlated traits, we identified an additional 120 income-associated loci. These loci showed clear evidence of functional enrichment, with transcriptional differences identified across multiple cortical tissues, in addition to links with GABAergic and serotonergic neurotransmission. We identified neurogenesis and the components of the synapse as candidate biological systems that are linked with income. By combining our GWAS on income with data from eQTL studies and chromatin interactions, 24 genes were prioritized for follow up, 18 of which were previously associated with cognitive ability. Using Mendelian Randomization, we identified cognitive ability as one of the causal, partly-heritable phenotypes that bridges the gap between molecular genetic inheritance and phenotypic consequence in terms of income differences. Significant differences between genetic correlations indicated that the genetic variants associated with income are related to better mental health than those linked to educational attainment (another commonly-used marker of SEP). Finally, we were able to predict 2.5% of income differences using genetic data alone in an independent sample. These results are important for understanding the observed socioeconomic inequalities in Great Britain today.


2021 ◽  
Author(s):  
Karthik A. Jagadeesh ◽  
Kushal K Dey ◽  
Daniel T. Montoro ◽  
Steven Gazal ◽  
Jesse M Engreitz ◽  
...  

Cellular dysfunction is a hallmark of disease. Genome-wide association studies (GWAS) have provided a powerful means to identify loci and genes contributing to disease risk, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important both for our understanding of disease, and for developing therapeutic interventions. Here, we introduce a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression in cell types, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N=297K). The inferred disease enrichments recapitulated known biology and highlighted novel relationships for different conditions, including GABAergic neurons in major depressive disorder (MDD), disease progression programs in M cells in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.


2019 ◽  
Vol 49 (1) ◽  
pp. 259-269 ◽  
Author(s):  
Dong Hang ◽  
Amit D Joshi ◽  
Xiaosheng He ◽  
Andrew T Chan ◽  
Manol Jovani ◽  
...  

Background: Increasing evidence suggests that conventional adenomas (CAs) and serrated polyps (SPs) represent two distinct groups of precursor lesions for colorectal cancer (CRC). The influence of common genetic variants on risk of CAs and SPs remains largely unknown. Methods: Among 27 426 participants within three prospective cohort studies, we created a weighted genetic risk score (GRS) based on 40 CRC-related single nucleotide polymorphisms (SNPs) identified in previous genome-wide association studies; and we examined the association of GRS (per one standard deviation increment) with risk of CAs, SPs and synchronous CAs and SPs, by multivariable logistic regression. We also analysed individual variants in the secondary analysis. Results: During 18–20 years of follow-up, we documented 2952 CAs, 1585 SPs and 794 synchronous CAs and SPs. Higher GRS was associated with increased risk of CAs [odds ratio (OR) = 1.17, 95% confidence interval (CI): 1.12-1.21] and SPs (OR = 1.09, 95% CI: 1.03-1.14), with a stronger association for CAs than SPs (P for heterogeneity = 0.01). An even stronger association was found for patients with synchronous CAs and SPs (OR = 1.32), advanced CAs (OR = 1.22) and multiple CAs (OR = 1.25). Different sets of variants were associated with CAs and SPs, with a Spearman correlation coefficient of 0.02 between the ORs associating the 40 SNPs with the two lesions. After correcting for multiple testing, three variants were associated with CAs (rs3802842, rs6983267 and rs7136702) and two with SPs (rs16892766 and rs4779584). Conclusions: Common genetic variants play a potential role in the conventional and serrated pathways of CRC. Different sets of variants are identified for the two pathways, further supporting the aetiological heterogeneity of CRC.
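
A weighted genetic risk score of the kind mentioned above is usually a weighted sum of risk-allele counts. The abstract does not state the weights; the sketch below assumes the common convention of weighting each SNP by the natural log of its previously reported odds ratio, and then rescales scores so that one unit equals one standard deviation. All names and numbers are illustrative.

```python
from math import e, log
from statistics import mean, stdev


def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2) weighted by log(OR).

    Using log odds ratios as weights is an assumed convention, not a
    detail confirmed by the abstract.
    """
    return sum(count * log(or_value)
               for count, or_value in zip(risk_allele_counts, odds_ratios))


def standardize(scores):
    """Rescale scores to mean 0 and sample standard deviation 1."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# One hypothetical participant genotyped at three SNPs.
score = weighted_grs([2, 0, 1], [1.0, 1.5, e])

# Standardizing a small hypothetical cohort of raw scores.
z_scores = standardize([1.0, 2.0, 3.0])
```

Entering the standardized score into a logistic regression yields an odds ratio per one standard deviation increment, which is the scale on which the results above are reported.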


2020 ◽  
Vol 49 (4) ◽  
pp. 1246-1256
Author(s):  
Inge Verkouter ◽  
Renée de Mutsert ◽  
Roelof A J Smit ◽  
Stella Trompet ◽  
Frits R Rosendaal ◽  
...  

Background: Body mass index (BMI)-associated loci are used to explore the effects of obesity using Mendelian randomization (MR), but the contribution of individual tissues to risks remains unknown. We aimed to identify tissue-grouped pathways of BMI-associated loci and relate these to cardiometabolic disease using MR analyses. Methods: Using Genotype-Tissue Expression (GTEx) data, we performed overrepresentation tests to identify tissue-grouped gene sets based on mRNA-expression profiles from 634 previously published BMI-associated loci. We conducted two-sample MR with inverse-variance-weighted methods, to examine associations between tissue-grouped BMI-associated genetic instruments and type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), with use of summary-level data from published genome-wide association studies (T2DM: 74 124 cases, 824 006 controls; CAD: 60 801 cases, 123 504 controls). Additionally, we performed MR analyses on T2DM and CAD using randomly sampled sets of 100 or 200 BMI-associated genetic variants. Results: We identified 17 partly overlapping tissue-grouped gene sets, of which 12 were brain areas, where BMI-associated genes were differentially expressed. In tissue-grouped MR analyses, all gene sets were similarly associated with increased risks of T2DM and CAD. MR analyses with randomly sampled genetic variants on T2DM and CAD resulted in a distribution of effect estimates similar to tissue-grouped gene sets. Conclusions: Overrepresentation tests revealed differential expression of BMI-associated genes in 17 different tissues. However, with our biology-based approach using tissue-grouped MR analyses, we did not identify different risks of T2DM or CAD for the BMI-associated gene sets, which was reflected by similar effect estimates obtained by randomly sampled gene sets.
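
The inverse-variance-weighted (IVW) estimator used in two-sample MR combines per-variant ratio estimates into one causal effect estimate. The sketch below shows the standard fixed-effect form computed from summary statistics; the function name and the input numbers are hypothetical, and the abstract does not say whether a fixed-effect or random-effects variant was used.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exposure: variant-exposure associations (e.g. with BMI).
    beta_outcome:  variant-outcome associations (e.g. log odds of T2DM).
    se_outcome:    standard errors of the variant-outcome associations.

    Returns (estimate, standard_error). Assumes uncorrelated variants and
    negligible uncertainty in the exposure associations.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, sqrt(1.0 / denominator)


# Two hypothetical variants whose ratio estimates both equal 0.5.
estimate, std_error = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

Repeating this calculation on randomly sampled sets of variants, as described above, gives a reference distribution against which the tissue-grouped estimates can be compared.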


2011 ◽  
Vol 26 (S2) ◽  
pp. 2097-2097
Author(s):  
K. Domschke

Twin studies propose a strong genetic contribution to the pathogenesis of anxiety disorders with a heritability of about 50%. The dissection of the complex-genetic underpinnings of anxiety disorders requires a multi-level approach using molecular genetic, imaging genetic, (cognitive)-behavioral genetic and pharmacogenetic techniques linking basic and clinical research. The present talk will first give an overview of results from linkage and association studies yielding support for several candidate genes contributing to the genetic risk for anxiety and panic disorder in particular such as the adenosine 2A receptor, the catechol-O-methyltransferase, the neuropeptide S receptor and the serotonin receptor 1A genes. Results from the first genome-wide association studies in the field of anxiety disorders will be discussed. Additionally, studies on gene-environment interactions between anxiety disorder risk variants and environmental factors will be presented. Imaging genetics approaches have yielded evidence for several risk genes to crucially impact activation in brain regions critical for emotional processing. Gene variation has furthermore been found to potentially confer an increased risk for panic disorder via elevated autonomic arousal and dysfunctional cognitions regarding bodily sensations. Finally, there is first evidence for genetic variants impacting treatment response to antidepressant pharmacotherapy in anxiety disorders. Thus, converging lines of evidence will be presented for several candidate genes of anxiety to exert an increased disease risk potentially via a distorted cortico-limbic interaction during emotional processing, increased physiological arousal or dysfunctional cognition. Additionally, a possible impact of genetic variants on pharmacoresponse in anxiety disorders and its potential clinical implications will be discussed.


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3083
Author(s):  
Lorena Alonso ◽  
Ignasi Morán ◽  
Cecilia Salvoro ◽  
David Torrents

The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient’s profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. However, GWAS methodology is limited when considering important genetic aspects of the disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments.


2020 ◽  
Author(s):  
Pavel P Kuksa ◽  
Chia-Lun Lui ◽  
Wei Fu ◽  
Liming Qu ◽  
Yi Zhao ◽  
...  

Background: Alzheimer's disease (AD) genetic findings span progressively larger genome-wide association studies (GWASs) for various outcomes and populations. These genetic findings are obtained from a single GWAS or from joint or meta-analyses of multiple GWAS datasets. However, no single resource provides harmonized and searchable information on all AD genetic associations obtained from these analyses, or links the identified genetic variants and reported genes with other supporting functional genomic evidence. Methods: We created the Alzheimer's Disease Variant Portal (ADVP), which provides unified access to a uniquely extensive collection of high-quality GWAS association results for AD. Records in ADVP are curated from the genome-wide significant and suggestive loci reported in AD genetics literature. ADVP contains curated results from all AD GWAS publications by Alzheimer's Disease Genetics Consortium (ADGC) since 2009 and AD GWAS publications identified from other public catalogs (GWAS catalog). Genetic association information was systematically extracted from these publications, harmonized, and organized into three types of tables. These tables included structured publication, variant, and association categories to ensure consistent representation of all AD genetic findings. All extracted AD genetic associations were further annotated and integrated with NIAGADS Genomics DB in order to provide extensive biological and functional genomics annotations. Results: Currently, ADVP contains 6,990 AD-association records curated from >200 AD GWAS publications corresponding to >900 unique genomic loci and >1,800 unique genetic variants. The ADVP collection contains genetic findings from >80 cohorts and across various populations, including Caucasians, Hispanics, African-Americans, and Asians. Of all the association records, 46% are disease-risk, 13% are related to expression quantitative trait analyses, and 27% are related to AD endophenotypes and neuropathology.
The ADVP web interface allows accessing AD association records by individual variants, genes, publications, genomic regions of interest, and genome-wide interactive variant views. ADVP is integrated with the NIAGADS Alzheimer's Genomics Database. Researchers can explore additional biological annotations at the genetic variant or gene level and view cross-reference functional genomics evidence provided by other public resources. Conclusions: ADVP is the largest, most up-to-date, and comprehensive literature-derived collection of AD genetic associations. All records have been systematically curated, harmonized, and comprehensively annotated. ADVP is freely accessible at https://advp.niagads.org/.


Genes ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 507
Author(s):  
Carolina Bonilla ◽  
Lara Novaes Baccarini

Epidemiology seeks to determine the causal effects of exposures on outcomes related to the health and wellbeing of populations. Observational studies, one of the most commonly used designs in epidemiology, can be biased due to confounding and reverse causation, which makes it difficult to establish causal relationships. In recent times, genetically informed methods, like Mendelian randomization (MR), have been developed in an attempt to overcome these disadvantages. MR relies on the association of genetic variants with outcomes of interest, where the genetic variants are proxies or instruments for modifiable exposures. Because genotypes are sorted independently and at random at the time of conception, they are less prone to confounding and reverse causation. Implementation of MR depends on, among other things, a strong association of the genetic variants with the exposure, which has usually been defined via genome-wide association studies (GWAS). Because GWAS have been most often carried out in European populations, the limited identification of strong instruments in other populations poses a major problem for the application of MR in Latin America. We suggest potential solutions that can be realized with the resources at hand and others that will have to wait for increased funding and access to technology.

