KDML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens

2019 ◽  
Author(s):  
Heba Z. Sailem ◽  
Jens Rittscher ◽  
Lucas Pelkmans

Abstract Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens has relied on ad hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge-Driven Machine Learning (KDML), a framework that systematically predicts multiple functions for a given gene based on the similarity of its perturbation phenotype to those of genes with known function. As proof of concept, we test KDML on three datasets describing phenotypes at the molecular, cellular and population levels, and show that it outperforms traditional analysis pipelines. In particular, KDML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors and TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcomes. These results highlight KDML as a systematic framework for discovering novel scale-crossing and clinically relevant gene functions. KDML is highly generalizable and applicable to various large-scale genetic perturbation screens.
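The core idea, predicting a gene's functions from how closely its perturbation phenotype resembles those of genes with known functions, can be illustrated with a similarity-weighted nearest-neighbour vote. This is a simplified sketch under stated assumptions, not KDML's actual model: cosine similarity, the neighbour count `k`, the `min_score` threshold, the function `predict_functions` and the toy profiles are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length phenotype vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_functions(query, profiles, annotations, k=3, min_score=0.5):
    """Score functions for a query phenotype by similarity-weighted votes.

    profiles:    gene -> phenotype vector (e.g. per-feature scores from a screen)
    annotations: gene -> set of known function labels
    Returns (label, score) pairs with score >= min_score, best first; each
    score is the share of positive neighbour similarity carrying that label.
    """
    neighbours = sorted(
        ((cosine(query, profiles[g]), g) for g in annotations), reverse=True
    )[:k]
    total = sum(sim for sim, _ in neighbours if sim > 0) or 1.0
    votes = defaultdict(float)
    for sim, gene in neighbours:
        if sim <= 0:
            continue  # anti-correlated or orthogonal neighbours do not vote
        for label in annotations[gene]:
            votes[label] += sim / total
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= min_score]


# Toy data (invented): GENE_A and GENE_B share a phenotype, GENE_C does not.
profiles = {
    "GENE_A": [1.0, 0.9, 0.0],
    "GENE_B": [0.9, 1.0, 0.1],
    "GENE_C": [0.0, 0.1, 1.0],
}
annotations = {
    "GENE_A": {"function_X"},
    "GENE_B": {"function_X", "function_Y"},
    "GENE_C": {"function_Z"},
}
predictions = predict_functions([1.0, 1.0, 0.0], profiles, annotations, k=2, min_score=0.4)
```

Because a gene can inherit several labels from its neighbours, the vote naturally yields multiple candidate functions per gene, which is the multi-function setting the abstract describes.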

2009 ◽  
Vol 27 (15_suppl) ◽  
pp. e22217-e22217
Author(s):  
T. Salman ◽  
A. Bilici ◽  
B. O. Ustaalioglu ◽  
M. Seker ◽  
B. Sonmez ◽  
...  

e22217 Background: There is much ongoing research into novel prognostic factors in colorectal cancer. Increased thromboembolic events have been associated with poor prognosis and survival in cancer patients. Thrombin-activated fibrinolysis inhibitor (TAFI), which inhibits fibrinolysis, has been shown to play a major role in hypercoagulopathy and has been reported to reach higher blood levels in cancer patients than in the general population. Methods: TAFI levels were measured in 82 patients with advanced-stage colorectal cancer receiving treatment in our clinic, and the correlation between these levels and clinicopathologic features was analyzed. Results: Eighty-two patients were evaluated. Patient characteristics included 54 males (65.9%) and 28 females (34.1%); median age 56.4 (range: 24–76). The mean TAFI level was 198.36±70.01, and TAFI levels were high in 70% of patients. High TAFI levels were more common in patients with rectal cancer than in those with colon cancer. There was no significant correlation between TAFI levels and clinicopathologic factors such as age, sex, body mass index, performance status, number of metastases, grade, vascular invasion, perineural invasion and CEA levels. TAFI levels in patients receiving bevacizumab (202.1±66.6) were higher than in those not receiving it (191.83±76.21), but the difference was not statistically significant (p>0.05). Conclusions: Although the results of our study were not statistically significant, the effect of thromboembolic events on prognosis and survival is well established. Thus, large-scale prospective studies are required to determine prognostic factors. No significant financial relationships to disclose.


2019 ◽  
Author(s):  
Dimitrios Vitsios ◽  
Slavé Petrovski

Abstract Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses that result in candidate lists of genes. These analyses often highlight several gene signals that might contribute to pathogenesis but are insufficiently powered to reach experiment-wide significance. This typically triggers a laborious evaluation of highly ranked genes through manual inspection of various public knowledge resources to triage those considered sufficiently interesting for deeper investigation. Here, we introduce a novel multi-dimensional, multi-step machine learning framework to objectively and more holistically assess the biological relevance of genes to disease studies, relying on a wide range of gene-associated annotations. We developed mantis-ml as an automated machine learning (AutoML) framework that follows a stochastic semi-supervised learning approach to rank known and novel disease-associated genes through iterative training and prediction sessions on random balanced datasets across the protein-coding exome (n=18,626 genes). We applied this framework to a range of disease-specific areas and as a generic disease likelihood estimator, achieving an average Area Under the Curve (AUC) prediction performance of 0.85. Critically, to demonstrate applied utility in exome-wide association studies, we overlapped mantis-ml disease-specific predictions with data from published cohort-level association studies. We found statistically significant enrichment of high mantis-ml predictions among the top-ranked genes from hypothesis-free cohort-level statistics (p<0.05), suggesting that mantis-ml captures true prioritisation signals. We believe that mantis-ml is a novel, easy-to-use tool that supports objective triage in gene discovery and enhances our overall understanding of complex genotype-phenotype associations.
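The "iterative training and prediction sessions of random balanced datasets" can be pictured as positive-unlabeled bagging: each round pairs the known disease genes with an equally sized random draw of the remaining genes treated as pseudo-negatives, and only genes left out of that draw are scored. The sketch below assumes this reading; a deliberately simple nearest-centroid scorer stands in for a real classifier, and `balanced_pu_scores`, `n_iter` and the toy features are hypothetical rather than part of mantis-ml.

```python
import math
import random
from collections import defaultdict


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def balanced_pu_scores(features, positives, n_iter=50, seed=0):
    """Average out-of-bag scores over repeated balanced training sets.

    features:  gene -> numeric feature vector (stand-in for gene annotations)
    positives: genes with a known disease association
    Returns gene -> mean score in (0, 1) for unlabeled genes that were left
    out of at least one training draw.
    """
    unlabeled = sorted(g for g in features if g not in positives)
    if len(unlabeled) < 2 or not positives:
        raise ValueError("need positives and at least two unlabeled genes")
    rng = random.Random(seed)
    n_neg = min(len(positives), len(unlabeled) - 1)  # balanced, but leave some genes out
    pos_c = centroid([features[g] for g in positives])
    sums, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_iter):
        # Random pseudo-negatives for this round, at most the size of the positive set.
        negs = set(rng.sample(unlabeled, n_neg))
        neg_c = centroid([features[g] for g in negs])
        for g in unlabeled:
            if g in negs:
                continue  # score only genes not used for training this round
            margin = dist(features[g], neg_c) - dist(features[g], pos_c)
            sums[g] += 1.0 / (1.0 + math.exp(-margin))  # closer to positives -> higher
            counts[g] += 1
    return {g: sums[g] / counts[g] for g in counts}


# Toy data (invented): CAND resembles the known disease genes, G1-G4 do not.
features = {
    "KNOWN_1": [1.0, 1.0], "KNOWN_2": [1.1, 0.9],
    "CAND": [0.9, 1.0],
    "G1": [0.0, 0.0], "G2": [0.1, 0.0], "G3": [0.0, 0.1], "G4": [0.1, 0.1],
}
scores = balanced_pu_scores(features, {"KNOWN_1", "KNOWN_2"})
```

Averaging over many random draws reduces the influence of any single unlucky choice of pseudo-negatives, which matters because some unlabeled genes may in fact be undiscovered disease genes.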


2021 ◽  
Vol 3 (Supplement_2) ◽  
pp. ii1-ii1
Author(s):  
Niven Narain ◽  
Michael Kiebish ◽  
Vivek Vishnudas ◽  
Vladimir Tolstikov ◽  
Gregory Miller ◽  
...  

Abstract The past decade has witnessed an explosive proliferation of data analytics modalities, all seeking to extract insight from large-scale data sets. Machine learning and AI methodologies now occupy a central role in the analysis of data sets ranging from genomics and other “omics” to clinical, real-world evidence and demographic data. Despite advances in data analytics and machine learning, and despite access to complex population-level clinical and related datasets, translating this information into actionable guidance in human health and disease remains a challenge. Interrogative Biology, a systems biology/AI platform, generates an unbiased, data-informed network for identifying targets (disease drivers) and biomarkers for disease interception at the point of transition to dysregulation, preceding the clinical phenotype. The data topology is enabled by the systematic acquisition and interrogation of longitudinal bio-samples of clinically annotated human matrices (e.g. blood, urine, saliva, tissues) subjected to comprehensive multi-omic (genomic, proteomic, lipidomic and metabolomic) profiling over time. The molecular profiles are integrated with clinical health information using Bayesian artificial intelligence analytics, bAIcis, to generate causal network maps of overall health. Differences between “health” and “disease” network maps identify drivers (targets and biomarkers) of disease, which are then rapidly validated in orthogonal, disease-specific perturbed wet-lab model systems. Target information imputed into the bAIcis framework can define therapeutic strategies, including the identification of existing drugs and bio-actives for a corrective response. Combining clinic-based sampling with dried blood spot analysis for longitudinal, dynamic monitoring of markers of health and disease status provides an opportunity for proactive clinical management and corrective intervention before health status deteriorates significantly.
Taken together, the approach herein allows for health surveillance based on in-depth biological profiling of alterations in the patient narrative to guide treatment modalities and strategies in a longitudinal and dynamic manner to identify, track, intercept, and arrest human disease.


2021 ◽  
Vol 18 (6) ◽  
pp. 8997-9015
Author(s):  
Ahmed Hammad ◽  
Mohamed Elshaer ◽  
Xiuwen Tang ◽  
...  

Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform for studying the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The microarray expression dataset GSE103512 was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment. Furthermore, protein–protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, and the hub genes were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic value. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and pathology data from the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. PPI network analysis then identified a total of 10 significant hub genes. The ROC curve was used to assess the potential of these biomarkers for CRC diagnosis; the area under the ROC curve (AUC) exceeded 0.92 for these genes, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts.
Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
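A diagnostic AUC like the one quoted above can be computed without drawing the curve: the area under the ROC curve equals the probability that a randomly chosen CRC sample scores higher than a randomly chosen normal sample (the Mann–Whitney formulation), with ties counted as one half. A minimal sketch, assuming a single gene's expression is used directly as the score; `roc_auc` and the toy values are illustrative, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 1/2.

    scores: numeric scores (e.g. expression of one candidate gene)
    labels: 1 for CRC samples, 0 for normal controls
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (invented): three tumour samples, three normal samples.
auc = roc_auc([5.2, 4.8, 6.1, 3.0, 2.5, 4.9], [1, 1, 1, 0, 0, 0])
```

Here 8 of the 9 tumour/normal pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic; for large cohorts a rank-based computation gives the same value more efficiently.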


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
K. T. Schütt ◽  
M. Gastegger ◽  
A. Tkatchenko ◽  
K.-R. Müller ◽  
R. J. Maurer

Abstract Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.
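As background on what a "wavefunction in a local basis of atomic orbitals" means: in the standard matrix formulation used in Hartree–Fock and Kohn–Sham DFT with non-orthogonal atom-centred basis functions, each molecular orbital is a linear combination of basis functions, and the coefficients follow from a generalized eigenvalue problem involving the Hamiltonian and overlap matrices in that basis. The relations below are this textbook form, shown under the assumption that the predicted wavefunction is represented this way; the abstract itself does not spell out the representation.

```latex
\begin{aligned}
\psi_m(\mathbf{r}) &= \sum_{\mu} c_{\mu m}\,\phi_\mu(\mathbf{r}) \\
H_{\mu\nu} &= \langle \phi_\mu \,|\, \hat{H} \,|\, \phi_\nu \rangle, \qquad
S_{\mu\nu} = \langle \phi_\mu \,|\, \phi_\nu \rangle \\
\mathbf{H}\,\mathbf{c}_m &= \varepsilon_m\,\mathbf{S}\,\mathbf{c}_m
\end{aligned}
```

Given $\mathbf{H}$ and $\mathbf{S}$, the orbital energies $\varepsilon_m$ and coefficient vectors $\mathbf{c}_m$ follow from this eigenproblem, which is why access to these matrices gives access to properties derived from the orbitals.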


2019 ◽  
Author(s):  
Taohua Yue ◽  
Jing Zhu ◽  
Xin Wang ◽  
Yisheng Pan ◽  
Yucun Liu ◽  
...  

Abstract Colorectal cancer (CRC) is one of the deadliest gastrointestinal malignancies. The open availability of The Cancer Genome Atlas (TCGA) data allows us to perform correlation analyses between large-scale transcriptome data and overall survival (OS) across multiple malignancies. Previous studies have reported that the infiltration of immune cells and stromal cells in the tumor microenvironment (TME) is significantly associated with cancer prognosis. Using the ESTIMATE algorithm, the immune and stromal components of the TME can be quantified as immune and stromal scores. To determine the effects of immune- and stromal-cell-associated genes on CRC prognosis, we divided the CRC cases into high- and low-score groups based on the immune/stromal scores and identified 999 differentially expressed genes (DEGs). Heatmaps, functional enrichment analysis and protein‐protein interaction (PPI) networks further indicated that the 999 DEGs were mainly involved in stromal composition and immune response. Finally, we obtained 56 genes from the 999 DEGs that were significantly associated with CRC prognosis and constructed their PPI networks. The role of 41 of these genes in CRC has been reported previously, whereas the other 15 genes have not been reported. We have therefore identified 15 novel TME genes associated with CRC prognosis that warrant further research.
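The grouping step can be sketched as follows, assuming a median cut-off on the ESTIMATE immune (or stromal) score, which the abstract does not state, and using a difference in group means of log-scale expression as a crude stand-in for a proper DEG test (e.g. a moderated t-test with multiple-testing correction). The functions `split_by_score` and `mean_difference` and the toy values are hypothetical.

```python
from statistics import mean, median


def split_by_score(scores):
    """Split samples into high/low groups at the median score.

    Samples exactly at the median go to the low group.
    """
    cut = median(scores.values())
    high = {s for s, v in scores.items() if v > cut}
    low = set(scores) - high
    return high, low


def mean_difference(expression, high, low):
    """Per-gene mean(high) - mean(low) of log-scale expression.

    A crude effect size only; a real DEG analysis would add a statistical
    test and multiple-testing correction.
    """
    return {
        gene: mean(values[s] for s in high) - mean(values[s] for s in low)
        for gene, values in expression.items()
    }


# Toy values (invented): four samples, two genes.
immune_scores = {"s1": 1200.0, "s2": -300.0, "s3": 800.0, "s4": 50.0}
expression = {
    "GENE_A": {"s1": 8.0, "s2": 3.0, "s3": 7.0, "s4": 4.0},
    "GENE_B": {"s1": 10.0, "s2": 10.0, "s3": 10.0, "s4": 10.0},
}
high, low = split_by_score(immune_scores)
diffs = mean_difference(expression, high, low)
```

In this toy case the median score is 425, so s1 and s3 form the high-immune group; GENE_A differs by 4.0 log units between groups while GENE_B does not differ at all.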


Author(s):  
Aijaz Ahmad Malik ◽  
Warot Chotpatiwetchkul ◽  
Chuleeporn Phanus-umporn ◽  
Chanin Nantasenamat ◽  
Phasit Charoenkwan ◽  
...  
