KDML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens

Mapping Intimacies ◽

10.1101/761106 ◽

2019 ◽

Cited By ~ 1

Author(s):

Heba Z. Sailem ◽

Jens Rittscher ◽

Lucas Pelkmans

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Large Scale ◽

Ad Hoc ◽

Olfactory Receptors ◽

Functional Enrichment ◽

Learning Framework ◽

Gene Functions ◽

Health And Disease ◽

Colorectal Cancer Patients

Abstract Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens is based on ad-hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge-Driven Machine Learning (KDML), a framework that systematically predicts multiple functions for a given gene based on the similarity of its perturbation phenotype to those with known function. As proof of concept, we test KDML on three datasets describing phenotypes at the molecular, cellular and population levels, and show that it outperforms traditional analysis pipelines. In particular, KDML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors and TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcome. These results highlight KDML as a systematic framework for discovering novel scale-crossing and clinically relevant gene functions. KDML is highly generalizable and applicable to various large-scale genetic perturbation screens.
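The core idea in this abstract, scoring a gene against functions whose perturbation phenotypes are already known, can be illustrated with a minimal sketch. The abstract does not describe the model used, so the centroid-plus-cosine-similarity scorer below is a stand-in and not the authors' method. The gene names, function labels, feature values, and the `threshold` parameter are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length phenotype vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def function_centroids(profiles, annotations):
    """Average the phenotype profiles of the genes annotated with each function."""
    centroids = {}
    for function, genes in annotations.items():
        members = [profiles[g] for g in genes if g in profiles]
        if members:
            centroids[function] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_functions(query_profile, centroids, threshold=0.8):
    """Return every function whose centroid is similar enough to the query,
    so a single gene can receive several predicted functions."""
    scores = {f: cosine(query_profile, c) for f, c in centroids.items()}
    hits = [(f, s) for f, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda fs: fs[1], reverse=True)


# Hypothetical perturbation phenotypes (three made-up features per gene).
profiles = {
    "geneA": [1.0, 0.0, 0.2],
    "geneB": [0.9, 0.1, 0.0],
    "geneC": [0.0, 1.0, 0.9],
    "geneD": [0.1, 0.8, 1.0],
}
# Hypothetical known annotations used as the "knowledge" side.
annotations = {"adhesion": ["geneA", "geneB"], "signalling": ["geneC", "geneD"]}

centroids = function_centroids(profiles, annotations)
hits = predict_functions([0.95, 0.05, 0.1], centroids, threshold=0.8)
print(hits)
```

Lowering `threshold` lets a query gene pick up more than one function, which is the multi-label behaviour the abstract emphasises. A real pipeline would replace the centroid scorer with a trained classifier per function.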

The correlation of thrombin-activated fibrinolysis inhibitor (TAFI) levels and clinicopathologic factors in advanced colorectal cancer patients

Journal of Clinical Oncology ◽

10.1200/jco.2009.27.15_suppl.e22217 ◽

2009 ◽

Vol 27 (15_suppl) ◽

pp. e22217-e22217

Author(s):

T. Salman ◽

A. Bilici ◽

B. O. Ustaalioglu ◽

M. Seker ◽

B. Sonmez ◽

...

Keyword(s):

Colorectal Cancer ◽

Prognostic Factors ◽

Cancer Patients ◽

Large Scale ◽

Performance Status ◽

Blood Levels ◽

Thromboembolic Events ◽

Fibrinolysis Inhibitor ◽

Clinicopathologic Factors ◽

Colorectal Cancer Patients

e22217 Background: There is much ongoing research into novel prognostic factors in colorectal cancer. Increased thromboembolic events are associated with poor prognosis and survival in cancer patients. Thrombin-activated fibrinolysis inhibitor (TAFI), which inhibits fibrinolysis, has been shown to play a major role in hypercoagulability and has been reported to reach higher blood levels in cancer patients than in the general population. Methods: TAFI levels were measured, and the correlation between those levels and clinicopathologic features was analyzed in 82 patients with advanced-stage colorectal cancer receiving treatment in our clinic. Results: Eighty-two patients were evaluated. Patient characteristics included 54 males (65.9%) and 28 females (34.1%); median age was 56.4 years (range: 24–76). The mean TAFI level was 198.36±70.01, and TAFI levels were high in 70% of patients. High TAFI levels were more common in rectal cancer patients than in colon cancer patients. There was no significant correlation between TAFI levels and clinicopathologic factors such as age, sex, body mass index, performance status, number of metastases, grade, vascular invasion, perineural invasion and CEA levels. The TAFI levels of patients receiving bevacizumab (202.1±66.6) were higher than those of patients not receiving it (191.83±76.21), but this difference was not statistically significant (p>0.05). Conclusions: Although the statistical analysis did not reach significance in our study, the effect of thromboembolic events on prognosis and survival is well established. Thus, large-scale prospective studies are required to determine prognostic factors. No significant financial relationships to disclose.

Stochastic semi-supervised learning to prioritise genes from high-throughput genomic screens

10.1101/655449 ◽

2019 ◽

Author(s):

Dimitrios Vitsios ◽

Slavé Petrovski

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Large Scale ◽

Association Studies ◽

Protein Coding ◽

Knowledge Resources ◽

Learning Framework ◽

Significant Enrichment ◽

Disease Associated Genes ◽

Disease Specific

Abstract Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses that result in candidate lists of genes. Often these analyses highlight several gene signals that might contribute to pathogenesis but are insufficiently powered to reach experiment-wide significance. This often triggers a process of laborious evaluation of highly-ranked genes through manual inspection of various public knowledge resources to triage those considered sufficiently interesting for deeper investigation. Here, we introduce a novel multi-dimensional, multi-step machine learning framework to objectively and more holistically assess the biological relevance of genes to disease studies, by relying on a plethora of gene-associated annotations. We developed mantis-ml to serve as an automated machine learning (AutoML) framework, following a stochastic semi-supervised learning approach to rank known and novel disease-associated genes through iterative training and prediction sessions of random balanced datasets across the protein-coding exome (n=18,626 genes). We applied this framework to a range of disease-specific areas and as a generic disease likelihood estimator, achieving an average Area Under Curve (AUC) prediction performance of 0.85. Critically, to demonstrate applied utility on exome-wide association studies, we overlapped mantis-ml disease-specific predictions with data from published cohort-level association studies. We retrieved statistically significant enrichment of high mantis-ml predictions among the top-ranked genes from hypothesis-free cohort-level statistics (p<0.05), suggesting the capture of true prioritisation signals. We believe that mantis-ml is a novel, easy-to-use tool to support objective triaging in gene discovery and to enhance our overall understanding of complex genotype-phenotype associations.
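The "iterative training and prediction sessions of random balanced datasets" described above can be sketched as follows. This is a loose illustration under stated assumptions, not the mantis-ml API: a nearest-centroid scorer stands in for the machine-learning models, the function names and two-feature gene vectors are hypothetical, and only unlabeled genes are ranked.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def score(x, pos_c, neg_c):
    """Pseudo-probability that x is disease-like: closer to the positive
    centroid than to the pseudo-negative centroid gives a value above 0.5."""
    margin = math.dist(x, neg_c) - math.dist(x, pos_c)
    return 1.0 / (1.0 + math.exp(-margin))


def stochastic_rank(positives, unlabeled, n_iter=50, seed=0):
    """Rank unlabeled genes by their mean out-of-sample score.

    Each iteration draws a random set of unlabeled genes, equal in size to
    the positive set, to act as pseudo-negatives (a balanced dataset), fits
    the stand-in model, and scores only the genes left out of that draw.
    """
    rng = random.Random(seed)
    pos_c = centroid(list(positives.values()))
    names = sorted(unlabeled)
    totals = {g: 0.0 for g in names}
    counts = {g: 0 for g in names}
    for _ in range(n_iter):
        pseudo_neg = set(rng.sample(names, len(positives)))
        neg_c = centroid([unlabeled[g] for g in pseudo_neg])
        for g in names:
            if g not in pseudo_neg:
                totals[g] += score(unlabeled[g], pos_c, neg_c)
                counts[g] += 1
    mean = {g: totals[g] / counts[g] for g in names if counts[g]}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical annotation features for known disease genes.
positives = {"p1": [1.0, 0.9], "p2": [0.9, 1.0], "p3": [1.1, 1.0], "p4": [1.0, 1.1]}
# Unlabeled genes: "hidden" resembles the positives, the rest do not.
unlabeled = {
    "hidden": [0.95, 1.05],
    "u1": [0.0, 0.1], "u2": [0.1, 0.0], "u3": [-0.1, 0.0], "u4": [0.0, -0.1],
    "u5": [0.1, 0.1], "u6": [-0.1, 0.1], "u7": [0.05, -0.05],
}

ranking = stochastic_rank(positives, unlabeled)
print(ranking[0])
```

Averaging over many random draws is what makes the procedure tolerant of true disease genes hiding in the unlabeled pool: an unlabeled gene that is occasionally sampled as a pseudo-negative is still scored in every iteration where it is left out.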

CSAO-1. Interrogative Biology: Unraveling insights into causal disease drivers by use of a dynamic systems biology and Bayesian AI to identify the intersect of disease and healthy signatures

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab070.001 ◽

2021 ◽

Vol 3 (Supplement_2) ◽

pp. ii1-ii1

Author(s):

Niven Narain ◽

Michael Kiebish ◽

Vivek Vishnudas ◽

Vladimir Tolstikov ◽

Gregory Miller ◽

...

Keyword(s):

Machine Learning ◽

Systems Biology ◽

Data Analytics ◽

Large Scale ◽

Model Systems ◽

Dynamic Monitoring ◽

Treatment Modalities ◽

Data Sets ◽

Health And Disease ◽

Corrective Response

Abstract The past decade has been witness to an explosive proliferation of data analytics modalities, all seeking to unravel insight into large-scale data sets. Machine learning and AI methodologies now occupy a central role in analyses of data sets that range in nature from genomics, “omics”, clinical, real-world evidence, and demographic data. Despite advances in data analytics and machine learning, and access to complex population-level clinical and related datasets, translating information into actionable guidance in human health and disease remains a challenge. Interrogative Biology, a systems biology/AI platform, generates an unbiased, data-informed network for identifying targets (disease drivers) and biomarkers for disease interception at the point of transition to dysregulation, preceding clinical phenotype. The data topology is enabled by a systematic acquisition and interrogation of longitudinal bio-samples of clinically annotated human matrices (e.g. blood, urine, saliva, tissues) subjected to comprehensive multi-omic (genomics, proteomics, lipidomics and metabolomics) profiling over time. The molecular profiles are integrated with clinical health information using Bayesian artificial intelligence analytics, bAIcis, to generate causal network maps of overall health. Differentials between “health” and “disease” network maps identify drivers (targets and biomarkers) of disease, which are rapidly validated in orthogonal wet-lab disease-specific perturbed model systems. Target information imputed into the bAIcis framework can define therapeutic strategies, including identification of existing drugs and bio-actives for corrective response. Using a combination of clinic-based sampling and dried blood spot analysis for longitudinal dynamic monitoring of markers of health-disease status provides an opportunity for proactive clinical management and intervention for corrective response in advance of major deterioration of health status. Taken together, the approach herein allows for health surveillance based on in-depth biological profiling of alterations in the patient narrative to guide treatment modalities and strategies in a longitudinal and dynamic manner to identify, track, intercept, and arrest human disease.

Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning

Mathematical Biosciences and Engineering ◽

10.3934/mbe.2021443 ◽

2021 ◽

Vol 18 (6) ◽

pp. 8997-9015

Author(s):

Ahmed Hammad ◽

Mohamed Elshaer ◽

Xiuwen Tang ◽

...

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Machine Learning ◽

Roc Curve ◽

Functional Enrichment ◽

Differentially Expressed ◽

Ppi Network ◽

Hub Genes ◽

Survival Analyses ◽

Potential Biomarkers

Abstract Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform to study the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes; the hub genes were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic values. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data presented in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis, with the area under the ROC curve (AUC) of these genes exceeding 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts. Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
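The AUC figure quoted in this abstract measures how often a classifier scores a tumour sample above a normal sample. A minimal sketch of that computation, using the pairwise (Mann-Whitney) formulation, is shown here. The scores are made-up illustration values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive sample receives the higher score.
    Tied pairs count as half a correct ranking."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for tumour and normal samples.
tumour = [0.9, 0.8, 0.7, 0.4]
normal = [0.6, 0.3, 0.2, 0.1]

auc = roc_auc(tumour, normal)
print(auc)  # 15 of the 16 pairs are ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values above 0.92 are read as strong discrimination between patients and controls.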

Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions

Nature Communications ◽

10.1038/s41467-019-12875-2 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 64

Author(s):

K. T. Schütt ◽

M. Gastegger ◽

A. Tkatchenko ◽

K.-R. Müller ◽

R. J. Maurer

Keyword(s):

Machine Learning ◽

Quantum Chemistry ◽

Degrees Of Freedom ◽

Large Scale ◽

Materials Science ◽

Chemical Space ◽

Chemical Properties ◽

Molecular Structures ◽

Learning Framework ◽

Molecular Wavefunctions

Abstract Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.

Bioinformatics analysis of prognostic value genes in colorectal cancer microenvironment

10.21203/rs.2.16191/v1 ◽

2019 ◽

Author(s):

Taohua Yue ◽

Jing Zhu ◽

Xin Wang ◽

Yisheng Pan ◽

Yucun Liu ◽

...

Keyword(s):

Colorectal Cancer ◽

Large Scale ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Previous Literature ◽

The Cancer Genome Atlas ◽

Functional Enrichment ◽

Gastrointestinal Malignancies ◽

Protein Protein Interaction ◽

Cancer Genome Atlas

Abstract Colorectal cancer (CRC) is one of the deadliest gastrointestinal malignancies. The openness of The Cancer Genome Atlas (TCGA) allows us to perform correlation analysis between large-scale transcriptome data and overall survival (OS) in multiple malignancies. Previous literature reports that the infiltration of immune cells and stromal cells in the tumor microenvironment (TME) is significantly associated with cancer prognosis. Based on the ESTIMATE algorithm, the immune and stromal components in the TME can be quantified by immune and stromal scores. To determine the effects of immune and stromal cell associated genes on CRC prognosis, we divided the CRC cases into high- and low-score groups based on the immune/stromal scores and identified 999 differentially expressed genes (DEGs). Heatmaps, functional enrichment analysis and protein-protein interaction (PPI) networks further indicated that the 999 DEGs mainly participated in stromal composition and immune response. Finally, from the 999 DEGs we obtained 56 genes that were significantly associated with CRC prognosis and identified their PPI networks. The role of 41 of these genes in CRC has been reported in previous literature, while the other 15 have not been reported. We therefore identified 15 novel TME genes associated with CRC prognosis that await further research.

StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-021-00418-1 ◽

2021 ◽

Author(s):

Aijaz Ahmad Malik ◽

Warot Chotpatiwetchkul ◽

Chuleeporn Phanus-umporn ◽

Chanin Nantasenamat ◽

Phasit Charoenkwan ◽

...

Keyword(s):

Machine Learning ◽

Hepatitis C Virus ◽

Hepatitis C ◽

Large Scale ◽

Web Based ◽

Learning Framework

Abstract 1898: Accurate modeling of antigen processing and MHC peptide presentation using large-scale immunopeptidomes and a novel machine learning framework

10.1158/1538-7445.am2021-1898 ◽

2021 ◽

Author(s):

Rachel Marty Pyke ◽

Dattatreya Mellacheruvu ◽

Steven Dea ◽

Charles Abbott ◽

Nick Phillips ◽

...

Keyword(s):

Machine Learning ◽

Antigen Processing ◽

Large Scale ◽

Learning Framework ◽

Peptide Presentation

KCML: a machine‐learning framework for inference of multi‐scale gene functions from genetic perturbation screens

Molecular Systems Biology ◽

10.15252/msb.20199083 ◽

2020 ◽

Vol 16 (3) ◽

Cited By ~ 2

Author(s):

Heba Z Sailem ◽

Jens Rittscher ◽

Lucas Pelkmans

Keyword(s):

Machine Learning ◽

Learning Framework ◽

Multi Scale ◽

Gene Functions

FOLFOX treatment response prediction in metastatic or recurrent colorectal cancer patients via machine learning algorithms

Cancer Medicine ◽

10.1002/cam4.2786 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1419-1429 ◽

Cited By ~ 3

Author(s):

Wei Lu ◽

Dongliang Fu ◽

Xiangxing Kong ◽

Zhiheng Huang ◽

Maxwell Hwang ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Treatment Response ◽

Cancer Patients ◽

Learning Algorithms ◽

Response Prediction ◽

Machine Learning Algorithms ◽

Recurrent Colorectal Cancer ◽

Colorectal Cancer Patients ◽

Treatment Response Prediction
