Prediction of drug targets for specific diseases leveraging gene perturbation data: A machine learning approach

Machine learning prediction of oncology drug targets based on protein and network properties

10.21203/rs.2.15798/v1 ◽

2019 ◽

Author(s):

Zoltan Dezso ◽

Michele Ceccarelli

Keyword(s):

Machine Learning ◽

Clinical Trial ◽

Drug Target ◽

Drug Targets ◽

Validation Dataset ◽

Learning Approach ◽

Biological Functions ◽

Machine Learning Approach ◽

Network Properties ◽

Trial Drug

Abstract. Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and time needed. Results: We developed a machine learning approach that assigns proteins a druggability score to identify novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited functional information can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets of approved drugs and a negative set of non-drug targets. The machine learning techniques help identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an AUC of 0.89. Our most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization, and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
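
The AUC quoted in this abstract can be read as the probability that a randomly chosen validated target receives a higher score than a randomly chosen non-target. The following is a minimal Python sketch of that pairwise definition, not the authors' evaluation code. The helper name `auc_from_scores` and the toy scores and labels are hypothetical and do not come from the study.

```python
def auc_from_scores(scores, labels):
    """Return the AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical druggability scores: label 1 = known target, 0 = non-target.
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)  # 7 of 9 pairs are ordered correctly
```

The double loop is quadratic in the number of examples, which is acceptable for a few hundred targets; a rank-based formulation would be the usual choice for larger sets.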

A Machine Learning Approach for Tracing Tumor Original Sites With Gene Expression Profiles

Frontiers in Bioengineering and Biotechnology ◽

10.3389/fbioe.2020.607126 ◽

2020 ◽

Vol 8 ◽

Author(s):

Xin Liang ◽

Wen Zhu ◽

Bo Liao ◽

Bo Wang ◽

Jialiang Yang ◽

...

Keyword(s):

Machine Learning ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Metastatic Tumor ◽

Unknown Primary ◽

Clinical Practices ◽

Tumor Tissues ◽

Machine Learning Approach ◽

Treatment Plans ◽

Metastatic Sites

Some carcinomas present with one or more metastatic sites of unknown origin. Identifying whether tumor tissue is primary or metastatic is crucial for physicians developing precise treatment plans. When the primary site is unknown, it is challenging to design specific plans for patients; they usually receive broad-spectrum chemotherapy and still have a poor prognosis. Machine learning has been widely used and has already achieved significant advantages in clinical practice. In this study, we classify and predict a large number of tumor samples of uncertain origin by applying the random forest and Naive Bayesian algorithms. We use precision, recall, and other measures to evaluate the performance of our approach. The results showed that the prediction accuracy of this method was 90.4% for 7,713 samples and 80% for 20 metastatic tumor samples. In addition, 10-fold cross-validation was used to evaluate classification accuracy, which reached 91%.
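
The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is an illustrative Python outline using only the standard library, not the study's pipeline. The function names `k_fold_indices` and `cross_validated_accuracy`, and the caller-supplied `fit` and `predict` callables, are assumptions introduced for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k) folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average held-out accuracy over k folds.

    `fit(X, y)` returns a model and `predict(model, X)` returns predicted labels;
    both are placeholders for any classifier the caller supplies.
    """
    fold_accuracies = []
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predicted = predict(model, [features[i] for i in test])
        hits = sum(p == labels[i] for p, i in zip(predicted, test))
        fold_accuracies.append(hits / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)
```

Each sample is held out exactly once, so the averaged accuracy reflects performance on data the model did not see during fitting.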

Evaluation of circulating miRNAs for earlier cancer detection through machine-learning expression profiling.

Journal of Clinical Oncology ◽

10.1200/jco.2020.38.15_suppl.1559 ◽

2020 ◽

Vol 38 (15_suppl) ◽

pp. 1559-1559

Author(s):

Jason Chia-Hsun Hsieh ◽

Chun-Ta Liao ◽

Hung-Ming Wang ◽

Ming-Yu Lien ◽

Yung-Chang Lin ◽

...

Keyword(s):

Machine Learning ◽

Cancer Detection ◽

Mirna Expression ◽

Detection System ◽

Expression Profiles ◽

Expression Patterns ◽

Prediction Algorithm ◽

Support Vector ◽

Circulating Mirnas ◽

Cancer Prediction

Background: Earlier cancer diagnosis leads to higher patient survival rates and reduces financial burdens for patients and their families. Over the past five years, liquid biopsy has demonstrated tremendous promise in the early detection of tumor presence. In addition to circulating tumor cells and circulating tumor DNA, extracellular microRNAs (miRNAs) have also been shown to be promising diagnostic biomarkers. Through machine-learning profiling, we sought to determine whether individuals’ miRNA expression could be used to distinguish between healthy subjects and cancer patients. Methods: Blood samples were collected from healthy donors and from patients with various cancer types. Plasma samples were purified within two hours of sample collection, followed by miRNA extraction. After reverse transcription of miRNAs into cDNAs, miRNA expression analysis was performed using a novel multi-gene, amplification-based detection system that simultaneously analyzes over 160 miRNAs. For subsequent data processing, miRNAs without amplification signals across all profiles were first removed, leaving 135 miRNAs. These 135 miRNAs were then used as features in a Support Vector Machine (SVM) to build the OncoSweep classifier, a proprietary prediction algorithm for classifying the samples. Ten-fold cross-validation was used to evaluate the performance of OncoSweep. Results: 344 healthy donor samples and 417 cancer patient samples were collected for the study. The prediction algorithm, OncoSweep, was derived from the miRNA expression patterns of the healthy and patient samples. The algorithm achieved an overall accuracy for cancer prediction of 86.47%, with a sensitivity of 91.4%, a specificity of 85%, a PPV of 85% and an NPV of 88.5%. Conclusions: Using a machine-learning method to analyze circulating miRNA expression profiles, the derived algorithm OncoSweep shows significant promise in cancer prediction. Validation is currently being performed in a larger study. We believe circulating miRNAs, through stringent sample processing and machine-learning methodology, are powerful biomarkers for earlier cancer detection.
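
The performance measures named in this abstract (accuracy, sensitivity, specificity, PPV, NPV) all derive from the four cells of a confusion matrix. The Python sketch below shows the standard definitions. The function name `diagnostic_metrics` and the counts passed to it are hypothetical and are not the study's confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # fraction of true cancer samples called positive
        "specificity": tn / (tn + fp),  # fraction of healthy samples called negative
        "ppv": tp / (tp + fp),          # fraction of positive calls that are correct
        "npv": tn / (tn + fn),          # fraction of negative calls that are correct
    }


# Hypothetical counts for illustration only.
example = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on the ratio of cancer to healthy samples in the cohort.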

A machine learning approach to define antimalarial drug action from heterogeneous cell-based screens

Science Advances ◽

10.1126/sciadv.aba9338 ◽

2020 ◽

Vol 6 (39) ◽

pp. eaba9338 ◽

Cited By ~ 1

Author(s):

George W. Ashdown ◽

Michelle Dimon ◽

Minjie Fan ◽

Fernando Sánchez-Román Terán ◽

Kathrin Witmer ◽

...

Keyword(s):

Machine Learning ◽

Mechanism Of Action ◽

Training Data ◽

Supervised Machine Learning ◽

Cross Resistance ◽

Learning Approach ◽

Imaging Data ◽

Drug Induced ◽

Effective Prevention ◽

Machine Learning Approach

Drug resistance threatens the effective prevention and treatment of an ever-increasing range of human infections. This highlights an urgent need for new and improved drugs with novel mechanisms of action to avoid cross-resistance. Current cell-based drug screens are, however, restricted to binary live/dead readouts with no provision for mechanism of action prediction. Machine learning methods are increasingly being used to improve information extraction from imaging data. These methods, however, work poorly with heterogeneous cellular phenotypes and generally require time-consuming human-led training. We have developed a semi-supervised machine learning approach, combining human- and machine-labeled training data from mixed human malaria parasite cultures. Designed for high-throughput and high-resolution screening, our semi-supervised approach is robust to natural parasite morphological heterogeneity and correctly orders parasite developmental stages. Our approach also reproducibly detects and clusters drug-induced morphological outliers by mechanism of action, demonstrating the potential power of machine learning for accelerating cell-based drug discovery.

An integrative machine learning approach for prediction of toxicity-related drug safety

Life Science Alliance ◽

10.26508/lsa.201800098 ◽

2018 ◽

Vol 1 (6) ◽

pp. e201800098 ◽

Cited By ~ 8

Author(s):

Artem Lysenko ◽

Alok Sharma ◽

Keith A Boroevich ◽

Tatsuhiko Tsunoda

Keyword(s):

Machine Learning ◽

Drug Toxicity ◽

Drug Targets ◽

Drug Approval ◽

Machine Learning Approach ◽

Related Drug ◽

Low Toxicity ◽

Recent Trends ◽

Toxic Drugs ◽

Target Sets

Recent trends in drug development have been marked by diminishing returns caused by escalating costs and falling rates of new drug approval. Unacceptable drug toxicity is a substantial cause of drug failure during clinical trials and the leading cause of drug withdrawals after release to the market. Computational methods capable of predicting these failures can reduce the waste of resources and time devoted to investigating compounds that ultimately fail. We propose an original machine learning method that leverages the identity of drug targets and off-targets, a functional impact score computed from Gene Ontology annotations, and biological network data to predict drug toxicity. We demonstrate that our method (TargeTox) can distinguish potentially idiosyncratically toxic drugs from safe drugs and is also suitable for speculative evaluation of different target sets to support the design of optimal low-toxicity combinations.

Machine learning prediction of oncology drug targets based on protein and network properties

10.21203/rs.2.15798/v2 ◽

2019 ◽

Author(s):

Zoltan Dezso ◽

Michele Ceccarelli

Keyword(s):

Machine Learning ◽

Clinical Trial ◽

Drug Target ◽

Drug Targets ◽

Validation Dataset ◽

Learning Approach ◽

Biological Functions ◽

Machine Learning Approach ◽

Network Properties ◽

Trial Drug

Abstract. Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and time needed. Results: We developed a machine learning approach that assigns proteins a druggability score to identify novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited functional information can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets of approved drugs and a negative set of non-drug targets. The machine learning techniques help identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an AUC of 0.89. Our most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization, and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.

Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk

10.1101/632166 ◽

2019 ◽

Author(s):

Jenny Smith ◽

Sean K. Maden ◽

David Lee ◽

Ronald Buie ◽

Vikas Peddu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Expression Patterns ◽

Specific Gene ◽

Molecular Heterogeneity ◽

Learning Approach ◽

Rna Seq ◽

Gene Target ◽

Machine Learning Approach ◽

Pediatric Aml

Abstract. Acute myeloid leukemia (AML) is a cancer of hematopoietic systems that poses high population burden, especially among pediatric populations. AML presents with high molecular heterogeneity, complicating patient risk stratification and treatment planning. While molecular and cytogenetic subtypes of AML are well described, the significance of subtype-specific gene expression patterns is poorly understood, and effective modeling of these patterns with individual algorithms is challenging. Using a novel consensus machine learning approach, we analyzed public RNA-seq and clinical data from pediatric AML patients (N = 137 patients) enrolled in the TARGET project. We used a binary risk classifier (Low vs. Not-Low Risk) to study risk-specific expression patterns in pediatric AML. We applied the following workflow to identify important gene targets from RNA-seq data: (1) Reduce data dimensionality by identification of differentially expressed genes for AML risk (N = 1984 loci); (2) Optimize algorithm hyperparameters for each of 4 algorithm types (lasso, XGBoost, random forest, and SVM); (3) Study ablation test results for penalized methods (lasso and XGBoost); (4) Bootstrap Boruta permutations with a novel consensus importance metric. We observed recurrently selected features across hyperparameter optimizations, ablation tests, and Boruta permutation bootstrap iterations, including HOXA9 and putative cofactors including MEIS1. Consensus feature selection from Boruta bootstraps identified a larger gene set than single penalized algorithm runs (lasso or XGBoost), while also including correlated and predictive genes from ablation tests. We present a consensus machine learning approach to identify gene targets of likely importance for pediatric AML risk. The approach identified a moderately sized set of recurrent important genes from across 4 algorithm types, including genes identified across ablation tests with penalized algorithms (HOXA9 and MEIS1). Our approach mitigates exclusion biases of penalized algorithms (lasso and XGBoost) while obviating arbitrary importance cutoffs for other types (SVM and random forest). This approach is readily generalizable for research of other heterogeneous diseases, single-assay experiments, and high-dimensional data. Resources and code to recreate our findings are available online.
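
The idea of keeping features that are selected recurrently across runs can be illustrated with a simple selection-frequency vote. The Python sketch below is a generic stand-in and is not the consensus importance metric described in this abstract, whose definition is not given here. The function name `consensus_features`, the `min_fraction` parameter, and the placeholder gene names are all hypothetical.

```python
from collections import Counter


def consensus_features(selections, min_fraction=0.5):
    """Return features chosen in at least `min_fraction` of the runs, most frequent first."""
    if not selections:
        raise ValueError("need at least one selection run")
    # Count each feature once per run, even if a run lists it more than once.
    counts = Counter(feature for run in selections for feature in set(run))
    threshold = min_fraction * len(selections)
    kept = [(feature, count) for feature, count in counts.items() if count >= threshold]
    # Sort by descending count, then by name, so the output is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in kept]


# Placeholder selections from four hypothetical bootstrap or algorithm runs.
runs = [
    {"gene_a", "gene_b", "gene_c"},
    {"gene_a", "gene_b"},
    {"gene_a", "gene_d"},
    {"gene_a", "gene_b", "gene_d"},
]
consensus = consensus_features(runs, min_fraction=0.75)
```

Raising `min_fraction` trades a smaller, more stable gene set against the risk of dropping features that only some algorithms can detect.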

Prediction of ligand-receptor pharmacological activities using a combined docking and machine learning approach

10.1101/2021.03.18.434755 ◽

2021 ◽

Author(s):

Mireia Jimenez-Roses ◽

Bradley A Morgan ◽

Maria Jimenez Sigstad ◽

T.D. Zoe Tran ◽

Rohini Srivastava ◽

...

Keyword(s):

Machine Learning ◽

Drug Targets ◽

Binding Pocket ◽

G Protein Coupled Receptors ◽

Antagonist Activity ◽

Pharmacological Activities ◽

Large Dataset ◽

Machine Learning Approach ◽

Receptor Interactions ◽

G Protein Coupled

G protein-coupled receptors (GPCRs) form one of the largest families of proteins in humans and are valuable therapeutic targets for a variety of diseases. One central question of drug discovery surrounding GPCRs is what determines the agonism or antagonism exhibited by ligands that bind these important targets. Ligands exert their action via the interactions they make in the ligand binding pocket. We hypothesised that there is a common set of receptor interactions made by ligands of diverse structures that mediate their action. We reasoned that among a large dataset of different ligands, the functionally important interactions will be over-represented. To investigate this hypothesis, we assembled a database of ~2700 known β2AR ligands and computationally docked them to multiple experimentally determined β2AR structures, generating ca. 75,000 docking poses. For each docking pose, we predicted all interactions between the atoms of the receptor and the atoms of the ligand. Using machine learning (ML), we identified specific interactions that correlated with the agonist or antagonist activity of these ligands, and developed ML-based predictors of agonist/antagonist activity with up to 90% accuracy. This approach can be readily applied to other GPCRs and to drug targets beyond GPCRs.

Using Drug Expression Profiles and Machine Learning Approach for Drug Repurposing

Methods in Molecular Biology - Computational Methods for Drug Repurposing ◽

10.1007/978-1-4939-8955-3_13 ◽

2018 ◽

pp. 219-237 ◽

Cited By ~ 5

Author(s):

Kai Zhao ◽

Hon-Cheong So

Keyword(s):

Machine Learning ◽

Expression Profiles ◽

Drug Repurposing ◽

Learning Approach ◽

Machine Learning Approach

Machine learning guided association of adverse drug reactions with in vitro target-based pharmacology

10.1101/750950 ◽

2019 ◽

Author(s):

Robert Ietswaart ◽

Seda Arat ◽

Amanda X. Chen ◽

Saman Farahmand ◽

Bumjun Kim ◽

...

Keyword(s):

Machine Learning ◽

Adverse Drug Reactions ◽

Drug Targets ◽

Bile Acid Metabolism ◽

Drug Reactions ◽

Machine Learning Approach ◽

Forest Models ◽

Random Forest Models ◽

Model Features

Abstract. Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Here, we analyze in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Among these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. These associations provide a comprehensive resource to support drug development and human biology studies.
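
Gini importance in a random forest accumulates, over every split made on a feature, the sample-weighted decrease in Gini impurity. The Python sketch below computes that decrease for a single split, as a way to make the quantity concrete. It is not the authors' model, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k ** 2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy labels: 1 = drug associated with an ADR, 0 = not associated (hypothetical).
parent = [1, 1, 1, 0, 0, 0]
clean_split = impurity_decrease(parent, [1, 1, 1], [0, 0, 0])  # separates the classes fully
weak_split = impurity_decrease(parent, [1, 1, 0], [1, 0, 0])   # barely separates them
```

A feature whose splits repeatedly produce large decreases, as in `clean_split`, ends up with a high importance score once the decreases are summed over the trees of the forest.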
