Prediction of drug targets for specific diseases leveraging gene perturbation data: A machine learning approach

2021 ◽  
Author(s):  
Kai Zhao ◽  
Yujia Shi ◽  
Hon-Cheong SO

Identification of the correct targets is a key element of successful drug development. However, there are limited approaches for predicting drug targets for specific diseases using omics data, and few have leveraged expression profiles from gene perturbations. We present a novel computational target discovery approach based on machine learning (ML) models. ML models are first trained on drug-induced expression profiles, with the outcome defined as whether the drug treats the studied disease. The goal is to learn expression patterns associated with treatment. The fitted ML models are then applied to expression profiles from gene perturbations (over-expression [OE] or knockdown [KD]). We prioritized targets based on the predicted probabilities from the ML model, which reflect treatment potential. The methodology was applied to predict targets for hypertension, diabetes mellitus (DM), rheumatoid arthritis (RA) and schizophrenia (SCZ). We validated our approach by evaluating whether the identified targets re-discover known drug targets from an external database (OpenTargets). We indeed found evidence of significant enrichment across all diseases under study. A further literature search revealed that many candidates were supported by previous studies. For example, we predicted PSMB8 inhibition to be associated with treatment of RA, which was supported by a study showing that PSMB8 inhibitors (PR-957) ameliorated experimental RA in mice. In conclusion, we propose a new ML approach to integrate expression profiles from drugs and gene perturbations and validate the framework. Our approach is flexible and may provide an independent source of information when prioritizing targets.
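The two-stage workflow described above (train on drug-induced profiles, then score gene-perturbation profiles) can be sketched roughly as follows. This is a minimal illustration only: the plain logistic regression stands in for whichever ML models the authors used, and the profiles, labels and gene names (`GENE_A_KD` and so on) are hypothetical toy values, not data from the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression (a stand-in for any ML model)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, profile):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, profile)) + b)

def rank_targets(model, perturbation_profiles):
    """Score each perturbation profile and sort by predicted probability."""
    scored = [(name, predict_proba(model, profile))
              for name, profile in perturbation_profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Stage 1: drug-induced expression profiles (toy, two genes per profile);
# label 1 means the drug treats the disease, 0 means it does not.
drug_profiles = [[1.0, -1.0], [0.8, -0.9], [-1.0, 1.0], [-0.7, 0.8]]
treats_disease = [1, 1, 0, 0]
model = fit_logistic(drug_profiles, treats_disease)

# Stage 2: apply the fitted model to hypothetical knockdown (KD) and
# over-expression (OE) profiles and rank the candidate targets.
perturbations = {
    "GENE_A_KD": [0.9, -0.8],
    "GENE_B_OE": [-0.9, 0.9],
    "GENE_C_KD": [0.1, 0.0],
}
ranking = rank_targets(model, perturbations)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Perturbations whose expression signature resembles that of effective drugs receive a higher predicted probability and therefore rank higher as candidate targets.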

2019 ◽  
Author(s):  
Zoltan Dezso ◽  
Michele Ceccarelli

Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics datasets for in-silico target identification, reducing the cost and the time needed. Results: We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. Our model incorporated 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited information about their function can score well, as most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, characterized by an AUC of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
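For readers unfamiliar with the AUC reported above, the following sketch shows one standard way to compute it: as the fraction of (target, non-target) pairs in which the target receives the higher score. The labels and scores are hypothetical toy values, not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = validated drug target, 0 = non-target;
# the scores play the role of druggability scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large sets.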


Author(s):  
Xin Liang ◽  
Wen Zhu ◽  
Bo Liao ◽  
Bo Wang ◽  
Jialiang Yang ◽  
...  

Some carcinomas present with one or more metastatic sites of unknown origin. Identifying the primary or metastatic tumor tissue is crucial for physicians to develop precise treatment plans for patients. When the primary site is unknown, it is challenging to design specific plans. Usually, these patients receive broad-spectrum chemotherapy yet still have a poor prognosis. Machine learning has been widely used and has already achieved significant advantages in clinical practice. In this study, we classify and predict a large number of tumor samples of uncertain origin by applying the random forest and naive Bayes algorithms. We use precision, recall, and other measures to evaluate the performance of our approach. The results showed that the prediction accuracy of this method was 90.4% for 7,713 samples. The accuracy was 80% for 20 metastatic tumor samples. In addition, 10-fold cross-validation was used to evaluate the classification accuracy, which reached 91%.
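The 10-fold cross-validation mentioned above partitions the samples into ten disjoint test folds, training on the remaining nine each time. A minimal sketch of the index bookkeeping follows; the function name `k_fold_indices` is hypothetical, and the sketch assumes the samples have already been shuffled (it does no shuffling or stratification itself).

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples.

    Folds are contiguous; the first n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

# Toy example with 23 samples and 10 folds.
folds = k_fold_indices(23, 10)
for train, test in folds:
    print(len(train), len(test))
```

A model would be fitted on each `train` subset and evaluated on the matching `test` subset, with the per-fold accuracies averaged to give the cross-validated estimate.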


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 1559-1559
Author(s):  
Jason Chia-Hsun Hsieh ◽  
Chun-Ta Liao ◽  
Hung-Ming Wang ◽  
Ming-Yu Lien ◽  
Yung-Chang Lin ◽  
...  

Background: Earlier cancer diagnosis leads to a higher patient survival rate and reduces the financial burden on patients and their families. Over the past five years, liquid biopsy has demonstrated tremendous promise in the early detection of tumor presence. In addition to circulating tumor cells and circulating tumor DNA, extracellular microRNAs (miRNAs) have also been shown to be promising diagnostic biomarkers. Through machine-learning profiling, we sought to determine whether individuals’ miRNA expression could be used to distinguish between healthy subjects and cancer patients. Methods: Blood samples were collected from healthy donors and from patients with various cancer types. Plasma samples were purified within two hours of sample collection, followed by miRNA extraction. After reverse transcription of miRNAs into cDNAs, expression analysis of miRNAs was performed using a novel multi-gene, amplification-based detection system that simultaneously analyzes over 160 miRNAs. For subsequent data processing, miRNAs without amplification signals across all profiles were first removed, leaving 135 miRNAs. These 135 miRNAs were then used as features in a support vector machine (SVM) to build the OncoSweep classifier, a proprietary prediction algorithm for classification of the samples. Ten-fold cross-validation was used to evaluate the performance of OncoSweep. Results: 344 healthy donor samples and 417 cancer patient samples were collected for the study. The prediction algorithm, OncoSweep, was derived from the miRNA expression patterns of the healthy and the patient samples. The algorithm achieved an overall accuracy for cancer prediction of 86.47%, with a sensitivity of 91.4%, a specificity of 85%, a PPV of 85% and an NPV of 88.5%. Conclusions: Using a machine-learning method to analyze circulating miRNA expression profiles, the derived algorithm OncoSweep shows significant promise in cancer prediction. Validation is currently being performed in a larger study. We believe circulating miRNAs, through stringent sample processing and machine-learning methodology, are powerful biomarkers for earlier cancer detection.
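The performance figures quoted above (accuracy, sensitivity, specificity, PPV, NPV) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: 100 cancer samples and 100 healthy samples.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the ratio of cancer to healthy samples in the evaluated cohort, so they do not transfer directly to populations with a different prevalence.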


2020 ◽  
Vol 6 (39) ◽  
pp. eaba9338 ◽  
Author(s):  
George W. Ashdown ◽  
Michelle Dimon ◽  
Minjie Fan ◽  
Fernando Sánchez-Román Terán ◽  
Kathrin Witmer ◽  
...  

Drug resistance threatens the effective prevention and treatment of an ever-increasing range of human infections. This highlights an urgent need for new and improved drugs with novel mechanisms of action to avoid cross-resistance. Current cell-based drug screens are, however, restricted to binary live/dead readouts with no provision for mechanism of action prediction. Machine learning methods are increasingly being used to improve information extraction from imaging data. These methods, however, work poorly with heterogeneous cellular phenotypes and generally require time-consuming human-led training. We have developed a semi-supervised machine learning approach, combining human- and machine-labeled training data from mixed human malaria parasite cultures. Designed for high-throughput and high-resolution screening, our semi-supervised approach is robust to natural parasite morphological heterogeneity and correctly orders parasite developmental stages. Our approach also reproducibly detects and clusters drug-induced morphological outliers by mechanism of action, demonstrating the potential power of machine learning for accelerating cell-based drug discovery.


2018 ◽  
Vol 1 (6) ◽  
pp. e201800098 ◽  
Author(s):  
Artem Lysenko ◽  
Alok Sharma ◽  
Keith A Boroevich ◽  
Tatsuhiko Tsunoda

Recent trends in drug development have been marked by diminishing returns caused by escalating costs and falling rates of new drug approval. Unacceptable drug toxicity is a substantial cause of drug failure during clinical trials and the leading cause of drug withdrawals after release to the market. Computational methods capable of predicting these failures can reduce the waste of resources and time devoted to the investigation of compounds that ultimately fail. We propose an original machine learning method that leverages the identity of drug targets and off-targets, functional impact scores computed from Gene Ontology annotations, and biological network data to predict drug toxicity. We demonstrate that our method (TargeTox) can distinguish potentially idiosyncratically toxic drugs from safe drugs and is also suitable for speculative evaluation of different target sets to support the design of optimal low-toxicity combinations.


2019 ◽  
Author(s):  
Jenny Smith ◽  
Sean K. Maden ◽  
David Lee ◽  
Ronald Buie ◽  
Vikas Peddu ◽  
...  

Acute myeloid leukemia (AML) is a cancer of hematopoietic systems that poses high population burden, especially among pediatric populations. AML presents with high molecular heterogeneity, complicating patient risk stratification and treatment planning. While molecular and cytogenetic subtypes of AML are well described, significance of subtype-specific gene expression patterns is poorly understood and effective modeling of these patterns with individual algorithms is challenging. Using a novel consensus machine learning approach, we analyzed public RNA-seq and clinical data from pediatric AML patients (N = 137 patients) enrolled in the TARGET project. We used a binary risk classifier (Low vs. Not-Low Risk) to study risk-specific expression patterns in pediatric AML. We applied the following workflow to identify important gene targets from RNA-seq data: (1) Reduce data dimensionality by identification of differentially expressed genes for AML risk (N = 1984 loci); (2) Optimize algorithm hyperparameters for each of 4 algorithm types (lasso, XGBoost, random forest, and SVM); (3) Study ablation test results for penalized methods (lasso and XGBoost); (4) Bootstrap Boruta permutations with a novel consensus importance metric. We observed recurrently selected features across hyperparameter optimizations, ablation tests, and Boruta permutation bootstrap iterations, including HOXA9 and putative cofactors including MEIS1. Consensus feature selection from Boruta bootstraps identified a larger gene set than single penalized algorithm runs (lasso or XGBoost), while also including correlated and predictive genes from ablation tests. We present a consensus machine learning approach to identify gene targets of likely importance for pediatric AML risk. The approach identified a moderately sized set of recurrent important genes from across 4 algorithm types, including genes identified across ablation tests with penalized algorithms (HOXA9 and MEIS1). Our approach mitigates exclusion biases of penalized algorithms (lasso and XGBoost) while obviating arbitrary importance cutoffs for other types (SVM and random forest). This approach is readily generalizable for research of other heterogeneous diseases, single-assay experiments, and high-dimensional data. Resources and code to recreate our findings are available online.
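The idea of keeping features that are selected recurrently across bootstrap iterations can be illustrated with a simple selection-frequency rule. This is not the authors' consensus importance metric, which the abstract does not define; the function name, the threshold and the per-run gene lists below are hypothetical (`G1` and `G2` are placeholder gene names).

```python
from collections import Counter

def consensus_features(selected_per_run, min_fraction=0.5):
    """Keep features selected in at least min_fraction of the runs.

    Results are ordered by selection count (descending), then by name.
    """
    n_runs = len(selected_per_run)
    if n_runs == 0:
        raise ValueError("need at least one run")
    # Count each feature at most once per run.
    counts = Counter(f for run in selected_per_run for f in set(run))
    kept = [f for f, c in counts.items() if c / n_runs >= min_fraction]
    return sorted(kept, key=lambda f: (-counts[f], f))

# Toy example: features confirmed in each of four bootstrap iterations.
runs = [
    ["HOXA9", "MEIS1", "G1"],
    ["HOXA9", "MEIS1"],
    ["HOXA9", "G2"],
    ["HOXA9", "MEIS1", "G2"],
]
consensus = consensus_features(runs, min_fraction=0.75)
print(consensus)
```

A frequency rule of this kind still requires choosing a threshold, so it only approximates the motivation stated in the abstract of avoiding arbitrary importance cutoffs.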


2021 ◽  
Author(s):  
Mireia Jimenez-Roses ◽  
Bradley A Morgan ◽  
Maria Jimenez Sigstad ◽  
T.D. Zoe Tran ◽  
Rohini Srivastava ◽  
...  

G protein-coupled receptors (GPCRs) form one of the largest families of proteins in humans and are valuable therapeutic targets for a variety of diseases. One central question of drug discovery surrounding GPCRs is what determines the agonism or antagonism exhibited by the ligands that bind these important targets. Ligands exert their action via the interactions they make in the ligand binding pocket. We hypothesised that there is a common set of receptor interactions made by ligands of diverse structures that mediate their action. We reasoned that among a large dataset of different ligands, the functionally important interactions will be over-represented. To investigate this hypothesis, we assembled a database of ~2700 known β2AR ligands and computationally docked them to multiple experimentally determined β2AR structures, generating approximately 75,000 docking poses. For each docking pose, we predicted all interactions between the atoms of the receptor and the atoms of the ligand. Using machine learning (ML), we identified specific interactions that correlated with the agonist or antagonist activity of these ligands, and developed ML-based predictors of agonist/antagonist activity with up to 90% accuracy. This approach can be readily applied to other GPCRs and to drug targets beyond GPCRs.


2019 ◽  
Author(s):  
Robert Ietswaart ◽  
Seda Arat ◽  
Amanda X. Chen ◽  
Saman Farahmand ◽  
Bumjun Kim ◽  
...  

Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Here, we analyze in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Among these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. These associations provide a comprehensive resource to support drug development and human biology studies.
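Gini importance in a random forest is commonly computed by accumulating, over all splits and trees, the decrease in Gini impurity achieved by splitting on a given feature. The sketch below computes that quantity for a single split on binary labels; the feature values, labels and threshold are hypothetical, and the weighting and averaging across a full forest are omitted.

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def impurity_decrease(feature_values, labels, threshold):
    """Gini impurity decrease from splitting at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted_children

# Toy example: 1 = ADR reported, 0 = not reported, for four drugs.
adr_labels = [0, 0, 1, 1]
informative_assay = [0.1, 0.2, 0.8, 0.9]    # separates the classes
uninformative_assay = [0.1, 0.8, 0.2, 0.9]  # does not separate them

good = impurity_decrease(informative_assay, adr_labels, threshold=0.5)
poor = impurity_decrease(uninformative_assay, adr_labels, threshold=0.5)
print(good, poor)
```

A feature that cleanly separates the classes yields a large decrease, while one that leaves both child nodes as mixed as the parent yields none, which is why high Gini importance flags assay targets that carry predictive signal for an ADR.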

