Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery

PeerJ ◽

10.7717/peerj.10381 ◽

2020 ◽

Vol 8 ◽

pp. e10381

Author(s):

Rohit Nandakumar ◽

Valentin Dinu

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Drug Discovery ◽

Structural Information ◽

Learning Model ◽

Protein Protein Interaction ◽

Drug Molecules ◽

Machine Learning Model ◽

Disease Associations ◽

History Of

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
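
As an illustration of the metrics reported in this abstract, the sketch below computes sensitivity (recall) and specificity from binary predictions. It is a minimal example under stated assumptions, not the authors' evaluation code: the function name and the toy labels are hypothetical, with 1 standing for a hotspot residue and 0 for a non-hotspot.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # fraction of real positives that were found
    specificity = tn / (tn + fp)  # fraction of real negatives that were rejected
    return sensitivity, specificity


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Both values depend on the classification threshold, which is why a threshold-free summary such as AUROC is usually reported alongside them.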

Peer Review #2 of "Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery (v0.1)"

10.7287/peerj.10381v0.1/reviews/2 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Peer Review ◽

Protein Interaction ◽

Learning Model ◽

Protein Protein Interaction ◽

Machine Learning Model

Peer Review #1 of "Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery (v0.1)"

10.7287/peerj.10381v0.1/reviews/1 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Peer Review ◽

Protein Interaction ◽

Learning Model ◽

Protein Protein Interaction ◽

Machine Learning Model

Peer Review #1 of "Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery (v0.2)"

10.7287/peerj.10381v0.2/reviews/1 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Peer Review ◽

Protein Interaction ◽

Learning Model ◽

Protein Protein Interaction ◽

Machine Learning Model

A Machine Learning Model for Recommending Restaurants based on User Ratings

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1189.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 732-736

Keyword(s):

Machine Learning ◽

Learning Model ◽

K Nearest Neighbors ◽

Processing Pipeline ◽

Chinese Restaurant ◽

Registered User ◽

Machine Learning Model ◽

History Of ◽

Multiclass Svm ◽

User Ratings

However, people often search for a restaurant using just the word “restaurant”, and that word means different things to different individuals. For an Asian user, it may mean a “Chinese restaurant” or a “Thai restaurant”. Correctly interpreting search requests according to people’s preferences is a challenge. Building a machine-learning model from the activity history of a registered user can solve this problem. The activity histories used in this research are users’ reviews and ratings. This project introduces a data processing pipeline that uses reviews from registered users to generate a machine-learning model for each registered user. It also defines an architecture that uses the generated models to support real-time personalized recommendations of restaurants and of the types of food those restaurants do well. Finally, to develop a good machine-learning model, different collaborative filtering methodologies are considered for predicting restaurants from user ratings. Slope One, the k-Nearest Neighbors algorithm and multiclass SVM classification are among the methodologies considered in this project.
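
Of the collaborative filtering methods named in this abstract, Slope One is the simplest to illustrate. The sketch below is a generic weighted Slope One predictor, not the paper's pipeline: the function name, the user names and the ratings are all hypothetical.

```python
from collections import defaultdict


def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating for `target` with weighted Slope One.

    `ratings` maps user -> {item: rating}. Returns None when no other
    user has co-rated `target` with an item that `user` has rated.
    """
    diff_sum = defaultdict(float)  # summed (target - item) rating differences
    count = defaultdict(int)       # number of users who rated both items
    for item_ratings in ratings.values():
        if target not in item_ratings:
            continue
        for item, rating in item_ratings.items():
            if item != target:
                diff_sum[item] += item_ratings[target] - rating
                count[item] += 1

    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        if item == target or count[item] == 0:
            continue
        average_deviation = diff_sum[item] / count[item]
        # Weight each item's estimate by how many users support it.
        numerator += (rating + average_deviation) * count[item]
        denominator += count[item]
    return numerator / denominator if denominator else None


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"thai": 5, "sushi": 3, "pizza": 2},
    "bob": {"thai": 3, "sushi": 4},
    "carol": {"sushi": 2, "pizza": 5},
}
prediction = slope_one_predict(ratings, "bob", "pizza")
print(prediction)
```

Slope One needs only pairwise average rating differences, so it is cheap to update as new ratings arrive, which suits the real-time setting the abstract describes.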

685 - Machine Learning Model to Predict Recurrent Ulcer Bleeding in Patients with History of Helicobacter Pylori (H. Pylori)-Negative Idiopathic Gastroduodenal Ulcer Bleeding

Gastroenterology ◽

10.1016/s0016-5085(18)30881-3 ◽

2018 ◽

Vol 154 (6) ◽

pp. S-136

Author(s):

Grace L. Wong ◽

Andy J. Ma ◽

Louis H. Lau ◽

Jessica Y. Ching ◽

Francis K. Chan

Keyword(s):

Machine Learning ◽

Helicobacter Pylori ◽

Gastroduodenal Ulcer ◽

Learning Model ◽

Ulcer Bleeding ◽

Recurrent Ulcer ◽

Machine Learning Model ◽

History Of ◽

Gastroduodenal Ulcer Bleeding ◽

H Pylori

A Review on the Methods of Peptide-MHC Binding Prediction

Current Bioinformatics ◽

10.2174/1574893615999200429122801 ◽

2021 ◽

Vol 15 (8) ◽

pp. 878-888

Author(s):

Yang Liu ◽

Xia-hui Ouyang ◽

Zhi-Xiong Xiao ◽

Le Zhang ◽

Yang Cao

Keyword(s):

Machine Learning ◽

T Cell ◽

Structural Information ◽

Three Dimensional ◽

Learning Model ◽

T Cell Epitopes ◽

Binding Prediction ◽

Future Directions ◽

Mhc Molecules ◽

Machine Learning Model

Background: T lymphocytes mount an immune response by recognizing antigen peptides (also known as T cell epitopes) presented by major histocompatibility complex (MHC) molecules. The immunogenicity of T cell epitopes depends on their source and on their stability in combination with MHC molecules. The binding of the peptide to MHC is the most selective step, so predicting the binding affinity of the peptide to MHC is the principal step in predicting T cell epitopes. The identification of epitopes is of great significance in research on vaccine design and T cell immune response. Objective: The traditional method for identifying epitopes is to synthesize peptides and test their binding activity experimentally, which is both time-consuming and expensive. In silico methods for predicting peptide-MHC binding have emerged to pre-select candidate peptides for experimental testing, which greatly saves time and cost. By summarizing and analyzing these methods, we hope to gain better insight and provide guidance for future directions. Methods: To date, a number of methods based on various principles have been developed to predict the binding ability of peptides to MHC. Some employ matrix models or machine learning models based on the sequence characteristics of peptides or MHC. Others utilize the three-dimensional structural information of peptides or MHC, for example by extracting that information to construct a feature matrix or machine learning model, or by directly using protein structure prediction and molecular docking to predict the binding mode of peptides and MHC. Results: Although methods based on a feature matrix or machine learning model can achieve high-throughput prediction, their accuracy depends heavily on the sequence characteristics of confirmed binding peptides. In addition, they cannot provide insights into the mechanism of antigen specificity. Therefore, such methods have certain limitations in practical applications. Methods based on structural prediction or molecular docking are computationally intensive compared with those based on a feature matrix or machine learning model, and the challenge is how to predict a reliable structural model. Conclusion: This paper reviews the principles, advantages and disadvantages of the methods of peptide-MHC binding prediction and discusses future directions for achieving more accurate predictions.

Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome–Inhibitor Interaction Landscapes

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.8b00640 ◽

2018 ◽

Vol 59 (3) ◽

pp. 1221-1229 ◽

Cited By ~ 13

Author(s):

Antonius P. A. Janssen ◽

Sebastian H. Grimm ◽

Ruud H. M. Wijdeven ◽

Eelke B. Lenselink ◽

Jacques Neefjes ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Learning Model ◽

Machine Learning Model ◽

Inhibitor Interaction

Pathogenicity Prediction of Single Amino Acid Variants with Machine Learning Model Based on Protein Structural Energies

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3139048 ◽

2021 ◽

pp. 1-1

Author(s):

Tzu-Hsuan Wu ◽

Peng-Chan Lin ◽

Hsin-Hung Chou ◽

Meng-Ru Shen ◽

Sun-Yuan Hsieh

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Learning Model ◽

Single Amino Acid ◽

Model Based ◽

Pathogenicity Prediction ◽

Machine Learning Model ◽

Amino Acid Variants

Multitask machine learning models for predicting lipophilicity (logP) in the SAMPL7 challenge

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-021-00405-6 ◽

2021 ◽

Author(s):

Eelke B. Lenselink ◽

Pieter F. W. Stouten

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Drug Discovery ◽

Message Passing ◽

Learning Model ◽

Molecular Structures ◽

Learning Models ◽

Final Model ◽

Machine Learning Model ◽

Machine Learning Models

Abstract: Accurate prediction of lipophilicity (logP) based on molecular structures is a well-established field. Predictions of logP are often used to drive forward drug discovery projects. Driven by the SAMPL7 challenge, in this manuscript we describe the steps that were taken to construct a novel machine learning model that can predict and generalize well. This model is based on the recently described Directed-Message Passing Neural Networks (D-MPNNs). Further enhancements included the inclusion of additional datasets from ChEMBL (RMSE improvement of 0.03) and the addition of helper tasks (RMSE improvement of 0.04). To the best of our knowledge, the concept of adding predictions from other models (Simulations Plus logP and logD@pH7.4, respectively) as helper tasks is novel and could be applied in a broader context. The final model that we constructed and used to participate in the challenge ranked 2nd of 17 ranked submissions, with an RMSE of 0.66 and an MAE of 0.48 (submission: Chemprop). The model also works well on other datasets, especially when retrospectively applied to the SAMPL6 challenge, where it would have ranked number one out of all submissions (RMSE of 0.35). Although our model works well, we conclude with suggestions that are expected to improve it even further.
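
The two error measures quoted in this abstract, RMSE and MAE, can be computed as in the minimal sketch below. The logP values are hypothetical toy numbers chosen for easy arithmetic, not SAMPL7 data, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: penalizes large errors more than small ones."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical measured and predicted logP values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]
print(f"RMSE={rmse(measured, predicted):.3f} MAE={mae(measured, predicted):.3f}")
```

RMSE is never smaller than MAE on the same data, so a gap between the two (such as 0.66 against 0.48) suggests that a few predictions carry comparatively large errors.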

US Optum Database Study in Polycythemia Vera Patients: Thromboembolic Events (TEs) with Hydroxyurea (HU) Vs Ruxolitinib Switch Therapy and Machine-Learning Model to Predict Incidence of TEs and HU Failure

Blood ◽

10.1182/blood-2019-126410 ◽

2019 ◽

Vol 134 (Supplement_1) ◽

pp. 1659-1659

Author(s):

Srdan Verstovsek ◽

Valerio De Stefano ◽

Florian H. Heidel ◽

Mike Zuurman ◽

Michael Zaiac ◽

...

Keyword(s):

Machine Learning ◽

Platelet Count ◽

Median Duration ◽

Research Funding ◽

Learning Model ◽

Thromboembolic Events ◽

Machine Learning Model ◽

History Of ◽

Icd Codes ◽

Restrictive Definition

Introduction: Thromboembolic events (TEs) are one of the most prevalent complications in patients (pts) with polycythemia vera (PV). This real-world evidence study of the US OPTUM database evaluated the incidence of TEs in hydroxyurea (HU)-treated PV pts who either switched to ruxolitinib (RUX) after initial treatment (Tx) with HU (HU-RUX group) or continued HU Tx without switching (HU-alone group). Machine learning was then used to build a precise and scientifically robust model to predict the occurrence of TEs in PV pts with/without a history of TEs and HU failure (defined by either European LeukemiaNet [ELN] hematologic criteria or TEs). Methods: The OPTUM database comprises claims data and electronic medical records from 90 million pts (2007-2017, median stay in the database=7 years), including 69,464 PV pts. To avoid any selection bias during comparison, only pts treated prior to the RUX market launch were included in the HU-alone group (HU-RUX, n=81; HU-alone, n=195). Due to unavailability of Tx duration, time difference between the first and the last prescription was used as a proxy, and overall Tx duration was matched in both groups. TEs were assessed before Tx initiation in both groups. For HU-RUX pts, it was also assessed while on HU (median duration 27 months) and on RUX (median duration 14 months). For HU-alone pts, it was assessed during the first 27 months of Tx (any pt included in the analysis was treated for longer than this due to duration matching) and during remaining period of Tx (median duration 14 months). TEs were identified by either a restrictive definition (a list of ICD codes containing keywords from the RESPONSE study was automatically generated and manually curated) or a less restrictive one (list of ICD codes was manually expanded to include any TEs matching those from the GEMFIN study). PV pts who were exclusively treated with HU for ≥6 months were selected (n=2057) for modeling. 
Outcomes to be predicted were TEs in the 12 months following the end of the 6-month HU Tx period, and HU failure within 3 months of Tx. A logistic regression model was used for prediction. The baseline features extracted from the database included median lab parameters (3-6 months after HU initiation), history of thrombosis prior to primary diagnosis of PV, sociological features (age, gender), comorbidities, and concomitant medications (from inpatient/outpatient tables). Performance assessment methods included Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) in early stages and confusion matrix in later stages; the findings were converted to clinically interpretable decision-tree classification algorithms. Results: Based on the extensive definition, the annual incidence of TEs in the HU-RUX and HU-alone groups, respectively, was 9% and 7% before HU initiation, which increased to 17% and 13% on HU Tx. The small difference in baseline incidence may reflect residual differences between the two groups. After a median duration of 14 months, the incidence of TEs decreased to 15% in pts who switched to RUX vs an increase to 20% in pts who continued HU Tx. A similar trend was observed using less restrictive definition (Figure 1). This definition resulted in a substantially increased incidence of TEs and a decreased predictive power of the machine-learning model. Using modeling, decision trees were developed to predict the occurrence of TEs in PV pts with/without a history of TEs. Lymphocyte percentage (<17%) and red cell distribution width (RDW; <15%) were predictors in pts without a history of TEs, whereas lymphocyte percentage (>13%) and platelet count (>393×10³/µL) were predictors in pts with a history of TEs (Figure 2). Based on the decision tree developed to predict HU failure, phlebotomy-dependent pts with >15% RDW had a higher risk of HU failure within 3 months of Tx (Figure 3). 
Conclusions: A reduction in the incidence of TEs was observed in pts switching to RUX vs those who continued HU Tx. Based on the findings from this machine-learning model in PV pts, phlebotomy dependency and RDW were indicated as predictors of HU Tx failure within 3 months, whereas lymphocyte percentage+platelet count and lymphocyte percentage+RDW were predictors of incidence of TEs in pts with and without a history of TEs, respectively. Non-adjustment of the results for antiplatelet/anticoagulant Tx was a study limitation. Further validation of this machine-learning model is planned in other European databases. Disclosures Verstovsek: Celgene: Consultancy, Research Funding; Gilead: Research Funding; Promedior: Research Funding; CTI BioPharma Corp: Research Funding; Genetech: Research Funding; Protaganist Therapeutics: Research Funding; Constellation: Consultancy; Pragmatist: Consultancy; Incyte: Research Funding; Roche: Research Funding; NS Pharma: Research Funding; Blueprint Medicines Corp: Research Funding; Novartis: Consultancy, Research Funding; Sierra Oncology: Research Funding; Pharma Essentia: Research Funding; Astrazeneca: Research Funding; Ital Pharma: Research Funding. De Stefano:Celgene: Consultancy, Honoraria, Speakers Bureau; Janssen: Consultancy, Honoraria, Speakers Bureau; Amgen: Consultancy, Honoraria, Speakers Bureau; Novartis: Consultancy, Honoraria, Research Funding, Speakers Bureau; Alexion: Consultancy, Honoraria, Speakers Bureau. Heidel:Novartis: Consultancy, Honoraria, Research Funding; Celgene: Consultancy; CTI: Consultancy. Zuurman:Novartis Pharma B.V.: Employment. Zaiac:Novartis: Employment, Equity Ownership. Bigan:Novartis: Consultancy. Ruhl:Novartis: Consultancy. Meier:Novartis: Consultancy. Kiladjian:Celgene: Consultancy; Novartis: Honoraria, Research Funding; AOP Orphan: Honoraria, Research Funding.
