DeepBindRG: a deep learning based method for estimating effective protein–ligand affinity

Learning Protein-Ligand Binding Affinity with Atomic Environment Vectors

10.26434/chemrxiv.13469625 ◽

2020 ◽

Author(s):

Rocco Meli ◽

Andrew Anighoro ◽

Mike Bodkin ◽

Garrett Morris ◽

Philip Biggin

Keyword(s):

Ligand Binding ◽

Binding Affinity ◽

Correlation Coefficient ◽

Scoring Function ◽

Scoring Functions ◽

AutoDock Vina ◽

Pearson's Correlation ◽

Pearson's Correlation Coefficient

Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years, as novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of several neural network potentials, for the prediction of protein-ligand binding affinity. The AEV-based scoring function, which we term AEScore, is shown to perform as well as or better than other state-of-the-art scoring functions on binding affinity prediction, with an RMSE of 1.22 pK units and a Pearson's correlation coefficient of 0.83 for the CASF-2016 benchmark. However, AEScore does not perform as well in docking and virtual screening tasks. We therefore show that the model can be combined with the classical scoring function AutoDock Vina in the context of ∆-learning, where corrections to the AutoDock Vina scoring function are learned instead of the protein-ligand binding affinity itself. Combined with AutoDock Vina, ∆-AEScore has an RMSE of 1.32 pK units and a Pearson's correlation coefficient of 0.80 on the CASF-2016 benchmark, while retaining the good docking and screening power of the underlying classical scoring function.
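The ∆-learning idea in this abstract (learn a correction to a baseline score rather than the affinity itself) can be illustrated with a minimal sketch. This is not the authors' AEScore model: the correction here is a plain linear least-squares fit on synthetic features, standing in for the AEV-based neural network, and the baseline is assumed to be already expressed in pK units. The names `fit_delta_model` and `predict_delta` are hypothetical.

```python
import numpy as np

def fit_delta_model(features, baseline_scores, experimental_pk):
    """Fit a linear correction to a baseline score by least squares.

    Assumption: a linear model stands in for the neural network used in
    the paper; only the delta-learning target is the point of this sketch.
    """
    # The correction model is trained on what the baseline gets wrong.
    residuals = experimental_pk - baseline_scores
    # Append a bias column so the correction can include a constant offset.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, residuals, rcond=None)
    return weights

def predict_delta(features, baseline_scores, weights):
    """Final prediction = baseline score + learned correction."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return baseline_scores + design @ weights

# Synthetic toy data (not from the paper).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
true_pk = 6.0 + features @ np.array([0.5, -0.3, 0.2])
# Hypothetical baseline with a systematic, feature-dependent error.
baseline = true_pk - (0.4 * features[:, 0] + 1.0)

weights = fit_delta_model(features, baseline, true_pk)
corrected = predict_delta(features, baseline, weights)

rmse_baseline = np.sqrt(np.mean((baseline - true_pk) ** 2))
rmse_corrected = np.sqrt(np.mean((corrected - true_pk) ** 2))
print(f"baseline RMSE: {rmse_baseline:.3f}, corrected RMSE: {rmse_corrected:.3f}")
```

Because the final score is the baseline plus a correction, the combined model inherits the baseline's behaviour wherever the learned correction is small, which is consistent with the abstract's remark that ∆-AEScore retains the docking and screening power of AutoDock Vina.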


Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained On Docked Poses

10.26434/chemrxiv.13637756 ◽

2021 ◽

Author(s):

Fergus Boyles ◽

Charlotte M Deane ◽

Garrett Morris

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Crystal Structures ◽

Binding Affinity ◽

Scoring Function ◽

Scoring Functions ◽

Data Set ◽

Core Sets ◽

Strong Performance

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.


CScore: A Simple Yet Effective Scoring Function for Protein–Ligand Binding Affinity Prediction Using Modified CMAC Learning Architecture

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001100577x ◽

2011 ◽

Vol 09 (supp01) ◽

pp. 1-14 ◽

Cited By ~ 20

Author(s):

Xuchang Ouyang ◽

Stephanus Daniel Handoko ◽

Chee Keong Kwoh

Keyword(s):

Binding Affinity ◽

Scoring Function ◽

Binding Mode ◽

Computational Method ◽

Data Driven ◽

Machine Learning Techniques ◽

Ligand Docking ◽

Scoring Functions ◽

Binding Affinity Prediction ◽

Affinity Prediction

Protein–ligand docking is a computational method that identifies the binding mode of a ligand and a target protein and predicts the corresponding binding affinity using a scoring function. This method has great value in drug design. After decades of development, scoring functions can now typically identify the true binding mode, but predicting binding affinity remains a major problem. Here we present CScore, a data-driven scoring function using a modified Cerebellar Model Articulation Controller (CMAC) learning architecture, for accurate binding affinity prediction. The performance of CScore, in terms of the correlation between predicted and experimental binding affinities, is benchmarked under different validation approaches. CScore achieves R = 0.7668 and RMSE = 1.4540 when tested on an independent dataset. To the best of our knowledge, this result outperforms other scoring functions tested on the same dataset. The performance of CScore varies across clusters under the leave-cluster-out validation approach, but it still achieves competitive results. Lastly, the target-specific CScore achieves an even better result, with R = 0.8237 and RMSE = 1.0872, when trained on a much smaller but more relevant dataset for each target. The large body of structural data on protein–ligand complexes and advances in machine learning techniques enable the data-driven approach to binding affinity prediction. CScore is capable of accurate binding affinity prediction, and it performs better when sufficient and relevant data are available. As publicly available structural data grow, further improvement of this scoring scheme can be expected.
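The two figures of merit quoted in this and several other abstracts in the listing, the Pearson correlation coefficient R and the RMSE between predicted and experimental affinities, can be computed as in the sketch below. The affinity values are made up for illustration and do not come from any of the cited benchmarks; the function names are hypothetical.

```python
import numpy as np

def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    dp = predicted - predicted.mean()
    de = experimental - experimental.mean()
    return float((dp @ de) / np.sqrt((dp @ dp) * (de @ de)))

def rmse(predicted, experimental):
    """Root-mean-square error, in the same units as the inputs."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return float(np.sqrt(np.mean((predicted - experimental) ** 2)))

# Illustrative values only (e.g. affinities in pK units).
experimental = [4.0, 5.5, 6.1, 7.3, 8.0]
predicted = [4.4, 5.1, 6.5, 6.9, 8.4]

print(f"R = {pearson_r(predicted, experimental):.4f}")
print(f"RMSE = {rmse(predicted, experimental):.4f}")
```

The two metrics answer different questions: R measures how well the predictions track the ordering and spread of the experimental values, while RMSE measures absolute error, so a scoring function with a constant offset can have a high R and still a poor RMSE.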


Binding affinity prediction for protein–ligand complex using deep attention mechanism based on intermolecular interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04466-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Sangmin Seo ◽

Jonghwan Choi ◽

Sanghyun Park ◽

Jaegyoon Ahn

Keyword(s):

Deep Learning ◽

Binding Affinity ◽

Prediction Models ◽

Attention Mechanism ◽

Scoring Functions ◽

Ligand Complex ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Proposed Model

Background: Accurate prediction of protein–ligand binding affinity is important for lowering the overall cost of drug discovery in structure-based drug design. For accurate predictions, many classical scoring functions and machine learning-based methods have been developed. However, these techniques tend to have limitations, mainly resulting from a lack of sufficient energy terms to describe the complex interactions between proteins and ligands. Recent deep-learning techniques can potentially solve this problem. However, the search for more efficient and appropriate deep-learning architectures and methods to represent protein–ligand complexes is ongoing. Results: In this study, we proposed a deep-neural network model to improve the prediction accuracy of protein–ligand complex binding affinity. The proposed model has two important features: descriptor embeddings with information on the local structures of a protein–ligand complex, and an attention mechanism to highlight important descriptors for binding affinity prediction. The proposed model performed better than existing binding affinity prediction models on most benchmark datasets. Conclusions: We confirmed that an attention mechanism can capture the binding sites in a protein–ligand complex to improve prediction performance. Our code is available at https://github.com/Blue1993/BAPA.
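As a rough illustration of how an attention mechanism can highlight important descriptors, the sketch below pools a set of descriptor embeddings with softmax weights derived from a scaled dot product. It is a generic attention-pooling example, not the BAPA architecture, whose details are not given in the abstract. The fixed `query` vector is an assumption (in a trained model it would be learned), and `attention_pool` is a hypothetical name.

```python
import numpy as np

def attention_pool(descriptor_embeddings, query):
    """Weight each descriptor embedding by its softmax-normalised relevance.

    descriptor_embeddings: array of shape (n_descriptors, dim)
    query: array of shape (dim,), assumed fixed here for illustration
    Returns the pooled vector and the attention weights.
    """
    dim = descriptor_embeddings.shape[1]
    # One relevance score per descriptor (scaled dot product with the query).
    scores = descriptor_embeddings @ query / np.sqrt(dim)
    # Subtract the maximum before exponentiating for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Weighted sum of the embeddings.
    pooled = weights @ descriptor_embeddings
    return pooled, weights

# Toy embeddings for three descriptors (illustrative values only).
embeddings = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [3.0, 0.0]])
query = np.array([1.0, 0.0])

pooled, weights = attention_pool(embeddings, query)
print("attention weights:", np.round(weights, 3))
print("pooled vector:", np.round(pooled, 3))
```

The weights sum to one, so they can be read as a relative importance per descriptor; inspecting them is one way such a model could indicate which local structures it relies on.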


Binding affinity prediction for protein-ligand complex using deep attention mechanism based on intermolecular interactions

10.1101/2021.03.18.436020 ◽

2021 ◽

Author(s):

Sangmin Seo ◽

Jonghwan Choi ◽

Sanghyun Park ◽

Jaegyoon Ahn

Keyword(s):

Deep Learning ◽

Binding Affinity ◽

Attention Mechanism ◽

Accurate Prediction ◽

Scoring Functions ◽

Ligand Complex ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Local Structures ◽

Proposed Model

Abstract: Accurate prediction of protein-ligand binding affinity is important because it can lower the overall cost of drug discovery in structure-based drug design. For more accurate prediction, many classical scoring functions and machine learning-based methods have been developed. However, these techniques tend to have limitations, mainly resulting from a lack of sufficient interaction energy terms to describe complex interactions between proteins and ligands. Recent deep-learning techniques show strong potential to solve this problem, but the search for more efficient and appropriate deep-learning architectures and methods to represent protein-ligand complexes continues. In this study, we proposed a deep-neural network for more accurate prediction of protein-ligand complex binding affinity. The proposed model has two important features: descriptor embeddings that contain embedded information about the local structures of a protein-ligand complex, and an attention mechanism for highlighting descriptors important to binding affinity prediction. The proposed model showed better performance on most benchmark datasets than existing binding affinity prediction models. Moreover, we confirmed that an attention mechanism was able to capture binding sites in a protein-ligand complex and that it contributed to improvement in predictive performance. Our code is available at https://github.com/Blue1993/BAPA. Author summary: The initial step in drug discovery is to identify drug candidates for a target protein using a scoring function. Existing scoring functions, however, lack the ability to accurately predict the binding affinity of protein-ligand complexes. In this study, we proposed a deep learning-based approach to extract patterns from the local structures of protein-ligand complexes and to highlight the important local structures via an attention mechanism. The proposed model showed good performance for various benchmark datasets compared to existing models.


DLSCORE: A Deep Learning Model for Predicting Protein-Ligand Binding Affinities

10.26434/chemrxiv.6159143 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mahmudulla Hassan ◽

Daniel Castaneda Mogollon ◽

Olac Fuentes ◽

Suman Sirimulla

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Ligand Binding ◽

Scoring Function ◽

Learning Approaches ◽

Scoring Functions ◽

Binding Affinities ◽

Discovery Research ◽

Drug Discovery Research ◽

Fully Connected

In recent years, the cheminformatics community has seen increased success with machine learning-based scoring functions for estimating binding affinities and predicting poses. The prediction of protein-ligand binding affinities is crucial for drug discovery research. Many physics-based scoring functions have been developed over the years. Lately, machine learning approaches have been shown to boost the performance of traditional scoring functions. In this study, a novel deep learning-based scoring function (DLSCORE) was developed and trained on the refined PDBBind v.2016 dataset using 348 BINding ANAlyzer (BINANA) descriptors. The neural networks of the DLSCORE model have different numbers of fully connected hidden layers. Our model, an ensemble of 10 networks, yielded a Pearson R² of 0.82, a Spearman Rho R² of 0.90, a Kendall Tau R² of 0.74, an RMSE of 1.15 kcal/mol, and an MAE of 0.86 kcal/mol on our test set. This software is available on GitHub at https://github.com/sirimullalab/dlscore.git


Learning from the ligand: using ligand-based features to improve binding affinity prediction

Bioinformatics ◽

10.1093/bioinformatics/btz665 ◽

2019 ◽

Cited By ~ 7

Author(s):

Fergus Boyles ◽

Charlotte M Deane ◽

Garrett M Morris

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Pearson Correlation ◽

Scoring Function ◽

Supplementary Information ◽

Scoring Functions ◽

Limited Information ◽

Ligand Complex ◽

Binding Affinity Prediction ◽

Affinity Prediction

Motivation: Machine learning scoring functions for protein–ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein–ligand complex, with limited information about the chemical or topological properties of the ligand itself. Results: We demonstrate that the performance of machine learning scoring functions is consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest (RF) combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.836, 0.780 and 0.821 on the PDBbind 2007, 2013 and 2016 core sets, respectively, compared to 0.790, 0.746 and 0.814 when using the features of RF-Score v3 alone. Excluding proteins and/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore, an RF using only ligand-based features is predictive at a level similar to classical scoring functions, and it appears to be predicting the mean binding affinity of a ligand for its protein targets. Availability and implementation: Data and code to reproduce all the results are freely available at http://opig.stats.ox.ac.uk/resources. Supplementary information: Supplementary data are available at Bioinformatics online.
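The step of excluding training ligands that are similar to test-set ligands can be sketched as a similarity filter. The abstract does not state which similarity measure or cutoff was used, so the Tanimoto coefficient on fingerprint bit sets and the 0.7 threshold below are assumptions chosen for illustration, as are the function names and the toy fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def filter_training_set(train, test, threshold):
    """Keep training ligands whose similarity to every test ligand is below threshold.

    train, test: dicts mapping a ligand name to its set of on-bits.
    """
    return {
        name: bits
        for name, bits in train.items()
        if all(tanimoto(bits, test_bits) < threshold for test_bits in test.values())
    }

# Toy fingerprints (hypothetical ligands, not real data).
train = {
    "lig_a": {1, 2, 3, 4},   # similarity 0.8 to lig_t, so it is excluded
    "lig_b": {1, 2, 3, 9},   # similarity 0.5 to lig_t, so it is kept
    "lig_c": {20, 21, 22},   # no shared bits, so it is kept
}
test = {"lig_t": {1, 2, 3, 4, 5}}

kept = filter_training_set(train, test, threshold=0.7)
print(sorted(kept))
```

Retraining on the filtered set and re-evaluating on the unchanged test set gives an estimate of how much of a model's performance depends on near-duplicates of the test ligands.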


SMPLIP-Score: Predicting the Ligand Binding Affinity from Simple and Interpretable On-The-Fly Interaction Fingerprint Pattern Descriptors

10.21203/rs.3.rs-74202/v1 ◽

2020 ◽

Author(s):

Surendra Kumar ◽

Mi-hyun Kim

Keyword(s):

Ligand Binding ◽

Binding Affinity ◽

Scoring Functions ◽

Ligand Complex ◽

Ligand Interaction ◽

Fingerprint Pattern ◽

Comparable Performance ◽

Direct Interpretation ◽

Benchmark Datasets ◽

Lower Complexity

In drug discovery, rapid and accurate prediction of protein-ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between a docking score and a pose with key interacting residues, although scoring functions, as free energy surrogates for a protein-ligand complex, have failed to provide this collinearity. Recently, various machine learning and deep learning methods have been proposed to overcome this drawback of scoring functions. Despite their high accuracy, their featurization process is complex and costly to interpret (less accessible to human understanding). Here, we propose SMPLIP-Score (Substructural Molecular and Protein-Ligand Interaction Pattern Score), a simple interpretable predictor of the absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern of the ligand-binding site environment and the molecular fragments of ligands into a vectorized input matrix for the learning layers (random forest or deep neural network). Despite lower complexity than state-of-the-art models, SMPLIP-Score achieved comparable performance, with a Pearson's correlation coefficient up to 0.80 and an RMSE up to 1.18 in pK units on several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were also examined, with direct interpretation of feature matrices for specific targets.
