Protein pKa Prediction with Machine Learning

Protein pKa prediction with machine learning

10.33774/chemrxiv-2021-7gk5l ◽

2021 ◽

Author(s):

Zhitao Cai ◽

Fangfang Luo ◽

Yongxian Wang ◽

Enling Li ◽

Yandong Huang

Keyword(s):

Machine Learning ◽

Pka Prediction ◽

Protein Pka


Protein pKa prediction by tree-based machine learning

10.26434/chemrxiv-2021-4d420 ◽

2021 ◽

Author(s):

Ada Y. Chen ◽

Juyong Lee ◽

Ana Damjanovic ◽

Bernard R. Brooks

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Pka Prediction ◽

Light Gradient ◽

Structure Database ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Better Than ◽

Protein Pka

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), were trained on three experimental datasets of PDB structures and pKa values, two of which included a notable portion of internal residues. The four machine learning algorithms performed similarly. The best model, trained on the largest dataset, performs 37% better than the widely used empirical pKa prediction tool PROPKA. Considering six residue types (Asp, Glu, His, Lys, Cys and Tyr), this model has an overall RMSE of 0.69 pKa units, with RMSE values of 0.56 for surface residues and 0.78 for buried residues. When only Asp, Glu, His and Lys are considered, the RMSE is 0.63. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observe that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to physiological pH.
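The abstract reports overall, surface, and buried RMSE values and a relative improvement over PROPKA, but it does not spell out how these figures are computed. The Python sketch below shows one standard way to compute per-group RMSE and to pool subgroup RMSEs back into an overall value. The helper names (`rmse`, `rmse_by_group`, `pooled_rmse`, `relative_improvement`) are hypothetical, and so are the toy records. Reading "37% better" as a fractional RMSE reduction is also an assumption, not the authors' stated definition.

```python
import math
from collections import defaultdict


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference pKa values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def rmse_by_group(records):
    """Per-group RMSE; records are (group, predicted_pka, experimental_pka)."""
    groups = defaultdict(lambda: ([], []))
    for group, pred, ref in records:
        groups[group][0].append(pred)
        groups[group][1].append(ref)
    return {g: rmse(p, r) for g, (p, r) in groups.items()}


def pooled_rmse(counts_and_rmses):
    """Overall RMSE from (n, rmse) pairs: a count-weighted mean of squared errors."""
    n_total = sum(n for n, _ in counts_and_rmses)
    return math.sqrt(sum(n * e ** 2 for n, e in counts_and_rmses) / n_total)


def relative_improvement(rmse_model, rmse_baseline):
    """Fractional RMSE reduction of a model relative to a baseline (assumed metric)."""
    return (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical toy records: (burial class, predicted pKa, experimental pKa)
records = [
    ("surface", 4.0, 3.5),
    ("surface", 10.5, 10.0),
    ("buried", 6.0, 5.0),
    ("buried", 3.0, 4.0),
]
per_group = rmse_by_group(records)  # {'surface': 0.5, 'buried': 1.0}
overall = rmse([p for _, p, _ in records], [r for _, _, r in records])
pooled = pooled_rmse([(2, per_group["surface"]), (2, per_group["buried"])])
print(per_group, round(overall, 4), round(pooled, 4))
```

The pooled value equals the RMSE over all residues. An overall RMSE therefore always lies between the surface and buried RMSEs, with the exact position set by how many residues fall into each class.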


Protein pKa prediction with machine learning

10.33774/chemrxiv-2021-7gk5l-v2 ◽

2021 ◽

Author(s):

Zhitao Cai ◽

Fangfang Luo ◽

Yongxian Wang ◽

Enling Li ◽

Yandong Huang

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Prediction Accuracy ◽

Structure And Function ◽

Protein Electrostatics ◽

Pka Prediction ◽

Constant Ph ◽

And Function ◽

Constant Ph Molecular Dynamics ◽

Protein Pka

Protein pKa prediction is essential for investigating the pH-dependent relationship between protein structure and function. In this work, we introduce DeepKa, a deep-learning-based protein pKa predictor trained and validated on pKa values derived from continuous constant pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. The CpHMD method implemented in the Amber molecular dynamics package was employed (Huang, Harris, and Shen J. Chem. Inf. Model. 2018, 58, 1372-1383). Notably, grid charges are proposed to represent protein electrostatics in order to avoid discontinuities at the boundary. We show that the prediction accuracy of DeepKa is close to that of CpHMD benchmark simulations, which validates DeepKa as an efficient protein pKa predictor. In addition, the training and validation sets created in this study can be used to develop future machine-learning-based protein pKa predictors. Finally, the grid charge representation is general and applicable to other problems, such as protein–ligand binding affinity prediction.
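The abstract says that grid charges avoid discontinuities, but it does not describe how charges are assigned to the grid. As an illustration only, the sketch below assumes trilinear (cloud-in-cell) spreading, a common way to map point charges onto a grid. Each charge is split among the eight nodes of its enclosing cell, with weights that vary continuously with atomic position. As a result, an atom crossing a cell boundary does not make grid values jump, as it would under nearest-grid-point assignment. The function name, argument layout, and dict-based grid are hypothetical and are not DeepKa's implementation.

```python
import math


def spread_charges(atoms, origin, spacing, shape):
    """Spread point charges onto a 3D grid with trilinear (cloud-in-cell) weights.

    atoms:   list of (x, y, z, q) tuples
    origin:  (x0, y0, z0) coordinates of grid node (0, 0, 0)
    spacing: grid step h (same in all directions)
    shape:   (nx, ny, nz) number of nodes per axis
    Returns a dict mapping node indices (i, j, k) to accumulated charge.
    """
    grid = {}
    for x, y, z, q in atoms:
        # Fractional grid coordinates of the atom.
        f = [(c - o) / spacing for c, o in zip((x, y, z), origin)]
        base = [math.floor(v) for v in f]
        frac = [v - b for v, b in zip(f, base)]
        # The enclosing cell (base .. base + 1 on each axis) must lie inside the grid.
        if any(b < 0 or b >= n - 1 for b, n in zip(base, shape)):
            raise ValueError("atom lies outside the grid interior")
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    # Weight is the product of the 1D linear weights on each axis.
                    w = 1.0
                    for d, t in zip((di, dj, dk), frac):
                        w *= t if d else 1.0 - t
                    key = (base[0] + di, base[1] + dj, base[2] + dk)
                    grid[key] = grid.get(key, 0.0) + q * w
    return grid


# Hypothetical example: one unit charge at the centre of the first cell.
g = spread_charges([(0.5, 0.5, 0.5, 1.0)], (0.0, 0.0, 0.0), 1.0, (2, 2, 2))
print(len(g), sum(g.values()))  # 8 nodes, total charge 1.0
```

Because the eight weights always sum to one, the total charge on the grid equals the total atomic charge, whatever the atom's position inside its cell.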


Open-source QSAR models for pKa prediction using multiple machine learning approaches

Journal of Cheminformatics ◽

10.1186/s13321-019-0384-1 ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 10

Author(s):

Kamel Mansouri ◽

Neal F. Cariello ◽

Alexandru Korotcov ◽

Valery Tkachenko ◽

Chris M. Grulke ◽

...

Keyword(s):

Machine Learning ◽

Open Source ◽

Acid Dissociation ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Pka Prediction ◽

Chemical Structures ◽

Extreme Gradient Boosting ◽

Qsar Models

Abstract

Background: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and the ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.

Methods: The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different modeling approaches, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).

Results: The three methods delivered comparable performance on the training and test sets, with a root-mean-squared error (RMSE) of around 1.5 and a coefficient of determination (R²) of around 0.80. Two commercial pKa predictors, from ACD/Labs and ChemAxon, were used to benchmark the three best models developed in this work, and our models compared favorably with the commercial products.

Conclusions: This work provides multiple QSAR models, built using publicly available data, to predict the strongest acidic and strongest basic pKas of chemicals. The models are provided as free and open-source software on GitHub.
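The abstract names an SVM combined with kNN but gives no details of how the two are combined. The sketch below therefore shows only a generic kNN regressor over binary fingerprints, together with the coefficient of determination R² used to report performance. It assumes that fingerprints are stored as sets of on-bit indices and that neighbors are ranked by Tanimoto similarity. Tanimoto similarity is a common choice for binary fingerprints, but the abstract does not confirm it. The function names and toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def knn_predict(query, train, k=3):
    """Similarity-weighted mean pKa of the k training chemicals most similar to query.

    train: list of (fingerprint_bits, pka) pairs.
    """
    scored = sorted(((tanimoto(query, fp), pka) for fp, pka in train),
                    key=lambda s: s[0], reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:  # no shared bits with any neighbor: fall back to a plain mean
        return sum(pka for _, pka in scored) / len(scored)
    return sum(sim * pka for sim, pka in scored) / total


def r_squared(pred, ref):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)  # must be non-zero
    return 1.0 - ss_res / ss_tot


# Hypothetical toy training set: (on-bit indices, experimental pKa)
train = [({1, 2, 3}, 4.0), ({1, 2, 4}, 4.4), ({7, 8, 9}, 9.0)]
print(knn_predict({1, 2, 3}, train, k=2))
```

Weighting neighbors by similarity lets close analogs dominate the prediction. An R² of about 0.80 means the model explains roughly 80% of the variance in the reference pKa values.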


Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Abstract

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.
