A Machine Learning-Based Prediction Platform for P-Glycoprotein Modulators and Its Validation by Molecular Docking

Onat Kadioglu; Thomas Efferth

doi:10.3390/cells8101286


Cells, 2019, Vol 8 (10), pp. 1286

Cited By ~ 1


Keyword(s): Machine Learning; Molecular Docking; Learning Strategies; High Performance; External Validation; Major Drawback; Chemotherapy Drugs; P Glycoprotein; Validation Set; Leave One Out

P-glycoprotein (P-gp) is an important determinant of multidrug resistance (MDR) because its overexpression is associated with increased efflux of various established chemotherapy drugs in many clinically resistant and refractory tumors. This leads to insufficient therapeutic targeting of tumor populations, representing a major drawback of cancer chemotherapy. Therefore, P-gp is a target for pharmacological inhibitors to overcome MDR. In the present study, we used machine learning strategies to establish a model for P-gp modulators that predicts whether a given compound would behave as a substrate or an inhibitor of P-gp. A leave-one-out random sampling scheme based on the random forest feature selection algorithm was used. Testing the model with an external validation set revealed high performance scores. A list of P-gp modulators from the ChEMBL database was used to test the performance, and predictions from both the substrate and inhibitor classes were selected for the final validation step with molecular docking. Predicted substrates showed docking poses similar to that of doxorubicin, and predicted inhibitors showed docking poses similar to that of the known P-gp inhibitor elacridar, supporting the validity of the predictions. We conclude that the machine learning approach introduced in this investigation may serve as a tool for the rapid detection of P-gp substrates and inhibitors in large chemical libraries.
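The leave-one-out idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' random forest pipeline. The stand-in classifier is a hypothetical 1-nearest-neighbour rule on Tanimoto similarity between binary fingerprints. The toy compounds, bit sets and helper names are invented for illustration. Each compound is held out once, the classifier sees only the remaining compounds, and the fraction of correct held-out predictions is reported.

```python
from typing import Callable, Sequence, Set, Tuple

# Hypothetical toy record: (set of "on" fingerprint bits, class label).
Compound = Tuple[Set[int], str]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_neighbour(train: Sequence[Compound], query: Set[int]) -> str:
    """Stand-in classifier (assumption): label of the most similar training compound."""
    best = max(train, key=lambda c: tanimoto(c[0], query))
    return best[1]


def leave_one_out_accuracy(
    data: Sequence[Compound],
    classify: Callable[[Sequence[Compound], Set[int]], str],
) -> float:
    """Hold out each compound once, 'train' on the rest, and score the held-out prediction."""
    correct = 0
    for i, (fp, label) in enumerate(data):
        train = [c for j, c in enumerate(data) if j != i]
        if classify(train, fp) == label:
            correct += 1
    return correct / len(data)


toy = [
    ({1, 2, 3}, "substrate"),
    ({1, 2, 4}, "substrate"),
    ({7, 8, 9}, "inhibitor"),
    ({7, 8, 10}, "inhibitor"),
]
print(leave_one_out_accuracy(toy, nearest_neighbour))  # 1.0
```

Passing the classifier as a function keeps the resampling loop independent of the model. A random forest or any other learner could be dropped in without changing `leave_one_out_accuracy`.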


Machine Learning Classification of Head Impact Sensor Data

Volume 3: Biomedical and Biotechnology Engineering, 2019. doi:10.1115/imece2019-12173

Author(s): Tyler F. Rooks, Andrea S. Dargie, Valeta Carol Chancey

Keyword(s): Machine Learning; Decision Tree; External Validation; Classification Algorithm; Sensor Data; Environmental Sensors; Head Acceleration; Machine Learning Classification; Environmental Sensor; Validation Set

A shortcoming of using environmental sensors for the surveillance of potentially concussive events is substantial uncertainty regarding whether an event was caused by head acceleration ("head impacts") or by sensor motion (with no head acceleration). The goal of the present study is to develop a machine learning model to classify environmental sensor data obtained in the field and to compare its performance against that of the proprietary classification algorithm used by the environmental sensor. Data were collected from Soldiers attending sparring sessions conducted under a U.S. Army Combatives School course. Data from one sparring session were used to train a decision tree classification algorithm to identify good and bad signals. Data from the remaining sparring sessions were kept as an external validation set. The performance of the sensor's proprietary algorithm was also compared with that of the trained algorithm. The trained decision tree correctly classified 95% of events in internal cross-validation and 88% of events in the external validation set. By comparison, the proprietary algorithm correctly classified only 61% of the events. In general, the trained algorithm was better than the proprietary algorithm at predicting whether a signal was good or bad. The present study shows that it is possible to train a decision tree algorithm using environmental sensor data collected in the field.
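The train-on-one-session, validate-on-the-rest design can be sketched with the smallest possible decision tree: a single split (a "stump") chosen by Gini impurity. This is only an illustration, not the study's model. The feature (a hypothetical peak acceleration value in g), the labels and the data are invented. The authors' tree may use several features and more depth.

```python
from typing import List, Tuple

# Hypothetical sample: (peak acceleration in g, label) with 1 = good signal, 0 = bad signal.
Sample = Tuple[float, int]
# Fitted stump: (threshold, label for values <= threshold, label for values > threshold).
Stump = Tuple[float, int, int]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Majority label; ties go to 0 (bad signal)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(train: List[Sample]) -> Stump:
    """Depth-1 decision tree: choose the midpoint split that minimises weighted Gini impurity."""
    values = sorted({x for x, _ in train})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in train if x <= t]
        right = [y for x, y in train if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    _, t, left_label, right_label = best
    return t, left_label, right_label


def accuracy(stump: Stump, data: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    t, left_label, right_label = stump
    correct = sum((left_label if x <= t else right_label) == y for x, y in data)
    return correct / len(data)


session_1 = [(5.0, 0), (8.0, 0), (12.0, 0), (25.0, 1), (40.0, 1), (60.0, 1)]
other_sessions = [(10.0, 0), (30.0, 1), (15.0, 1), (70.0, 1)]

stump = fit_stump(session_1)
print(stump)                            # (18.5, 0, 1)
print(accuracy(stump, other_sessions))  # 0.75
```

The toy numbers show the pattern the abstract reports: perfect fit on the training session, with lower accuracy on data from sessions the model never saw.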


Multiclass Classifier for P-Glycoprotein Substrates, Inhibitors, and Non-Active Compounds

Molecules, 2019, Vol 24 (10), pp. 2006. doi:10.3390/molecules24102006

Cited By ~ 1

Author(s): Liadys Mora Lagares, Nikola Minovski, Marjana Novič

Keyword(s): In Silico; Transmembrane Protein; External Validation; Assessment Process; Classification Model; Training Set; Test Set; Active Compounds; P Glycoprotein; Validation Set

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is strongly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs and drug candidates, and it contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in drug discovery and toxicological assessment it is advisable to consider whether a compound under development could be transported by P-gp. In this study, an in silico multiclass classification model capable of predicting the probability that a compound interacts with P-gp was developed. It used a counter-propagation artificial neural network (CP ANN), a set of 2D molecular descriptors, and an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model showed good classification performance, with non-error rate (NER) values of 0.93 for the training set and 0.85 for the test set, and average precision (AvPr) values of 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model's performance; on it, both NER and AvPr were 0.70. We believe that this in silico classifier could be used effectively as a reliable virtual screening tool for identifying potential P-gp ligands.
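Both reported metrics can be computed from a multiclass confusion matrix. The sketch below assumes the usual definitions. NER is the mean of the per-class sensitivities, and AvPr is the mean of the per-class precisions. The counts are hypothetical, and the paper's exact formulas may differ in detail.

```python
from typing import Hashable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def non_error_rate(m: List[List[int]]) -> float:
    """Mean of per-class sensitivities (recall of each true class)."""
    sens = [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]
    return sum(sens) / len(sens)


def average_precision(m: List[List[int]]) -> float:
    """Mean of per-class precisions (fraction of each predicted class that is correct)."""
    k = len(m)
    precisions = []
    for j in range(k):
        predicted_j = sum(m[i][j] for i in range(k))
        precisions.append(m[j][j] / predicted_j if predicted_j else 0.0)
    return sum(precisions) / k


# Hypothetical counts; class order: inhibitor, substrate, non-active.
counts = [[8, 2, 0],
          [1, 6, 3],
          [1, 1, 8]]
print(round(non_error_rate(counts), 3))     # 0.733
print(round(average_precision(counts), 3))  # 0.731
```

Averaging per class, rather than pooling all predictions, keeps the smaller substrate class (477 of 2512 compounds) from being swamped by the larger inhibitor class.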


Abstract P391: AN-ADCS2: A Novel Machine-Learning Model to Predict the Risk of Stroke-Associated Pneumonia

Stroke, 2021, Vol 52 (Suppl_1). doi:10.1161/str.52.suppl_1.p391

Author(s): Lingling Ding, Zixiao Li, Yongjun Wang

Keyword(s): Machine Learning; High Risk; Predictive Value; External Validation; Learning Model; Low Risk; Stroke Recurrence; Clinical Prognosis; Machine Learning Model; Validation Set

Objective: We aimed to develop and validate a machine learning-based prediction model that could assess the risk of stroke-associated pneumonia (SAP) for individual patients with acute ischemic stroke (AIS). Methods: A machine learning model incorporating A2DS2 scores and clinical features (AN-ADCS2) was developed to predict the risk of SAP in patients with AIS. Two independent datasets were used for model derivation and external validation. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were estimated. Further analysis evaluated thresholds, derived from the training set, that classified patients as low-risk, intermediate-risk or high-risk, and performance at these thresholds was compared in the external validation set. Results: The AN-ADCS2 model achieved favorable performance, with a high AUC of 0.892 (95% confidence interval [CI] 0.885-0.898) in the test set and similar performance in the external validation set (AUC 0.813 [95% CI 0.812-0.814]). The AN-ADCS2 threshold identifying low risk was 0.03, with an NPV of 97.6% (97.2-97.9%) and a sensitivity of 93.5% (92.5-94.5%). The AN-ADCS2 threshold identifying high risk was 0.65, with a PPV of 94.7% (93.9-95.6%) and a specificity of 99.5% (99.5-99.6%). The AN-ADCS2 model performed better than the A2DS2 score (AUC 0.739, 95% CI [0.720-0.754]). A high SAP risk, as classified by AN-ADCS2, was associated with the unfavorable outcomes of mortality and in-hospital stroke recurrence. Conclusions: Using machine learning, the AN-ADCS2 model provides an individualized risk prediction of SAP, which can be used as an indicator of clinical prognosis for patients with AIS.
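The three-tier scheme and the threshold metrics can be sketched in a few lines. The cut-offs 0.03 and 0.65 come from the abstract. The boundary handling (strictly below 0.03 is low, 0.65 or above is high) is an assumption, and the scores and outcomes are hypothetical. At each cut-off, a score at or above the threshold is treated as a positive prediction.

```python
import math
from typing import Dict, Sequence

LOW_RISK_THRESHOLD = 0.03   # reported AN-ADCS2 low-risk threshold
HIGH_RISK_THRESHOLD = 0.65  # reported AN-ADCS2 high-risk threshold


def risk_tier(p: float) -> str:
    """Assumed boundary handling: < 0.03 low, >= 0.65 high, otherwise intermediate."""
    if p < LOW_RISK_THRESHOLD:
        return "low"
    if p >= HIGH_RISK_THRESHOLD:
        return "high"
    return "intermediate"


def threshold_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when score >= threshold counts as positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        positive = s >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else math.nan

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}


scores = [0.01, 0.02, 0.10, 0.40, 0.70, 0.90]  # hypothetical predicted SAP risks
labels = [0, 0, 0, 1, 1, 1]                    # hypothetical observed SAP (1 = yes)
print([risk_tier(s) for s in scores])
# ['low', 'low', 'intermediate', 'intermediate', 'high', 'high']
print(threshold_metrics(scores, labels, LOW_RISK_THRESHOLD))
print(threshold_metrics(scores, labels, HIGH_RISK_THRESHOLD))
```

Each cut-off is tuned for a different purpose. The low cut-off favours sensitivity and NPV, so that patients labelled low risk can be safely reassured. The high cut-off favours specificity and PPV, so that patients labelled high risk are rarely false alarms.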


Predicting risk of stillbirth and preterm pregnancies with machine learning

Health Information Science and Systems, 2020, Vol 8 (1). doi:10.1007/s13755-020-00105-9

Cited By ~ 3

Author(s): Aki Koivu, Mikko Sairanen

Keyword(s): Machine Learning; Preterm Birth; Learning Strategies; External Validation; Statistical Modelling; Gradient Boosting; Data Set; Selection Parameter; Modelling Techniques; Solid Foundation

Models of the risk of abnormal pregnancy-related outcomes such as stillbirth and preterm birth have been proposed in the past. They commonly use maternal demographic and medical history information as predictors and are based on conventional statistical modelling techniques. In this study, we apply state-of-the-art machine learning methods to the task of predicting early stillbirth, late stillbirth and preterm birth pregnancies. The aim of this experimentation is to discover novel risk models that could be used in a clinical setting. A CDC data set of almost sixteen million observations was used to conduct feature selection, parameter optimization and verification of the proposed models. An additional NYC data set was used for external validation. Algorithms such as logistic regression, artificial neural networks and gradient boosting decision trees were used to construct individual classifiers. Ensemble learning strategies combining these classifiers were also explored. The best-performing machine learning models achieved an AUC of 0.76 for early stillbirth, 0.63 for late stillbirth and 0.64 for preterm birth on the external NYC test data. The repeatable performance of our models demonstrates the robustness required in this context. Our proposed novel models provide a solid foundation for risk prediction and could be further improved with the addition of biochemical and/or biophysical markers.
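The abstract does not say which ensemble strategies were tried. One common option is soft voting, which averages the predicted probabilities of the individual classifiers. The sketch below assumes that strategy and scores it with ROC AUC, computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The probabilities and outcomes are hypothetical.

```python
from typing import List, Optional, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive outscores random negative), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of each classifier's predicted probability, per observation."""
    w = list(weights) if weights is not None else [1.0] * len(prob_lists)
    total = sum(w)
    return [sum(wi * probs[i] for wi, probs in zip(w, prob_lists)) / total
            for i in range(len(prob_lists[0]))]


labels = [0, 0, 1, 1]            # hypothetical outcomes (1 = stillbirth)
logistic = [0.2, 0.6, 0.4, 0.8]  # hypothetical logistic-regression probabilities
boosting = [0.1, 0.3, 0.7, 0.5]  # hypothetical gradient-boosting probabilities
ensemble = soft_vote([logistic, boosting])
print(auc(logistic, labels), auc(boosting, labels), auc(ensemble, labels))  # 0.75 1.0 1.0
```

An ensemble is not guaranteed to beat its best member. In this toy case it only matches the boosting model, which is why the abstract's "best performing models" may be either single classifiers or ensembles.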


VolleyJump: An Application for Analyzing Jumps in Beach Volleyball

2018. doi:10.5753/webmedia.2018.4579

Cited By ~ 1

Author(s): Renan Bandeira, Fernando Trinta, João Gomes, Marcio Maia, Alexandre Araripe

Keyword(s): Machine Learning; Mobile Devices; Learning Strategies; High Performance; Professional Sports; Jump Height; The Past; The World; Main Factors

Professional sports are increasingly dependent on technological resources, given the remarkable level of competitiveness faced by high-performance athletes. With such resources, it is possible to analyze matches, avoid mistakes that may be committed by the referee, or analyze the athletes’ performance. One of these sports is beach volleyball, one of the most popular sports in Brazil. In the past 12 years, Brazilian volleyball teams have always been among the best in the world. The athletes’ performance during the jump is one of the main factors a team needs to improve to be successful, because the jump is the movement performed most often during a volleyball match. Some approaches study the jump movement in order to calculate its height and provide evidence on how to improve it. However, these solutions are expensive and not viable for athletes without sponsorship. With this in mind, this work presents VolleyJump, an application created to analyze beach volleyball athletes’ jumps using machine learning strategies to calculate jump height and classify each jump as an attack or a block jump. Results show that VolleyJump makes it possible to analyze athletes’ jumps using mobile device sensors, helping athletes focus their training on improving their technique.
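The abstract does not describe how VolleyJump computes jump height. A common physics-based baseline, shown here only as an assumption and not as the app's method, works in two steps. First, it estimates flight time from a body-worn accelerometer, which reads close to zero during free fall. Second, it converts that time to height with h = g * t^2 / 8, which follows from half the flight time being spent rising. The free-fall threshold, sample rate and trace are hypothetical.

```python
from typing import Sequence

STANDARD_GRAVITY = 9.80665  # m/s^2


def flight_time(accel_magnitudes: Sequence[float], sample_rate_hz: float,
                free_fall_threshold: float = 2.0) -> float:
    """Longest run of near-free-fall samples (|a| below a hypothetical threshold, m/s^2), in s."""
    longest = run = 0
    for a in accel_magnitudes:
        run = run + 1 if a < free_fall_threshold else 0
        longest = max(longest, run)
    return longest / sample_rate_hz


def jump_height(t_flight: float, g: float = STANDARD_GRAVITY) -> float:
    """h = g * t^2 / 8, assuming take-off and landing at the same body height."""
    if t_flight < 0:
        raise ValueError("flight time must be non-negative")
    return g * t_flight ** 2 / 8


# Hypothetical 100 Hz trace: standing, 0.6 s of flight, landing.
trace = [9.8] * 10 + [0.5] * 60 + [9.8] * 10
t = flight_time(trace, sample_rate_hz=100)
print(t, round(jump_height(t), 3))  # 0.6 0.441
```

A learned model, as the abstract describes, could replace either step. For example, it could segment the flight phase more robustly on sand, or separate attack from block jumps using features of the trace.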


ShapeGTB: The role of local DNA shape in prioritization of functional variants in human promoters with machine learning

2018. doi:10.7287/peerj.preprints.27199

Author(s): Maja Malkowska, Julian Zubek, Dariusz Plewczynski, Lucjan S Wyrwicz

Keyword(s): Machine Learning; External Validation; GC Content; Promoter Regions; Functional Sequence; Functional Variants; Local Sequence; Validation Set; DNA Shape; Coding Variants

Motivation: The identification of functional sequence variants in regulatory DNA regions is one of the major challenges of modern genetics. Here, we report the results of a combined multifactor analysis of properties characterizing functional sequence variants located in gene promoter regions. Results: We demonstrate that the GC content of local sequence fragments and local DNA shape features play a significant role in the prioritization of functional variants and outscore features related to histone modifications, transcription factor binding sites, or evolutionary conservation descriptors. These observations allowed us to build ShapeGTB, a specialized machine learning classifier that identifies functional SNPs within promoter regions. We compared our method with more general tools that predict the pathogenicity of all non-coding variants. ShapeGTB outperformed them by a wide margin (AUC ROC 0.97 vs. 0.57-0.59). On the external validation set based on the ClinVar database, its performance was only slightly lower (AUC ROC 0.92 vs. 0.74-0.81). These results suggest that mutations located within promoter regions have unique characteristics, and they are a promising signal for the development of more accurate variant prioritization tools in the future. Availability and implementation: The datasets and source code are publicly available at: https://github.com/zubekj/ShapeGTB.
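The simplest of the features named here, GC content of the local sequence around a variant, can be sketched as a windowed count. The window definition (a symmetric flank of a hypothetical 10 bases on each side, clipped at the sequence ends, ignoring ambiguous bases) is an assumption. ShapeGTB's own window sizes and its DNA shape features are not reproduced here.

```python
def local_gc_content(sequence: str, position: int, flank: int = 10) -> float:
    """Fraction of G/C among unambiguous bases in [position - flank, position + flank]."""
    if not 0 <= position < len(sequence):
        raise IndexError("position outside sequence")
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    bases = [b for b in sequence[start:end].upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


promoter = "AAAAGCGCAAAA"  # hypothetical promoter fragment
print(local_gc_content(promoter, position=5, flank=2))  # 0.8
```

In a classifier, a value like this would typically be computed for several window sizes per variant and combined with the other feature families before training.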


Machine Learning for Predicting the 3-Year Risk of Incident Diabetes in Chinese Adults

Frontiers in Public Health, 2021, Vol 9. doi:10.3389/fpubh.2021.626331

Author(s): Yang Wu, Haofei Hu, Jinlin Cai, Runtian Chen, Xin Zuo, ...

Keyword(s): Machine Learning; External Validation; Diabetes Risk; Assessment System; Incident Diabetes; Clinical Use; Chinese Adults; Training Set; Lemeshow Test; Validation Set

Purpose: We aimed to establish and validate a risk assessment system that combines demographic and clinical variables to predict the 3-year risk of incident diabetes in Chinese adults. Methods: A 3-year cohort study was performed on 15,928 Chinese adults without diabetes at baseline. All participants were randomly divided into a training set (n = 7,940) and a validation set (n = 7,988). XGBoost, an effective machine learning technique, was used to select the most important variables from the candidate variables. We then established a stepwise model based on the predictors chosen by the XGBoost model. The area under the receiver operating characteristic curve (AUC), decision curve analysis and calibration analysis were used to assess the discrimination, clinical usefulness and calibration of the model, respectively. External validation was performed on a cohort of 11,113 Japanese participants. Results: In the training and validation sets, 148 and 145 incident diabetes cases occurred, respectively. XGBoost selected the 10 most important variables from 15 candidate variables; fasting plasma glucose (FPG), body mass index (BMI) and age were the top three. We then established a stepwise model and a prediction nomogram. The AUCs of the stepwise model were 0.933 and 0.910 in the training and validation sets, respectively. The Hosmer-Lemeshow test showed no significant lack of fit between the predicted and observed diabetes risk (p = 0.068 for the training set, p = 0.165 for the validation set). Decision curve analysis showed that the stepwise model was clinically useful across a wide range of threshold probabilities. There were almost no interactions between these predictors (most P-values for interaction > 0.05). Furthermore, the AUC for the external validation set was 0.830, and the Hosmer-Lemeshow test for the external validation set showed no statistically significant difference between the predicted and observed diabetes risk (P = 0.824). Conclusion: We established and validated a risk assessment system for characterizing the 3-year risk of incident diabetes.
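The Hosmer-Lemeshow test reported above can be sketched directly. Predictions are sorted into risk groups, and observed and expected event counts are compared with a chi-square statistic that has groups - 2 degrees of freedom. The sketch assumes equal-size groups (conventionally 10, giving df = 8) and uses the closed-form chi-square tail that exists for even df. Grouping conventions vary between software packages, so this is an illustration rather than the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, int]:
    """Return (H statistic, degrees of freedom = groups - 2) using equal-size risk groups."""
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected events in the group
        observed = sum(y for _, y in chunk)  # observed events in the group
        expected_non = len(chunk) - expected
        observed_non = len(chunk) - observed
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
        if expected_non > 0:
            stat += (observed_non - expected_non) ** 2 / expected_non
    return stat, groups - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


print(hosmer_lemeshow([0.5] * 4, [0, 1, 0, 1], groups=2))  # (0.0, 0)
print(round(chi2_sf_even_df(15.507, 8), 3))                # 0.05
```

A large p-value (such as 0.824 above) means the test found no evidence of miscalibration. It does not prove that calibration is perfect, and the test has little power in small samples.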


A simple 2D-QSPR model for the prediction of Setschenow constants of organic compounds

Macedonian Journal of Chemistry and Chemical Engineering, 2016, Vol 35 (1), pp. 53. doi:10.20450/mjcce.2016.848

Cited By ~ 1

Author(s): Qi Xu, Lingling Fan, Jie Xu

Keyword(s): Organic Compounds; External Validation; Structure Property; Training Set; Multilinear Regression; QSPR Model; Multilinear Regression Analysis; Validation Set; Reliability And Robustness; Leave One Out

A quantitative structure-property relationship (QSPR) analysis of the Setschenow constants (Ksalt) of organic compounds in sodium chloride solution was carried out using only two-dimensional (2D) descriptors as input parameters. The whole set of 101 compounds was split into a training set of 71 compounds and a validation set of 30 compounds using the Kennard-Stone algorithm. A general four-parameter equation, with a correlation coefficient (R) of 0.887 and a standard error of estimation (s) of 0.031, was obtained by stepwise multilinear regression analysis (MLRA) on the training set. The reliability and robustness of the model were verified with leave-one-out cross-validation, randomization tests, and the external validation set. All descriptors in this model are calculated directly from the 2D molecular structures; thus, the model can easily be used to predict the Ksalt of other compounds not included in the present dataset.
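The Kennard-Stone split mentioned above selects a training set that spans the descriptor space. It starts from the two most distant compounds. It then repeatedly adds the compound whose nearest already-selected neighbour is farthest away. The sketch below implements that rule with Euclidean distance. The 2D descriptor vectors are hypothetical, and the paper's descriptor scaling is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone(X: Sequence[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Return (training indices in selection order, remaining indices for validation)."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    def dist(i: int, j: int) -> float:
        return math.dist(X[i], X[j])

    # Step 1: seed with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist(*ij))
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist(k, s) for s in selected) for k in remaining}

    # Step 2: repeatedly add the sample farthest from the current selection.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist(k, nxt))
    return selected, remaining


X = [[0.0, 0.0], [10.0, 0.0], [5.0, 0.0], [1.0, 0.0], [9.0, 0.0]]  # hypothetical descriptors
print(kennard_stone(X, 3))  # ([0, 1, 2], [3, 4])
```

Because the extremes go into the training set, the validation compounds lie inside the training domain. This tends to flatter external validation compared with a random split, which is worth keeping in mind when reading the reported statistics. For the paper's data, the call would be `kennard_stone(descriptors, 71)` on the 101 compounds.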


RNAPosers: Machine Learning Classifiers For RNA-Ligand Poses

2019. doi:10.1101/702449

Author(s): Sahil Chhabra, Jingru Xie, Aaron T. Frank

Keyword(s): Machine Learning; Small Molecule; 3D Structure; Academic Community; Pose Prediction; Scoring Functions; Machine Learning Classifiers; Learning Classifiers; Validation Set; Leave One Out

Determining the three-dimensional (3D) structures of ribonucleic acid (RNA)-small molecule complexes is critical to understanding molecular recognition in RNA. Computer docking can, in principle, be used to predict the 3D structure of RNA-small molecule complexes. Unfortunately, retrospective analysis has shown that the scoring functions typically used to rank poses tend to misclassify non-native poses as native, and vice versa. This misclassification severely limits the utility of computer docking for pose prediction as well as for virtual screening. Here, we use machine learning to train a set of pose classifiers that estimate the relative “nativeness” of a set of RNA-ligand poses. At the heart of our approach is a pose “fingerprint” composed of a set of atomic fingerprints, each of which encodes the local “RNA environment” around a ligand atom. We found that by ranking poses based on the classification scores from our machine learning classifiers, we recovered native-like poses better than when we ranked poses based on their docking scores. With a leave-one-out training and testing approach, one of our classifiers recovered poses within 2.5 Å of the native poses in ∼80% of the 88 cases we examined; similarly, on a separate validation set, such poses were recovered in ∼70% of the cases. Our set of classifiers, which we refer to as RNAPosers, should find utility as a tool to aid RNA-ligand pose prediction, and we make RNAPosers available to the academic community via https://github.com/atfrank/RNAPosers.
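The success criterion above (a recovered pose within 2.5 Å of the native pose) can be made concrete with a small sketch. It assumes three things: poses share the receptor frame, so RMSD is computed without superposition; atoms are listed in the same order in every pose, with no symmetry correction; and "recovered" means the top-ranked pose is within the cutoff. The abstract does not state whether top-1 or top-N ranking was used. Coordinates and scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]
Case = Tuple[Coords, List[Tuple[float, Coords]]]  # (native pose, [(classifier score, pose), ...])


def rmsd(a: Coords, b: Coords) -> float:
    """Atom-by-atom RMSD in the shared receptor frame (no superposition, same atom order)."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def top_pose_success_rate(cases: Sequence[Case], cutoff: float = 2.5) -> float:
    """Fraction of cases whose highest-scoring pose lies within `cutoff` Å of the native pose."""
    hits = 0
    for native, poses in cases:
        _, best_pose = max(poses, key=lambda sp: sp[0])
        if rmsd(native, best_pose) <= cutoff:
            hits += 1
    return hits / len(cases)


native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]  # 1.0 Å from native
far_pose = [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)]    # 5.0 Å from native
cases = [
    (native, [(0.9, close_pose), (0.2, far_pose)]),  # classifier ranks the good pose first
    (native, [(0.3, close_pose), (0.8, far_pose)]),  # classifier ranks the bad pose first
]
print(top_pose_success_rate(cases))  # 0.5
```

Swapping the classifier scores for docking scores in `cases` gives the baseline that the abstract compares against.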


Predicting Target Profiles with Confidence as a Service using Docking Scores

2020. doi:10.21203/rs.3.rs-30526/v1

Author(s): Laeeq Ahmed, Hiba Alogheli, Staffan Arvidsson, Jonathan Alvarsson, Arvid Berg, ...

Keyword(s): Molecular Docking; External Validation; Individual Compound; Conformal Prediction; Safety Issues; Chemical Structures; Comparable Performance; Validation Set; Target Binding; Early Drug

Background: Identifying and assessing ligand-target binding is a core component of early drug discovery, as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on an individual-compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed performance comparable to that of molecular docking scores on an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.
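The abstract does not say which conformal variant the service uses. As a hedged illustration, a split (inductive) conformal regressor for a predicted docking score works as follows. Absolute errors on a held-out calibration set are sorted. The interval half-width is the ceil((n + 1) * confidence)-th smallest error, which gives coverage of at least the requested confidence when calibration and test compounds are exchangeable. The residuals and the point prediction are hypothetical.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(calibration_residuals: Sequence[float], confidence: float) -> float:
    """Half-width q such that prediction ± q has coverage >= confidence (split conformal)."""
    n = len(calibration_residuals)
    if n == 0:
        raise ValueError("need calibration residuals")
    scores = sorted(abs(r) for r in calibration_residuals)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return scores[k - 1]


def predict_interval(point: float, calibration_residuals: Sequence[float],
                     confidence: float) -> Tuple[float, float]:
    """Conformal interval around a model's point prediction."""
    q = conformal_quantile(calibration_residuals, confidence)
    return point - q, point + q


# Hypothetical calibration residuals (true minus predicted docking score).
residuals = [0.5, -1.0, 0.2, 1.5, -0.3, 0.8, -0.6, 1.1, 0.1]
print(conformal_quantile(residuals, 0.8))       # 1.1
print(predict_interval(-9.0, residuals, 0.8))
```

Validity (the coverage guarantee) comes from the calibration step alone. Efficiency (how narrow the intervals are) depends on how accurate the underlying model is, which matches the abstract's pairing of the two notions.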
