Drug repositioning by prediction of drug’s anatomical therapeutic chemical code via network-based inference approaches

Briefings in Bioinformatics ◽

10.1093/bib/bbaa027 ◽

2020 ◽

Cited By ~ 4

Author(s):

Yayuan Peng ◽

Manjiong Wang ◽

Yixiang Xu ◽

Zengrui Wu ◽

Jiye Wang ◽

...

Keyword(s):

Cross Validation ◽

Drug Repositioning ◽

Anatomical Therapeutic Chemical ◽

External Validation ◽

Chemical Properties ◽

Glucose Deprivation ◽

Target Drug ◽

Validation Set ◽

Fold Cross Validation

Abstract Drug discovery and development is a time-consuming and costly process. Therefore, drug repositioning has become an effective approach to address these issues by identifying new therapeutic or pharmacological actions for existing drugs. The anatomical therapeutic chemical (ATC) code of a drug belongs to a hierarchical classification system organized into five levels according to the organs or systems on which drugs act and the pharmacological, therapeutic and chemical properties of drugs. The 2nd-, 3rd- and 4th-level ATC codes retain the therapeutic and pharmacological information of drugs. With the hypothesis that drugs with similar structures or targets would possess similar ATC codes, we exploited a network-based approach to predict the 2nd-, 3rd- and 4th-level ATC codes by constructing substructure drug-ATC (SD-ATC), target drug-ATC (TD-ATC) and Substructure&Target drug-ATC (STD-ATC) networks. In 10-fold cross validation and two external validations, the STD-ATC models outperformed the SD-ATC and TD-ATC ones. Furthermore, with KR as fingerprint, the STD-ATC model was identified as the optimal model, with AUC values of 0.899 ± 0.015, 0.916 and 0.893 for 10-fold cross validation, external validation set 1 and external validation set 2, respectively. To illustrate the predictive capability of the STD-ATC model with KR fingerprint, as a case study, we used that model to predict 25 FDA-approved drugs (22 of which were actually purchased) to have potential activity against heart failure. In vitro experiments confirmed that 8 of the 22 old drugs showed mild to potent cardioprotective activities in both a hypoxia model and an oxygen–glucose deprivation model, which demonstrated that our STD-ATC prediction model would be an effective tool for drug repositioning.


iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs

Bioinformatics ◽

10.1093/bioinformatics/btz757 ◽

2019 ◽

Cited By ~ 15

Author(s):

Jian-Peng Zhou ◽

Lei Chen ◽

Zi-Han Guo

Keyword(s):

Cross Validation ◽

Drug Repositioning ◽

Anatomical Therapeutic Chemical ◽

Support Vector ◽

Correct Identification ◽

Network Embedding ◽

Satisfactory Performance ◽

Comparison Results ◽

Essential Problem ◽

Fold Cross Validation

Abstract Motivation The anatomical therapeutic chemical (ATC) classification system plays an increasingly important role in drug repositioning and discovery. Correctly identifying the classes at each level of this system to which a given drug may belong is an essential problem. Several multi-label classifiers have been proposed in this regard. Although they provided satisfactory performance, their feature extraction procedures were still rough. More refined features may further improve prediction quality. Results In this article, we provide a novel multi-label classifier, called iATC-NRAKEL, to predict drug ATC classes in the first level. To obtain more informative drug features, we employed the drug association information in STITCH and KEGG, which was organized into seven drug networks. The powerful network embedding algorithm, Mashup, was adopted to extract informative drug features. The obtained features were fed into the RAndom k-labELsets (RAKEL) algorithm with support vector machine as the basic classification algorithm to construct the classifier. The 10-fold cross-validation on the benchmark dataset with 3883 drugs showed that the accuracy and absolute true were 76.56% and 74.51%, respectively. The comparison results indicated that iATC-NRAKEL was much superior to all previously reported classifiers. Finally, the contribution of each network was analyzed. Availability and implementation The codes of iATC-NRAKEL are available at https://github.com/zhou256/iATC-NRAKEL.


Insights into the molecular properties underlying antibacterial activity of prenylated (iso)flavonoids against MRSA

Scientific Reports ◽

10.1038/s41598-021-92964-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sylvia Kalli ◽

Carla Araya-Cloutier ◽

Jos Hageman ◽

Jean-Paul Vincken

Keyword(s):

Prediction Models ◽

External Validation ◽

Qsar Model ◽

Prediction Errors ◽

Activity Data ◽

Gram Positive Bacteria ◽

Level Of Activity ◽

Formal Charge ◽

Validation Set

Abstract High resistance towards traditional antibiotics has urged the development of new, natural therapeutics against methicillin-resistant Staphylococcus aureus (MRSA). Prenylated (iso)flavonoids, present mainly in the Fabaceae, can serve as promising candidates. Herein, the anti-MRSA properties of 23 prenylated (iso)flavonoids were assessed in-vitro. The di-prenylated (iso)flavonoids, glabrol (flavanone) and 6,8-diprenyl genistein (isoflavone), together with the mono-prenylated, 4′-O-methyl glabridin (isoflavan), were the most active anti-MRSA compounds (Minimum Inhibitory Concentrations (MIC) ≤ 10 µg/mL, 30 µM). The in-house activity data was complemented with literature data to yield an extended, curated dataset of 67 molecules for the development of robust in-silico prediction models. A QSAR model having a good fit (R2adj 0.61), low average prediction errors and a good predictive power (Q2) for the training (4% and Q2LOO 0.57, respectively) and the test set (5% and Q2test 0.75, respectively) was obtained. Furthermore, the model predicted well the activity of an external validation set (on average 5% prediction errors), as well as the level of activity (low, moderate, high) of prenylated (iso)flavonoids against other Gram-positive bacteria. For the first time, the importance of formal charge, besides hydrophobic volume and hydrogen-bonding, in the anti-MRSA activity was highlighted, thereby suggesting potentially different modes of action of the different prenylated (iso)flavonoids.


BMI prediction within a Korean population

PeerJ ◽

10.7717/peerj.3510 ◽

2017 ◽

Vol 5 ◽

pp. e3510 ◽

Cited By ~ 1

Author(s):

Jin Sol Lee ◽

Hyun Sub Cheong ◽

Hyoung-Doo Shin

Keyword(s):

Cross Validation ◽

Genome Wide Association Study ◽

Korean Population ◽

Risk Scores ◽

Link Type ◽

Genome Wide ◽

Clinical Trait ◽

Validation Set ◽

Fold Cross Validation

Background Body Mass Index (BMI) is widely regarded as an important clinical trait for obesity and other diseases such as Type 2 diabetes, coronary heart disease, and osteoarthritis. Methods This study uses 6,011 samples of genotype data from ethnic Korean subjects. The data was retrieved from the Korea Association Resource. To identify the BMI-related markers within the Korean population, we collected genome-wide association study (GWAS) markers using a GWAS catalog and also obtained other markers from nearby regions. Of the total 6,011 samples, 5,410 subjects were used as part of a single nucleotide polymorphism (SNP) selection set in order to identify the overlapping BMI-associated SNPs within a 10-fold cross validation. Results We selected nine SNPs (rs12566985 (FPGT-TNNI3K), rs6545809 (ADCY3), rs2943634 (located near LOC646736), rs734597 (located near TFAP2B), rs11030104 (BDNF), rs7988412 (GTF3A), rs2241423 (MAP2K5), rs7202116 (FTO), and rs6567160 (located near LOC105372152)) to assist in BMI prediction. The calculated weighted genetic risk scores based on the selected 9 SNPs within the SNP selection set were applied to the final validation set consisting of 601 samples. Our results showed upward trends in the BMI values (P < 0.0001) within the 10-fold cross validation process for R2 > 0.22. These trends were also observed within the validation set for all subjects, as well as within the validation sets divided by gender (P < 0.0001, R2 > 0.46). Discussion The set of nine SNPs identified in this study may be useful for prospective predictions of BMI.


Origin of aromatase inhibitory activity via proteochemometric modeling

PeerJ ◽

10.7717/peerj.1979 ◽

2016 ◽

Vol 4 ◽

pp. e1979 ◽

Cited By ~ 8

Author(s):

Saw Simeon ◽

Ola Spjuth ◽

Maris Lapins ◽

Sunanta Nabu ◽

Nuttapat Anuwongcharoen ◽

...

Keyword(s):

Breast Cancer ◽

Inhibitory Activity ◽

Cross Validation ◽

External Validation ◽

Predictive Performance ◽

Interaction Space ◽

Inhibitory Mechanisms ◽

Rate Limiting ◽

Good Predictive Performance ◽

Fold Cross Validation

Aromatase, the rate-limiting enzyme that catalyzes the conversion of androgen to estrogen, plays an essential role in the development of estrogen-dependent breast cancer. Side effects due to aromatase inhibitors (AIs) necessitate the pursuit of novel inhibitor candidates with high selectivity, lower toxicity and increased potency. Designing a novel therapeutic agent against aromatase could be achieved computationally by means of ligand-based and structure-based methods. For over a decade, we have utilized both approaches to design potential AIs for which quantitative structure–activity relationships and molecular docking were used to explore inhibitory mechanisms of AIs towards aromatase. However, such approaches do not consider the effects that aromatase variants have on different AIs. In this study, proteochemometrics modeling was applied to analyze the interaction space between AIs and aromatase variants as a function of their substructural and amino acid features. Good predictive performance was achieved, as rigorously verified by 10-fold cross-validation, external validation, leave-one-compound-out cross-validation, leave-one-protein-out cross-validation and Y-scrambling tests. The investigations presented herein provide important insights into the mechanisms of aromatase inhibitory activity that could aid in the design of novel potent AIs as breast cancer therapeutic agents.


Machine learning meets pKa

F1000Research ◽

10.12688/f1000research.22090.2 ◽

2020 ◽

Vol 9 ◽

pp. 113 ◽

Cited By ~ 2

Author(s):

Marcel Baltruschat ◽

Paul Czodrowski

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Mean Squared Error ◽

Mean Absolute Error ◽

External Validation ◽

Absolute Error ◽

Source Model ◽

Squared Error ◽

Fold Cross Validation ◽

Better Than

We present a small molecule pKa prediction tool entirely written in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested and random forest performed best given a five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r2 = 0.82). We test our model on two external validation sets, where our model performs comparably to Marvin and is better than a recently published open source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.
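The two error metrics quoted in this abstract can be illustrated with a short sketch. It is not taken from the authors' repository; the function names and the toy pKa values are assumptions chosen only to show how mean absolute error and root mean squared error are computed from paired measured and predicted values.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large misses more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical measured and predicted pKa values, for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
mae = mean_absolute_error(measured, predicted)
rmse = root_mean_squared_error(measured, predicted)
```

Because the squared term weights outliers more heavily, the root mean squared error is never smaller than the mean absolute error, which is consistent with the two figures reported above.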


Forecasting the daily demand for emergency medical ambulances in England and Wales: A benchmark model and external validation

10.31219/osf.io/a6nu5 ◽

2021 ◽

Author(s):

Thomas Monks ◽

Michael Allen

Keyword(s):

Time Series ◽

Cross Validation ◽

Prediction Interval ◽

External Validation ◽

Demand Forecasting ◽

Benchmark Model ◽

Model Combining ◽

Emergency Ambulance ◽

Validation Set ◽

Interval Coverage

Background We aimed to select and externally validate a benchmark method for emergency ambulance services to use to forecast the daily number of calls that result in the dispatch of one or more ambulances. The study was conducted using standard methods known to the UK's NHS to aid implementation in practice. Methods We selected our benchmark model from a naive benchmark and 14 standard forecasting methods. Mean absolute scaled error (MASE) and 80% and 95% prediction interval coverage over an 84-day horizon were evaluated using time series cross validation across eight time series from the South West of England. External validation was conducted by time series cross validation across 13 time series from London, Yorkshire and Welsh Ambulance Services. Results A model combining a simple average of Facebook's Prophet and regression with ARIMA errors (1, 1, 3)(1, 0, 1, 7) was selected. Benchmark MASE and 80% and 95% prediction interval coverage were 0.68 (95% CI 0.67 - 0.69), 0.847 (95% CI 0.843 - 0.851), and 0.965 (95% CI 0.949 - 0.977), respectively. Performance in the validation set was within expected ranges for MASE (0.73; 95% CI 0.72 - 0.74), 80% coverage (0.833; 95% CI 0.828 - 0.838), and 95% coverage (0.965; 95% CI 0.963 - 0.967). Conclusions We provide a robust externally validated benchmark for future ambulance demand forecasting studies to improve on. Our benchmark forecasting model is high quality and usable by ambulance services. We provide a simple Python framework to aid its implementation in practice.
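The two evaluation measures named in this abstract, MASE and prediction interval coverage, can be sketched as follows. This is a minimal illustration using the usual textbook definitions, not the authors' framework: the function names, the seasonal-period parameter `m`, and the toy series are assumptions.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    The forecast's mean absolute error is divided by the in-sample mean
    absolute error of a naive forecast that repeats the value from m steps
    earlier (m=1 is the plain naive forecast; m=7 would be a weekly
    seasonal naive forecast for daily data).
    """
    if len(train) <= m:
        raise ValueError("training series must be longer than m")
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train)))
    scale /= len(train) - m
    if scale == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def interval_coverage(actual, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)


# Hypothetical daily call counts, for illustration only.
history = [10, 12, 14, 16]
observed = [18, 20]
point_forecast = [17, 22]
score = mase(history, observed, point_forecast, m=1)
coverage = interval_coverage(observed, lower=[17, 21], upper=[19, 23])
```

A MASE below 1 means the forecast beats the naive method on average, and a well-calibrated 80% interval should give a coverage close to 0.8.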


Prediction of cholesterol ratios within a Korean population

Royal Society Open Science ◽

10.1098/rsos.171204 ◽

2018 ◽

Vol 5 (1) ◽

pp. 171204

Author(s):

Jin Sol Lee ◽

Hyun Sub Cheong ◽

Hyoung Doo Shin

Keyword(s):

Genetic Risk ◽

Cross Validation ◽

Density Lipoprotein ◽

Risk Scores ◽

Genotype Data ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Regression Slopes ◽

Validation Set ◽

Fold Cross Validation

Cholesterol ratios (total cholesterol (TC)/high-density lipoprotein cholesterol (HDL-c) and triglyceride (TG)/HDL-c) have been suggested as better indicators to predict various clinical features such as insulin resistance and heart disease. Therefore, we aimed to build a single nucleotide polymorphism (SNP) set to predict constitutional lipid metabolism. The genotype data of 7795 samples were obtained from the Korea Association Resource. Among the total of 7795 samples, 7016 subjects were used to perform 10-fold cross-validation. We selected the SNPs that showed significance consistently throughout all 10 cross-validation sets; another 779 samples were used as the final validation set. After performing the 10-fold cross-validation, six SNPs (rs4420638 (APOC1), rs12421652 (BUD13), rs17411126 (LPL), rs6589566 (ZPR1), rs16940212 (LOC101928635) and rs10852765 (ABCA8)) were finally selected for predicting cholesterol ratios. The weighted genetic risk scores (wGRS) were calculated based on the regression slopes of the six selected SNPs. Our results showed upward trends of wGRS for both the TC/HDL-c and TG/HDL-c ratios within the 10-fold cross-validation. Similarly, the wGRS of the six SNPs also showed upward trends in analyses using the SNP selection set and final validation set. The selected six SNPs can be used to explain both the TC/HDL-c and TG/HDL-c ratios. Our results may be useful for the prospective predictions of cholesterol-related diseases.
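A weighted genetic risk score of the kind described in this abstract can be sketched as a slope-weighted sum of allele counts. This is a minimal illustration, not the authors' code: the SNP labels, the slope values, the 0/1/2 allele-count coding and the function name `weighted_grs` are all assumptions made for the example.

```python
def weighted_grs(allele_counts, slopes):
    """Sum, over the selected SNPs, of regression slope times allele count.

    allele_counts: SNP label -> number of effect alleles carried (assumed 0, 1 or 2)
    slopes:        SNP label -> regression slope used as the weight
    """
    missing = set(slopes) - set(allele_counts)
    if missing:
        raise KeyError(f"missing genotypes for: {sorted(missing)}")
    return sum(slopes[snp] * allele_counts[snp] for snp in slopes)


# Hypothetical weights and one hypothetical genotype, for illustration only.
example_slopes = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
example_genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_score = weighted_grs(example_genotype, example_slopes)
```

Scoring every subject this way and then comparing the trait across score bins is one way to look for the upward trend the abstract reports.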


Predicting Drug-Disease Associations via Using Gaussian Interaction Profile and Kernel-Based Autoencoder

BioMed Research International ◽

10.1155/2019/2426958 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Han-Jing Jiang ◽

Yu-An Huang ◽

Zhu-Hong You

Keyword(s):

Case Studies ◽

Computational Models ◽

Cross Validation ◽

Drug Repositioning ◽

Feature Learning ◽

Superior Performance ◽

Reliable Model ◽

Disease Associations ◽

The Cost ◽

Fold Cross Validation

Computational drug repositioning, designed to identify new indications for existing drugs, significantly reduces the cost and time involved in drug development. Prediction of drug-disease associations is promising for drug repositioning. Recent years have witnessed an increasing number of machine learning-based methods for computational drug repositioning. In this paper, a novel feature learning method based on Gaussian interaction profile kernel and autoencoder (GIPAE) is proposed for drug-disease association. In order to further reduce the computation cost, both a batch normalization layer and a fully connected layer are introduced to reduce training complexity. The experimental results of 10-fold cross validation indicate that the proposed method achieves superior performance on Fdataset and Cdataset with AUCs of 93.30% and 96.03%, respectively, which were higher than many previous computational models. To further assess the accuracy of GIPAE, we conducted case studies on two complex human diseases. Among the top 20 predicted drugs, 14 obesity-related drugs and 11 drugs related to Alzheimer's disease were validated in the CTD database. The results of cross validation and case studies indicated that GIPAE is a reliable model for predicting drug-disease associations.
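The abstract does not spell out the kernel formula. One common definition of a Gaussian interaction profile kernel, assumed here, is K(i, j) = exp(-γ ||p_i - p_j||²), where p_i is the binary vector of known associations for drug i and γ is a bandwidth parameter divided by the mean squared norm of all profiles. The sketch below follows that definition; the function name, the `gamma_prime` parameter and the toy profiles are illustrative assumptions, not the GIPAE implementation.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel matrix for a list of profiles.

    profiles: list of equal-length 0/1 vectors (one row per drug or disease)
    gamma_prime: bandwidth parameter (assumed default of 1.0)
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are empty; the bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            row.append(math.exp(-gamma * sq_dist))
        kernel.append(row)
    return kernel


# Hypothetical drug-disease association profiles, for illustration only.
toy_profiles = [[1, 0], [0, 1], [1, 0]]
K = gip_kernel(toy_profiles)
```

Drugs with identical association profiles get a similarity of 1, and the similarity decays smoothly as the profiles diverge.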


Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration

BMC Genomics ◽

10.1186/s12864-020-07166-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Xin Liu ◽

Liang Wang ◽

Jian Li ◽

Junfeng Hu ◽

Xiao Zhang

Keyword(s):

Cross Validation ◽

Chemical Properties ◽

Principal Component ◽

Computational Prediction ◽

Computational Method ◽

Support Vector ◽

Data Sets ◽

Post Translational Modification ◽

Human Proteins ◽

Fold Cross Validation

Abstract Background Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Compared with experimental identification of malonylation sites, computational methods are time-effective and comparatively low in cost. Results In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction through the combination of Principal Component Analysis and Support Vector Machine. One-hot encoding, physio-chemical properties, and composition of k-spaced acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets while SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec can achieve better prediction performance compared with other approaches. AUC (area under the receiver operating characteristic curve) analysis achieved 96.47% and 90.72% on 5-fold cross-validation of independent data sets, respectively. Conclusion Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.


Prediction of Drug Indications Based on Chemical Interactions and Chemical Similarities

BioMed Research International ◽

10.1155/2015/584546 ◽

2015 ◽

Vol 2015 ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

Guohua Huang ◽

Yin Lu ◽

Changhong Lu ◽

Mingyue Zheng ◽

Yu-Dong Cai

Keyword(s):

Drug Development ◽

Large Scale ◽

Cross Validation ◽

Drug Repositioning ◽

Accuracy Rate ◽

Computational Approaches ◽

Starting Point ◽

Independent Test ◽

Approved Drugs ◽

Fold Cross Validation

Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric, based on the starting point of the problem, or as small-scale or large-scale applications, according to the diversity of the datasets. Here, a classifier has been constructed using a large drug indication dataset to predict the indications of a drug, based on the assumption that interactive/associated drugs or drugs with similar structures are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset with 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded an accuracy rate of 50.00% for the 1st order prediction in an independent test on a dataset with 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or alternative indications with existing drugs.
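The evaluation scheme described in this abstract, 5-fold cross-validation repeated five times, can be sketched as index bookkeeping. This is a generic illustration rather than the authors' procedure: the function name, the use of a different random seed per repetition, and the interleaved fold assignment are assumptions. Only the dataset size of 1,573 drugs comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Five repetitions, each with its own shuffle (the seeds are an assumption).
repeated_splits = [k_fold_indices(1573, k=5, seed=rep) for rep in range(5)]
```

Each repetition tests every drug exactly once, so the five accuracies differ only through how the shuffle assigns drugs to folds, which may explain why the reported values are so close to one another.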
