MoDeSuS: A Machine Learning Tool for Selection of Molecular Descriptors in QSAR Studies Applied to Molecular Informatics

BioMed Research International ◽

10.1155/2019/2905203 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

María Jimena Martínez ◽

Marina Razuc ◽

Ignacio Ponzoni

Keyword(s):

Molecular Descriptors ◽

External Validation ◽

Software Tool ◽

Quantitative Structure Activity Relationship ◽

Second Phase ◽

Qsar Studies ◽

Qsar Modelling ◽

Evolutionary Technique ◽

Two Phases ◽

Selection Of

The selection of the most relevant molecular descriptors to describe a target variable in the context of QSAR (Quantitative Structure-Activity Relationship) modelling is a challenging combinatorial optimization problem. In this paper, a novel software tool for addressing this task in the context of regression and classification modelling is presented. The methodology that the tool implements is organized into two phases. The first phase uses a multiobjective evolutionary technique to select subsets of descriptors. The second phase performs an external validation of the chosen descriptor subsets in order to improve reliability. The tool's functionalities are illustrated through a case study on estimating the ready biodegradation property, as an example of classification QSAR modelling. The results obtained show the usefulness and potential of this novel software tool, which aims to reduce the time and costs of development in the drug discovery process.
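
The first phase described above can be illustrated with a minimal sketch of Pareto filtering over candidate descriptor subsets. The two objectives used here (subset size and validation error, both minimized), the descriptor names `d1` to `d4`, and the error values are assumptions made for illustration; the abstract does not state which objectives MoDeSuS optimizes.

```python
from typing import List, Tuple

# Each candidate is (descriptor_subset, validation_error). Both objectives
# (subset size and error) are minimized; this choice of objectives is an
# assumption for illustration, not taken from the paper.
Candidate = Tuple[frozenset, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    size_a, err_a = len(a[0]), a[1]
    size_b, err_b = len(b[0]), b[1]
    no_worse = size_a <= size_b and err_a <= err_b
    strictly_better = size_a < size_b or err_a < err_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical subsets and errors, made up for the example.
candidates = [
    (frozenset({"d1", "d2", "d3"}), 0.10),
    (frozenset({"d1", "d2"}), 0.12),
    (frozenset({"d1", "d4"}), 0.15),  # dominated: same size, higher error
    (frozenset({"d1"}), 0.30),
]
front = pareto_front(candidates)
```

An evolutionary search would generate and recombine such candidates over many generations; the filter only shows how trade-offs between competing objectives are retained rather than collapsed into a single score.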


QSAR Studies of Halogenated Pyrimidine Derivatives as Inhibitors of Human Dihydroorotate Dehydrogenase Using Modified Bee Algorithm

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180611092540 ◽

2018 ◽

Vol 21 (5) ◽

pp. 381-387 ◽

Cited By ~ 1

Author(s):

Hossein Atabati ◽

Kobra Zarei ◽

Hamid Reza Zare-Mehrjardi

Keyword(s):

External Validation ◽

Correlation Coefficients ◽

Quantitative Structure Activity Relationship ◽

Molecular Structures ◽

Dihydroorotate Dehydrogenase ◽

Pyrimidine Derivatives ◽

Qsar Studies ◽

Bee Algorithm ◽

Test Sets ◽

Leave One Out

Aim and Objective: Human dihydroorotate dehydrogenase (DHODH) catalyzes the fourth stage of the biosynthesis of pyrimidines in cells. Hence it is important to identify suitable inhibitors of DHODH to prevent virus replication. In this study, a quantitative structure-activity relationship study was performed to predict the activity of a group of newly synthesized halogenated pyrimidine derivatives as inhibitors of DHODH. Materials and Methods: Molecular structures of the halogenated pyrimidine derivatives were drawn in HyperChem, and molecular descriptors were then calculated with the DRAGON software. Finally, the most effective descriptors for 32 halogenated pyrimidine derivatives were selected using the bee algorithm. Results: The descriptors selected by the bee algorithm were applied for modeling. The mean relative error and correlation coefficient were 2.86% and 0.9627, respectively, while the corresponding values for leave-one-out cross-validation were 4.18% and 0.9297, respectively. External validation was also conducted using separate training and test sets. The correlation coefficients for the training and test sets were 0.9596 and 0.9185, respectively. Conclusion: The modeling results of the present work showed that the bee algorithm performs well for variable selection in QSAR studies, and its results were better than those of the model constructed with descriptors selected by the genetic algorithm method.


Self-Adaptive K-Means Based on a Covering Algorithm

Complexity ◽

10.1155/2018/7698274 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yiwen Zhang ◽

Yuanyuan Zhou ◽

Xing Guo ◽

Jintao Wu ◽

Qiang He ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Real Data ◽

Second Phase ◽

Data Sets ◽

Number Of Clusters ◽

Large Scale Data ◽

Long Time ◽

Two Phases ◽

Selection Of

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of the existing algorithms under both sequential and parallel conditions.
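
The second phase mentioned above, the Lloyd iteration, can be sketched as follows. This is a minimal illustration only: the covering algorithm (CA) of the first phase is not reproduced, so the initial centers are simply supplied by hand, and the sample points are made up for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points: List[Point], centers: List[Point],
          max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and update steps until the labels stop changing."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels


# Made-up data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# In C-K-means these would come from the CA phase; here they are assumed.
initial_centers = [(0.0, 0.0), (10.0, 10.0)]
centers, labels = lloyd(points, initial_centers)
```

Because the number of centers passed in fixes k, whatever procedure produces the initial centers also determines the number of clusters, which is the role the abstract assigns to the first phase.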


Estudio de la enseñanza de ecosistemas amazónicos a través de la metodología de aprendizaje basada en la resolución de problemas (ABRP)

Revista Eletrônica em Gestão Educação e Tecnologia Ambiental ◽

10.5902/2236117032593 ◽

2018 ◽

Vol 22 ◽

pp. 9

Author(s):

Carlos Henrique Nascimento ◽

Ires Paula de Andrade Miranda

Keyword(s):

Problem Solving ◽

Data Collection ◽

Primary School ◽

School Environment ◽

Problem Based Learning ◽

Problem Formulation ◽

Teaching Methodology ◽

Second Phase ◽

Two Phases ◽

Selection Of

The purpose was to analyze problem-based learning (PBL) as a methodological alternative for primary school that favors learning about Amazonian ecosystems. This research is descriptive, with a qualitative-quantitative approach. The study was carried out with students from the 9th year of primary school. The teaching methodology based on PBL was applied in two phases. In the first phase, a test of previous conceptions was carried out in order to learn the students' perception of topics related to some landscape units of the Amazonian ecosystems. The second phase consisted of the implementation of the learning methodology in the school environment. Four different stages were established in the application: i) selection of topics; ii) problem formulation; iii) problem solving; iv) synthesis and evaluation. The data collection instruments used were a preconceptions test and a skills chart. The results showed that after the application of the PBL (ABRP) methodology, cognitive recognition of the Amazonian ecosystems can be perceived in the students, reaching additional goals that the PCN establish.


QSAR studies for the acute toxicity of nitrobenzenes to the Tetrahymena pyriformis

Journal of the Serbian Chemical Society ◽

10.2298/jsc130910025w ◽

2014 ◽

Vol 79 (9) ◽

pp. 1111-1125 ◽

Cited By ~ 3

Author(s):

Dan-Dan Wang ◽

Lin-Lin Feng ◽

Guang-Yu He ◽

Hai-Qun Chen

Keyword(s):

Statistical Significance ◽

External Validation ◽

Tetrahymena Pyriformis ◽

Quantitative Structure Activity Relationship ◽

Partial Least Square ◽

Molecular Structures ◽

Least Square ◽

Qsar Studies ◽

Qsar Models ◽

Leave One Out

Quantitative structure-activity relationship (QSAR) models play a key role in finding the relationship between molecular structures and the toxicity of nitrobenzenes to Tetrahymena pyriformis. In this work, a genetic algorithm combined with partial least squares (GA-PLS) was employed to select an optimal subset of descriptors that contribute significantly to the toxicity of nitrobenzenes to Tetrahymena pyriformis. A set of five descriptors, namely G2, HOMT, G(Cl..Cl), Mor03v and MAXDP, was selected for predicting the toxicity of 45 nitrobenzene derivatives and then used to build the model by the multiple linear regression (MLR) method. The resulting model, whose stability was confirmed by leave-one-out validation and an external validation test, showed high statistical significance (R² = 0.963, Q²LOO = 0.944). Moreover, a Y-scrambling test indicated that there was no chance correlation in this model.
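
The two statistics reported above can be illustrated with a minimal sketch that computes R² and the leave-one-out Q² for an ordinary least-squares fit. To stay self-contained it uses a single descriptor rather than the five-descriptor MLR model of the paper, and the data values are made up; the definition Q²LOO = 1 − PRESS/SS_tot is the common one and is assumed here.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x: List[float], y: List[float]) -> float:
    """R2 = 1 - SS_res / SS_tot for the model fitted on all samples."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(x: List[float], y: List[float]) -> float:
    """Q2_LOO = 1 - PRESS / SS_tot, refitting with each sample held out."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Made-up descriptor values and activities, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
r2 = r_squared(x, y)
q2 = q_squared_loo(x, y)
```

For a least-squares model, Q²LOO computed this way cannot exceed R², since each held-out residual is at least as large in magnitude as the corresponding fitted residual; a small gap between the two, as in the values reported above, is usually read as a sign of a stable model.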


Nuclear magnetic resonance (NMR) and quantitative structure–activity relationship (QSAR) studies on the transacylation reactivity of model 1β-O-acyl glucuronides. II: QSAR modelling of the reaction using both computational and experimental NMR parameters

Xenobiotica ◽

10.1080/00498250400005674 ◽

2004 ◽

Vol 34 (10) ◽

pp. 889-900 ◽

Cited By ~ 23

Author(s):

S. J. Vanderhoeven ◽

J. Troke ◽

G. E. Tranter ◽

I. D. Wilson ◽

J. K. Nicholson ◽

...

Keyword(s):

Nuclear Magnetic Resonance ◽

Magnetic Resonance ◽

Quantitative Structure Activity Relationship ◽

Activity Relationship ◽

Quantitative Structure ◽

Qsar Studies ◽

Nmr Parameters ◽

Structure Activity ◽

Qsar Modelling ◽

Acyl Glucuronides


Quantitative structure-activity relationship (QSAR) studies of quinolone antibacterials against M. fortuitum and M. smegmatis using theoretical molecular descriptors

Journal of Molecular Modeling ◽

10.1007/s00894-006-0133-z ◽

2006 ◽

Vol 13 (1) ◽

pp. 111-120 ◽

Cited By ~ 18

Author(s):

Manish C. Bagchi ◽

Denise Mills ◽

Subhash C. Basak

Keyword(s):

Molecular Descriptors ◽

Quantitative Structure Activity Relationship ◽

Structure Activity Relationship ◽

Activity Relationship ◽

Quantitative Structure ◽

Qsar Studies ◽

Structure Activity


QSAR Studies on Thiazole Derivatives as HCV NS5A Inhibitors via CoMFA and CoMSIA Methods

Letters in Drug Design & Discovery ◽

10.2174/1570180815666180702153529 ◽

2019 ◽

Vol 16 (4) ◽

pp. 453-460 ◽

Cited By ~ 1

Author(s):

Jiayu Li ◽

Wenyue Tian ◽

Diaohui Gao ◽

Yuying Li ◽

Yiqun Chang ◽

...

Keyword(s):

Nonstructural Protein ◽

External Validation ◽

Three Dimensional ◽

3D Qsar ◽

Quantitative Structure Activity Relationship ◽

Field Analysis ◽

Thiazole Derivatives ◽

Hydrogen Bond Acceptor ◽

Qsar Studies ◽

Comfa Model

Background: Hepatitis C Virus (HCV) infection is the major cause of hepatitis after transfusion. HCV Nonstructural Protein 5A (NS5A) inhibitors have become a new hotspot in the study of HCV inhibitors due to their strong antiviral activity, rapid viral clearance and broad antiviral spectrum. Methods: Forty-five NS5A inhibitors were chosen for three-dimensional quantitative structure-activity relationship (3D-QSAR) analysis using comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models. A training set consisting of 30 compounds was used to establish the models, and a test set consisting of 15 compounds was used for external validation. Results: The CoMFA model gave a q2 value of 0.607 and an r2 value of 0.934. The CoMSIA model, built on steric, electrostatic, hydrophobic and hydrogen-bond acceptor fields, gave a q2 value of 0.516 and an r2 value of 0.960. The predictive correlation coefficients (r2pred) of the CoMFA and CoMSIA models were 0.713 and 0.939, respectively. Conclusion: These results provide a theoretical basis for drug design and screening of HCV NS5A complex inhibitors.


Multilayer Perceptron Model for Predicting Acute Toxicity of Fungicides on Rats

International Journal of Quantitative Structure-Property Relationships ◽

10.4018/ijqspr.2018010106 ◽

2018 ◽

Vol 3 (1) ◽

pp. 100-118 ◽

Cited By ~ 1

Author(s):

Mabrouk Hamadache ◽

Abdeltif Amrane ◽

Salah Hanini ◽

Othmane Benkortbi

Keyword(s):

Molecular Descriptors ◽

External Validation ◽

Qsar Model ◽

Quantitative Structure Activity Relationship ◽

Acute Oral Toxicity ◽

Oral Toxicity ◽

Structure Activity ◽

Marquardt Algorithm ◽

Qsar Models ◽

Validation Set

Quantitative Structure Activity Relationship (QSAR) models are expected to play an important role in the risk assessment of chemicals on humans and the environment. In this study, a QSAR model based on 10 molecular descriptors to predict acute oral toxicity of 91 fungicides to rats was developed and validated. Good results (PRESS/SSY = 0.085 and VIF < 5) were obtained, showing the validation of descriptors in the obtained model. The best results were obtained with a 10/11/1 Artificial Neural Network model trained with the Levenberg-Marquardt algorithm. The prediction accuracy for the external validation set was estimated by the Q2ext which was equal to 0.960. Accordingly, the model developed in this study provided excellent predictions and can be used to predict the acute oral toxicity of fungicides, particularly for those that have not been tested as well as new fungicides.


Investigation of 6-fluoroquinolones activity against Mycobacterium tuberculosis using theoretical molecular descriptors: a case study

Open Chemistry ◽

10.2478/s11532-011-0071-1 ◽

2011 ◽

Vol 9 (5) ◽

pp. 855-866 ◽

Cited By ~ 7

Author(s):

Nikola Minovski ◽

Aneta Jezierska-Mazzarello ◽

Marjan Vračko ◽

Tom Šolmajer

Keyword(s):

Molecular Descriptors ◽

External Validation ◽

Heuristic Method ◽

Quantitative Structure Activity Relationship ◽

Molecular Structures ◽

Model Parameters ◽

Quantum Chemical Descriptors ◽

Qsar Study ◽

Validation Data ◽

Data Set

A quantitative structure-activity relationship (QSAR) study on a set of 66 structurally similar 6-fluoroquinolones was performed using a large pool of theoretical molecular descriptors. Ab initio geometry optimizations were carried out to reproduce the geometrical and electronic structure parameters. The resulting molecular structures were confirmed to be minima via harmonic frequency calculations. The obtained atomic charges, HOMO and LUMO energies, orbital electron densities, dipole moment, energy and many other properties served as quantum-chemical descriptors. A multiple linear regression (MLR) technique was applied to generate a linear model for predicting the biological activity, the Minimal Inhibitory Concentration (MIC), treated as its negative decadic logarithm (pMIC). The heuristic method was used to optimize the model parameters and select the most significant descriptors. The model was tested internally using the CV LOO procedure on the training set and validated against the external validation set. The result (Q²ext = 0.7393), which was obtained on an external, previously excluded validation data set, shows the predictive performance of this model (R²tr = 0.7416, Q²tr = 0.6613) in establishing the (Q)SAR of 6-fluoroquinolones. This validated model could be proficiently used to design new 6-fluoroquinolones with possibly higher activity.
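
The transformation of MIC into pMIC mentioned above is a negative base-10 logarithm. As a worked illustration, assuming the concentration is expressed in mol/L (the abstract does not state the units) and using a made-up MIC value:

```latex
\mathrm{pMIC} = -\log_{10}(\mathrm{MIC}), \qquad
\mathrm{MIC} = 10^{-6}\ \mathrm{mol/L} \;\Rightarrow\; \mathrm{pMIC} = 6
```

On this scale a lower inhibitory concentration, meaning a more potent compound, gives a higher pMIC.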


Predicting Value of Binding Constants of Organic Ligands to Beta-Cyclodextrin: Application of MARSplines and Descriptors Encoded in SMILES String

Symmetry ◽

10.3390/sym11070922 ◽

2019 ◽

Vol 11 (7) ◽

pp. 922 ◽

Cited By ~ 1

Author(s):

Piotr Cysewski ◽

Maciej Przybyłek

Keyword(s):

Molecular Descriptors ◽

External Validation ◽

Binding Constants ◽

Quantitative Structure Activity Relationship ◽

Multivariate Adaptive Regression Splines ◽

Regression Equations ◽

Class Iii ◽

Qspr Model ◽

Beta Cyclodextrin ◽

Standard Application

The quantitative structure-property relationship (QSPR) model was formulated to quantify values of the binding constant (lnK) of a series of ligands to beta-cyclodextrin (β-CD). For this purpose, the multivariate adaptive regression splines (MARSplines) methodology was adopted with molecular descriptors derived from the simplified molecular input line entry specification (SMILES) strings. This approach allows discovery of regression equations consisting of new non-linear components (basis functions) that are combinations of molecular descriptors. The model was subjected to the standard internal and external validation procedures, which indicated its high predictive power. The appearance of polarity-related descriptors, such as XlogP, confirms the hydrophobic nature of the cyclodextrin cavity. The model can be used for predicting the affinity of new ligands to β-CD. However, a non-standard application was also proposed for classification into Biopharmaceutical Classification System (BCS) drug types. It was found that a single parameter, the estimated value of lnK, is sufficient to distinguish highly permeable drugs (BCS class I and II) from low permeable ones (BCS class III and IV). In general, it was found that drugs of the former group exhibit higher affinity to β-CD than those of the latter group (class III and IV).
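
The basis functions mentioned above can be illustrated with a minimal sketch of the hinge functions that MARS-type models are built from. Everything numeric here is made up for illustration: the knots, the coefficients, and the second descriptor `TPSA` are hypothetical and are not taken from the paper; only `XlogP` is named in the abstract.

```python
from typing import Dict


def hinge(x: float, knot: float, direction: int = 1) -> float:
    """MARS hinge: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(descriptors: Dict[str, float]) -> float:
    """Intercept plus weighted basis functions; every number is hypothetical."""
    xlogp = descriptors["XlogP"]
    tpsa = descriptors["TPSA"]  # hypothetical second descriptor
    bf1 = hinge(xlogp, knot=2.0)                 # active above the knot
    bf2 = hinge(xlogp, knot=2.0, direction=-1)   # active below the knot
    # Interaction term: product of hinges on two different descriptors.
    bf3 = bf1 * hinge(tpsa, knot=60.0, direction=-1)
    return 5.0 + 0.8 * bf1 - 0.5 * bf2 + 0.01 * bf3


high = mars_predict({"XlogP": 3.0, "TPSA": 40.0})
low = mars_predict({"XlogP": 1.0, "TPSA": 40.0})
```

Each hinge is zero on one side of its knot, so the fitted response is piecewise linear in each descriptor, and products of hinges let the model express interactions between descriptors.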
