MoDeSuS: A Machine Learning Tool for Selection of Molecular Descriptors in QSAR Studies Applied to Molecular Informatics

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
María Jimena Martínez ◽  
Marina Razuc ◽  
Ignacio Ponzoni

Selecting the most relevant molecular descriptors to describe a target variable in QSAR (Quantitative Structure-Activity Relationship) modelling is a challenging combinatorial optimization problem. This paper presents a novel software tool that addresses this task for both regression and classification modelling. The methodology implemented by the tool is organized into two phases. The first phase uses a multiobjective evolutionary technique to select subsets of descriptors. The second phase performs an external validation of the chosen descriptor subsets in order to improve reliability. The tool's functionalities are illustrated through a case study on estimating the ready biodegradation property, as an example of classification QSAR modelling. The results show the usefulness and potential of this tool, which aims to reduce development time and costs in the drug discovery process.
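The abstract does not say which objectives the evolutionary phase optimizes. As a minimal sketch, assuming two objectives to minimize (a model error and the subset size), the non-dominated (Pareto) filtering step that multiobjective evolutionary methods typically rely on could look as follows. The descriptor names, error values, and function names are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (descriptor subset, model error, subset size); every value here is made up
# for illustration and does not come from the paper.
Candidate = Tuple[FrozenSet[str], float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates: List[Candidate] = [
    (frozenset({"d1", "d2", "d3"}), 0.20, 3),
    (frozenset({"d1", "d2"}), 0.22, 2),
    (frozenset({"d1", "d4", "d5", "d6"}), 0.25, 4),  # worse on both objectives
    (frozenset({"d2"}), 0.35, 1),
]
front = pareto_front(candidates)
```

In a two-phase workflow of the kind described, the surviving subsets would then be passed to the external validation phase rather than being ranked by the internal error alone.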

2018 ◽  
Vol 21 (5) ◽  
pp. 381-387 ◽  
Author(s):  
Hossein Atabati ◽  
Kobra Zarei ◽  
Hamid Reza Zare-Mehrjardi

Aim and Objective: Human dihydroorotate dehydrogenase (DHODH) catalyzes the fourth step of pyrimidine biosynthesis in cells. Hence it is important to identify suitable inhibitors of DHODH to prevent virus replication. In this study, a quantitative structure-activity relationship (QSAR) analysis was performed to predict the activity of a group of newly synthesized halogenated pyrimidine derivatives as inhibitors of DHODH. Materials and Methods: Molecular structures of the halogenated pyrimidine derivatives were drawn in HyperChem, and molecular descriptors were then calculated with the DRAGON software. Finally, the most effective descriptors for 32 halogenated pyrimidine derivatives were selected using the bee algorithm. Results: The descriptors selected by the bee algorithm were used for modeling. The mean relative error and correlation coefficient were 2.86% and 0.9627, respectively, while the corresponding values for leave-one-out cross-validation were 4.18% and 0.9297. External validation was also conducted using separate training and test sets. The correlation coefficients for the training and test sets were 0.9596 and 0.9185, respectively. Conclusion: The modeling results of the present work showed that the bee algorithm performs well for variable selection in QSAR studies, and that its results were better than those of a model built with descriptors selected by the genetic algorithm method.
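The two statistics reported above, mean relative error and correlation coefficient under leave-one-out cross-validation, can be illustrated with a small sketch. It assumes a single descriptor and an ordinary least-squares line, which is a simplification and not the model used in the study; the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


def mean_relative_error(ys, preds):
    """Mean of |prediction - observation| / |observation|, in percent."""
    return 100.0 * sum(abs(p - y) / abs(y) for y, p in zip(ys, preds)) / len(ys)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [0.8, 1.5, 2.1, 2.9, 3.6, 4.4]
activity = [4.1, 4.9, 5.4, 6.3, 6.8, 7.9]
cv_preds = loo_predictions(descriptor, activity)
cv_mre = mean_relative_error(activity, cv_preds)
cv_r = pearson_r(activity, cv_preds)
```

Because each prediction comes from a model that never saw the sample in question, the cross-validated statistics are normally somewhat worse than the fitted ones, which matches the pattern in the reported numbers.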


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has long been studied by researchers in numerous fields. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only produces efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA. The CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively handle large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
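The second phase, the Lloyd iteration, can be sketched as below. The covering algorithm itself is not described in enough detail in the abstract to reproduce, so the initial centers are simply passed in as if a first phase had produced them; the function name and the sample points are hypothetical.

```python
def lloyd(points, centers, max_iter=100):
    """Alternate assignment and center-update steps until the centers stop moving.

    `centers` stands in for the output of an initialization phase (for example
    a covering algorithm); this sketch does not implement that phase.
    """
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, final_clusters = lloyd(points, [(0.0, 0.0), (10.0, 10.0)])
```

Since the number of centers handed to `lloyd` fixes k, any method that supplies them also decides the number of clusters, which is the role the abstract assigns to the covering algorithm.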


Author(s):  
Carlos Henrique Nascimento ◽  
Ires Paula de Andrade Miranda

The purpose was to analyze problem-based learning (PBL) as a methodological alternative for primary school that favors learning about Amazonian ecosystems. This research is descriptive, with a qualitative-quantitative approach. The study was carried out with students in the 9th year of primary school. The PBL-based teaching methodology was applied in two phases. In the first phase, a test of previous conceptions was carried out in order to learn the students' perception of topics related to some landscape units of the Amazonian ecosystems. The second phase consisted of implementing the learning methodology in the school environment. Four different stages were established in the application: i) selection of topics; ii) problem formulation; iii) problem solving; iv) synthesis and evaluation. The data collection instruments used were a preconceptions test and a skills chart. The results showed that, after the application of the PBL methodology, cognitive recognition of the Amazonian ecosystems could be perceived in the students, reaching additional goals established by the PCN (Brazil's National Curriculum Parameters).


2014 ◽  
Vol 79 (9) ◽  
pp. 1111-1125 ◽  
Author(s):  
Dan-Dan Wang ◽  
Lin-Lin Feng ◽  
Guang-Yu He ◽  
Hai-Qun Chen

Quantitative structure-activity relationship (QSAR) models play a key role in finding the relationship between molecular structures and the toxicity of nitrobenzenes to Tetrahymena pyriformis. In this work, a genetic algorithm combined with partial least squares (GA-PLS) was employed to select an optimal subset of descriptors that contribute significantly to the toxicity of nitrobenzenes to Tetrahymena pyriformis. A set of five descriptors, namely G2, HOMT, G(Cl..Cl), Mor03v and MAXDP, was selected for predicting the toxicity of 45 nitrobenzene derivatives and was then used to build the model by the multiple linear regression (MLR) method. The resulting model, whose stability was confirmed by leave-one-out validation and an external validation test, showed high statistical significance (R² = 0.963, Q²_LOO = 0.944). Moreover, a Y-scrambling test indicated that there was no chance correlation in this model.
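A Y-scrambling test of the kind mentioned above can be sketched as follows. For brevity the sketch uses a single descriptor, for which the R² of a least-squares line equals the squared Pearson correlation; the study itself used a five-descriptor MLR model. The data, the number of rounds, and the function names are hypothetical.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def y_scramble(xs, ys, n_rounds=200, seed=0):
    """Refit after shuffling the responses; return the R^2 of each scrambled fit."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


# Hypothetical descriptor values and toxicities, for illustration only.
xs = [0.5, 1.1, 1.9, 2.4, 3.2, 3.8, 4.5, 5.1]
ys = [1.2, 2.0, 3.1, 3.4, 4.6, 5.0, 6.1, 6.4]
r2_true = r_squared(xs, ys)
scrambled = y_scramble(xs, ys)
mean_scrambled = sum(scrambled) / len(scrambled)
```

If the scrambled fits score about as well as the real one, the original correlation is likely a chance artifact; a large gap between `r2_true` and `mean_scrambled` is the outcome the abstract describes as "no chance correlation".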


2019 ◽  
Vol 16 (4) ◽  
pp. 453-460 ◽  
Author(s):  
Jiayu Li ◽  
Wenyue Tian ◽  
Diaohui Gao ◽  
Yuying Li ◽  
Yiqun Chang ◽  
...  

Background: Hepatitis C Virus (HCV) infection is the major cause of hepatitis after transfusion. HCV Nonstructural Protein 5A (NS5A) inhibitors have become a new hotspot in the study of HCV inhibitors due to their strong antiviral activity, rapid viral clearance and broad antiviral spectrum. Methods: Forty-five NS5A inhibitors were chosen to build three-dimensional quantitative structure-activity relationship (3D-QSAR) models using comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA). A training set of 30 compounds was used to establish the models, and a test set of 15 compounds was used for external validation. Results: The CoMFA model yielded a q² value of 0.607 and an r² value of 0.934. The CoMSIA model, built on steric, electrostatic, hydrophobic and hydrogen-bond acceptor fields, yielded a q² value of 0.516 and an r² value of 0.960. The predictive correlation coefficients (r²_pred) of the CoMFA and CoMSIA models were 0.713 and 0.939, respectively. Conclusion: These results provide a theoretical basis for the design and screening of HCV NS5A complex inhibitors.
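The abstract does not define q² and r²_pred. A minimal sketch, assuming the definitions commonly used in 3D-QSAR work (q² from cross-validated predictions on the training set, r²_pred from test-set predictions referenced to the training-set mean), is shown below; the function names and sample values are hypothetical.

```python
def q2(train_y, cv_pred):
    """Cross-validated q^2 = 1 - PRESS / SS, with SS taken about the training mean."""
    mean_y = sum(train_y) / len(train_y)
    press = sum((y - p) ** 2 for y, p in zip(train_y, cv_pred))
    ss = sum((y - mean_y) ** 2 for y in train_y)
    return 1.0 - press / ss


def r2_pred(train_y, test_y, test_pred):
    """Predictive r^2 = 1 - PRESS / SD on the external test set.

    SD is the sum of squared deviations of the test activities from the
    mean activity of the training set (an assumed, commonly used convention).
    """
    train_mean = sum(train_y) / len(train_y)
    press = sum((y - p) ** 2 for y, p in zip(test_y, test_pred))
    sd = sum((y - train_mean) ** 2 for y in test_y)
    return 1.0 - press / sd


# Hypothetical activities (e.g. pIC50 values), for illustration only.
train_activities = [5.0, 6.0, 7.0]
test_activities = [5.5, 6.5]
perfect = r2_pred(train_activities, test_activities, [5.5, 6.5])
baseline = r2_pred(train_activities, test_activities, [6.0, 6.0])
```

Under these definitions a model that only predicts the training mean scores 0, so values such as 0.713 and 0.939 indicate how much better than that baseline each model does on unseen compounds.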


Author(s):  
Mabrouk Hamadache ◽  
Abdeltif Amrane ◽  
Salah Hanini ◽  
Othmane Benkortbi

Quantitative Structure-Activity Relationship (QSAR) models are expected to play an important role in assessing the risk of chemicals to humans and the environment. In this study, a QSAR model based on 10 molecular descriptors was developed and validated to predict the acute oral toxicity of 91 fungicides to rats. Good results (PRESS/SSY = 0.085 and VIF < 5) were obtained, supporting the validity of the descriptors in the resulting model. The best results were obtained with a 10/11/1 Artificial Neural Network model trained with the Levenberg-Marquardt algorithm. The prediction accuracy for the external validation set, estimated by Q²_ext, was 0.960. Accordingly, the model developed in this study provided excellent predictions and can be used to predict the acute oral toxicity of fungicides, particularly those that have not been tested as well as new fungicides.
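The two diagnostics quoted in parentheses can be read as follows. This is a worked interpretation under assumptions the abstract does not state: PRESS is taken to be the cross-validated predictive residual sum of squares, SSY the total sum of squares of the response about its mean, and R_j² the coefficient of determination obtained when descriptor j is regressed on the remaining descriptors.

```latex
% Assumed definitions; the abstract reports only the two numerical criteria.
\[
Q^2 \;=\; 1 - \frac{\mathrm{PRESS}}{\mathrm{SSY}} \;=\; 1 - 0.085 \;=\; 0.915
\]
\[
\mathrm{VIF}_j \;=\; \frac{1}{1 - R_j^2} \;<\; 5
\quad\Longleftrightarrow\quad
R_j^2 \;<\; 0.8
\]
```

On this reading, a small PRESS/SSY ratio means the cross-validated errors are small relative to the spread of the data, and VIF < 5 means no descriptor is more than 80% explained by the others.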


2011 ◽  
Vol 9 (5) ◽  
pp. 855-866 ◽  
Author(s):  
Nikola Minovski ◽  
Aneta Jezierska-Mazzarello ◽  
Marjan Vračko ◽  
Tom Šolmajer

A quantitative structure-activity relationship (QSAR) study on a set of 66 structurally similar 6-fluoroquinolones was performed using a large pool of theoretical molecular descriptors. Ab initio geometry optimizations were carried out to reproduce the geometrical and electronic structure parameters. The resulting molecular structures were confirmed to be minima via harmonic frequency calculations. The obtained atomic charges, HOMO and LUMO energies, orbital electron densities, dipole moment, energy and many other properties served as quantum-chemical descriptors. A multiple linear regression (MLR) technique was applied to generate a linear model for predicting the biological activity, the Minimal Inhibitory Concentration (MIC), expressed as its negative decadic logarithm (pMIC). The heuristic method was used to optimize the model parameters and select the most significant descriptors. The model was tested internally using the leave-one-out cross-validation (CV LOO) procedure on the training set and validated against the external validation set. The result (Q²_ext = 0.7393), obtained on an external, previously excluded validation data set, shows the predictive performance of this model (R²_tr = 0.7416, Q²_tr = 0.6613) in establishing the (Q)SAR of 6-fluoroquinolones. This validated model could be used to design new 6-fluoroquinolones with possibly higher activity.
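The response variable described above is a simple transform of the measured MIC. A minimal sketch, assuming the MIC is expressed in mol/L (the abstract does not state the unit) and using a hypothetical function name:

```python
import math


def pmic(mic_molar: float) -> float:
    """Negative base-10 logarithm of the MIC (MIC assumed to be in mol/L)."""
    if mic_molar <= 0:
        raise ValueError("MIC must be a positive concentration")
    return -math.log10(mic_molar)


# A lower MIC (a more potent compound) maps to a higher pMIC.
potent = pmic(1e-7)
weak = pmic(1e-4)
```

Working on the logarithmic scale keeps activities that span several orders of magnitude on a comparable footing, which suits a linear model such as MLR.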


Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 922 ◽  
Author(s):  
Piotr Cysewski ◽  
Maciej Przybyłek

A quantitative structure–property relationship (QSPR) model was formulated to quantify the binding constant (lnK) of a series of ligands to beta-cyclodextrin (β-CD). For this purpose, the multivariate adaptive regression splines (MARSplines) methodology was adopted, with molecular descriptors derived from simplified molecular input line entry specification (SMILES) strings. This approach allows the discovery of regression equations consisting of new non-linear components (basis functions) that are combinations of molecular descriptors. The model was subjected to the standard internal and external validation procedures, which indicated its high predictive power. The appearance of polarity-related descriptors, such as XlogP, confirms the hydrophobic nature of the cyclodextrin cavity. The model can be used for predicting the affinity of new ligands to β-CD. However, a non-standard application was also proposed: classification into Biopharmaceutical Classification System (BCS) drug types. It was found that a single parameter, the estimated value of lnK, is sufficient to distinguish highly permeable drugs (BCS class I and II) from low permeable ones (BCS class III and IV). In general, drugs of the former group exhibit higher affinity to β-CD than those of the latter group (class III and IV).
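The basis functions mentioned above are, in the usual MARS formulation, hinge functions of single descriptors and products of such hinges. The sketch below shows how a model of that shape is evaluated; the coefficients, knots, and the second descriptor name `desc_b` are hypothetical and are not taken from the paper.

```python
def hinge(x: float, knot: float, direction: int = 1) -> float:
    """MARS hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(descriptors, intercept, terms):
    """Evaluate intercept + sum of coef * product of hinge functions.

    `terms` is a list of (coef, factors), where each factor is a
    (descriptor_name, knot, direction) triple.
    """
    total = intercept
    for coef, factors in terms:
        product = 1.0
        for name, knot, direction in factors:
            product *= hinge(descriptors[name], knot, direction)
        total += coef * product
    return total


# Hypothetical model: one single-hinge term and one interaction term.
terms = [
    (0.5, [("XlogP", 1.0, 1)]),
    (-0.1, [("XlogP", 1.0, 1), ("desc_b", 50.0, -1)]),
]
ln_k = mars_predict({"XlogP": 3.0, "desc_b": 30.0}, intercept=2.0, terms=terms)
```

Each hinge is zero on one side of its knot, so a term only contributes in the descriptor region where it is active; this is what lets a sum of simple pieces capture non-linear behaviour.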

