Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials

2020, Vol 16 (8), pp. 5410-5421
Author(s): V. Zaverkin, J. Kästner

2020
Author(s): Xinzhe Zhu, Chi-Hung Ho, Xiaonan Wang

The production process of many active pharmaceutical ingredients, such as sitagliptin, can cause severe environmental problems through the use of toxic chemical materials and production infrastructure, energy consumption and waste treatment. The environmental impacts of the sitagliptin production process were estimated with the life cycle assessment (LCA) method, which suggested that the use of chemical materials contributed most of the environmental impact. Both the Eco-indicator 99 and the ReCiPe endpoint methods confirmed that chemical feedstock accounted for 83% and 70% of the life-cycle impact, respectively. Among all the chemical materials used in the sitagliptin production process, trifluoroacetic anhydride was identified as the most influential factor in most impact categories according to the ReCiPe midpoint method. Therefore, high-throughput screening was performed to find green chemical substitutes for the target chemical (i.e., trifluoroacetic anhydride) in the following three steps. First, the thirty most similar chemicals were selected from two million candidate alternatives in the PubChem database based on their molecular descriptors. Next, deep learning neural network models were developed to predict life-cycle impact, trained on the chemicals in the Ecoinvent v3.5 database with known LCA values and corresponding molecular descriptors. Finally, after the LCA data of these similar chemicals were predicted with the trained machine learning models, 1,2-ethanediyl ester was identified as one of the potentially greener substitutes. The case study demonstrated the applicability of this novel framework for screening green chemical substitutes and optimizing pharmaceutical manufacturing processes.
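
The first screening step, picking the chemicals closest to the target from a large candidate pool by their molecular descriptors, amounts to a nearest-neighbour search. The sketch below is a minimal illustration under assumptions that do not come from the study: descriptors are already available as numeric vectors, each descriptor column is z-score scaled, and similarity is measured by cosine similarity. The function names and any toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def zscore_columns(vectors: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Scale every descriptor column to zero mean and unit variance so that
    descriptors with large numeric ranges do not dominate the similarity."""
    names = list(vectors)
    n_desc = len(vectors[names[0]])
    means = [sum(vectors[k][j] for k in names) / len(names) for j in range(n_desc)]
    stds = []
    for j in range(n_desc):
        var = sum((vectors[k][j] - means[j]) ** 2 for k in names) / len(names)
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid division by zero
    return {k: [(vectors[k][j] - means[j]) / stds[j] for j in range(n_desc)] for k in names}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(target: str, vectors: Dict[str, Sequence[float]],
                 k: int = 30) -> List[Tuple[str, float]]:
    """Return the k candidates whose scaled descriptor vectors are most
    similar to the target chemical, best match first."""
    scaled = zscore_columns(vectors)
    scores = [(name, cosine(scaled[target], vec))
              for name, vec in scaled.items() if name != target]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

Scaling matters because raw descriptors span very different numeric ranges. For a pool of two million PubChem compounds, a precomputed similarity index would typically replace the linear scan used here.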


2009, Vol 26 (9), pp. 2216-2224
Author(s): Dmitriy Chekmarev, Vladyslav Kholodovych, Sandhya Kortagere, William J. Welsh, Sean Ekins

2021, Vol 12
Author(s): Wojciech Lesiński, Krzysztof Mnich, Witold R. Rudnicki

Motivation: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI, based on the chemical properties of substances and on experiments performed on cell lines, would significantly reduce the cost of clinical trials and speed up drug development. The current study aims to build predictive models of DILI risk for chemical compounds using multiple sources of information.

Methods: Using several supervised machine learning algorithms, we built predictive models for several alternative splits of compounds between the DILI and non-DILI classes. To this end, we used the chemical properties of the compounds, their effects on gene expression levels in six human cell lines treated with them, and their toxicological profiles. First, we identified the most informative variables in all data sets. Then, these variables were used to build machine learning models. Finally, composite models were built with the Super Learner approach. All modeling was performed using multiple repeats of cross-validation to obtain unbiased and precise estimates of performance.

Results: With one exception, the gene expression profiles of human cell lines were non-informative and resulted in random models. Toxicological reports were not useful for predicting DILI. The best results were obtained for models discriminating between harmless compounds and those for which any level of DILI was observed (AUC = 0.75). These models were built with the Random Forest algorithm using molecular descriptors.
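
The evaluation protocol described in the Methods (repeated cross-validation, with performance reported as AUC) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the folds here are not stratified, and the model-fitting step is left to whichever classifier (for example a Random Forest) is trained on each fold.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (DILI) compound
    is scored higher than a randomly chosen negative one; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(n: int, k: int, repeats: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for `repeats` independent shuffles of
    k-fold cross-validation; averaging a metric over all folds of all repeats
    gives a less noisy estimate than a single split."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]  # every k-th index: folds partition the data
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test
```

Read this way, an AUC of 0.75 means that in 75% of (DILI, non-DILI) compound pairs the model ranks the DILI compound higher, counting ties as half.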


Author(s): Abdelkader A Metwally, Amira A Nayel, Rania M Hathout

In silico prediction of the in vivo efficacy of siRNA ionizable-lipid nanoparticles is desirable but has not been achieved before. This study aims to computationally predict the in vivo efficacy of siRNA nanoparticles, which saves time and resources. A data set of 120 entries was prepared by combining molecular descriptors of the ionizable lipids with two nanoparticle formulation characteristics. Input descriptor combinations were selected by an evolutionary algorithm. Artificial neural networks, support vector machines and partial least squares regression were used for QSAR modeling. Two different splits of the data set yielded two training sets and two external validation sets, containing 90 and 30 entries, respectively. The models successfully predicted validation-set log(dose), with R²val = 0.86 to 0.89 for validation set one and 0.75 to 0.80 for validation set two. Artificial neural networks gave the best R²val for both validation sets. For predictions with high bias, R²val improved from 0.47 to 0.96 when training-set lipids lying within the applicability domain were selected. In conclusion, the in vivo performance of siRNA nanoparticles was successfully predicted by combining cheminformatics with machine learning techniques.
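
The validation statistic quoted in this abstract can be made concrete with a small sketch of a 90/30 split and the R² calculation. Assumptions: the abstract does not say how its two splits were chosen, so a seeded random split stands in here, and R²val is taken as the ordinary coefficient of determination using the validation-set mean (some QSAR papers use the training-set mean instead). The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(n_entries: int, n_val: int,
                           seed: int) -> Tuple[List[int], List[int]]:
    """Randomly assign `n_val` entries to an external validation set and the
    rest to training (e.g. 120 entries -> 90 training + 30 validation)."""
    idx = list(range(n_entries))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_val:]), sorted(idx[:n_val])


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot on the validation set,
    here meant for observed vs. predicted log(dose) values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

An R²val of 1.0 means perfect prediction, while 0.0 means the model does no better than always predicting the validation-set mean.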


Author(s): Sara S. El Zahed, Shawn French, Maya A. Farha, Garima Kumar, Eric D. Brown

Discovering new Gram-negative antibiotics has been a challenge for decades. This has been largely attributed to a limited understanding of the molecular descriptors governing Gram-negative permeation and efflux evasion. Herein, we address the contribution of efflux using a novel approach that applies multivariate analysis, machine learning, and structure-based clustering to some 4,500 actives from a small molecule screen in efflux-compromised Escherichia coli. We employed principal-component analysis and trained two decision tree-based machine learning models to investigate descriptors contributing to the antibacterial activity and efflux susceptibility of these actives. This approach revealed that the Gram-negative activity of hydrophobic and planar small molecules with low molecular stability is limited to efflux-compromised E. coli. Further, molecules with reduced branching and compactness showed increased susceptibility to efflux. Given these distinct properties that govern efflux, we developed the first machine learning model, called Susceptibility to Efflux Random Forest (SERF), as a tool to analyze the molecular descriptors of small molecules and predict those that could be susceptible to efflux pumps in silico. Here, SERF demonstrated high accuracy in identifying such molecules. Further, we clustered all 4,500 actives based on their core structures and identified distinct clusters highlighting side chain moieties that cause marked changes in efflux susceptibility. In all, our work reveals a role for physicochemical and structural parameters in governing efflux, presents a machine learning tool for rapid in silico analysis of efflux susceptibility, and provides a proof of principle for the potential of exploiting side chain modification to design novel antimicrobials evading efflux pumps.
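
The cluster-level comparison described above can be sketched as a per-scaffold tally. This is a hypothetical illustration, not SERF or the authors' clustering: it assumes each active already carries a core-structure key (for example a Bemis-Murcko scaffold computed with a cheminformatics toolkit) and has been tested against both a wild-type and an efflux-compromised strain.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (scaffold key, active against wild-type E. coli,
#                              active against the efflux-compromised strain).
Record = Tuple[str, bool, bool]


def efflux_susceptibility_by_scaffold(records: Iterable[Record]) -> Dict[str, float]:
    """For each core-structure cluster, return the fraction of its members
    that are active only when efflux is compromised, i.e. molecules whose
    activity appears to be lost to efflux in the wild-type strain."""
    totals: Dict[str, int] = defaultdict(int)
    efflux_limited: Dict[str, int] = defaultdict(int)
    for scaffold, active_wt, active_mutant in records:
        totals[scaffold] += 1
        if active_mutant and not active_wt:
            efflux_limited[scaffold] += 1
    return {s: efflux_limited[s] / totals[s] for s in totals}
```

A cluster whose fraction is much higher than that of structurally related clusters would be a natural starting point for looking at which side-chain moieties shift efflux susceptibility.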


2020, Vol 9 (3), pp. 164-172
Author(s): Changsheng Jiang, Piaopiao Zhao, Weihua Li, Yun Tang, Guixia Liu

Neurotoxicity is one of the main causes of drug withdrawal, and experimental methods for detecting neurotoxicity are time-consuming and laborious. In addition, existing computational models for predicting neurotoxicity still have some shortcomings. To address these shortcomings, we collected a large neurotoxicity data set and used PyBioMed molecular descriptors and eight machine learning algorithms to construct regression models of chemical neurotoxicity. Cross-validation and test-set validation showed that the extra-trees regressor model had the best predictive performance for neurotoxicity ($q_{\mathrm{test}}^2 = 0.784$). In addition, we determined the applicability domain of the models by calculating the standard deviation distance and the leverage distance of the training set. By calculating the contribution of each molecular descriptor to the models, we also found that some descriptors are closely related to neurotoxicity. Given the accuracy of the regression models, we recommend the extra-trees regressor model for predicting the autonomic neurotoxicity of chemicals.
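
The leverage part of such an applicability-domain check can be sketched without external libraries. This minimal sketch assumes the usual definition h = x^T (X^T X)^-1 x over the training descriptor matrix X (with an intercept column added) and the commonly used warning threshold h* = 3(p + 1)/n for p descriptors and n training compounds; the abstract does not state which threshold the authors used, the standard deviation distance is not covered, and the function names are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; remove collinear descriptors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def leverage(train: Sequence[Sequence[float]], query: Sequence[float]) -> float:
    """h = x^T (X^T X)^-1 x, with a leading 1 added for the intercept."""
    X = [[1.0, *row] for row in train]
    x = [1.0, *query]
    k = len(x)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    z = _solve(xtx, x)  # z = (X^T X)^-1 x without forming the inverse
    return sum(xi * zi for xi, zi in zip(x, z))


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Commonly used cut-off h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train
```

A query compound with leverage above h* lies far from the bulk of the training set in descriptor space, so its prediction would be treated as an extrapolation.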


2020, Vol 5 (1), pp. 317-329
Author(s): Trent Barnard, Harry Hagan, Steven Tseng, Gabriele C. Sosso

The phenomenal advances of machine learning in drug design have led to the development of a plethora of molecular descriptors. And yet, there might be value in using just a handful of them, chosen on the basis of physical intuition.


2019, Vol 19 (5), pp. 362-372
Author(s): Oleg A. Raevsky, Veniamin Y. Grigorev, Daniel E. Polianczyk, Olga E. Raevskaja, John C. Dearden

This review presents a detailed critical analysis of publications on QSPR modeling of aqueous solubility. It discusses four types of aqueous solubility (three thermodynamic solubilities, namely solubility with unknown solute structure, intrinsic solubility and solubility in physiological media at pH 7.4, as well as kinetic solubility), a variety of molecular descriptors (from topological to quantum-chemical), traditional statistical and machine learning methods, and original QSPR models.

