Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials

The production of many active pharmaceutical ingredients, such as sitagliptin, can cause severe environmental problems owing to the use of toxic chemical materials, production infrastructure, energy consumption and waste treatment. The environmental impacts of the sitagliptin production process were estimated with the life cycle assessment (LCA) method, which suggested that chemical materials were the major contributor. The Eco-indicator 99 and ReCiPe endpoint methods attributed 83% and 70% of the life-cycle impact, respectively, to chemical feedstock. Among all the chemical materials used in the process, trifluoroacetic anhydride was identified as the most influential factor in most impact categories according to the ReCiPe midpoint method. High-throughput screening was therefore performed to find greener substitutes for the target chemical (i.e. trifluoroacetic anhydride) in three steps. First, the thirty most similar chemicals were selected from two million candidate alternatives in the PubChem database based on their molecular descriptors. Next, deep learning neural network models were developed to predict life-cycle impact from the chemicals in the Ecoinvent v3.5 database with known LCA values and corresponding molecular descriptors. Finally, after the LCA data of these similar chemicals were predicted using the trained models, 1,2-ethanediyl ester was identified as one of the potential greener substitutes. The case study demonstrated the applicability of the novel framework for screening green chemical substitutes and optimizing the pharmaceutical manufacturing process.
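The first screening step, ranking candidates by descriptor similarity to the target chemical, can be sketched as follows. This is only an illustration: the abstract does not state which similarity measure was used, so Euclidean distance is an assumption, and the descriptor vectors and candidate names are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, candidates, k):
    """Return the names of the k candidates closest to the target vector."""
    ranked = sorted(candidates, key=lambda name: euclidean(target, candidates[name]))
    return ranked[:k]


# Hypothetical descriptor vectors (not real data for any compound).
target = [1.0, 0.5, 2.0]
candidates = {
    "cand_a": [1.1, 0.4, 2.1],
    "cand_b": [5.0, 3.0, 0.0],
    "cand_c": [1.0, 0.5, 1.5],
    "cand_d": [0.0, 0.0, 0.0],
}

top2 = most_similar(target, candidates, k=2)
print(top2)
```

In practice the descriptors would usually be scaled first so that no single descriptor dominates the distance.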

In silico identification of human pregnane X receptor activators from molecular descriptors by machine learning approaches

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2012.05.012 ◽

2012 ◽

Vol 118 ◽

pp. 271-279 ◽

Cited By ~ 6

Author(s):

Hanbing Rao ◽

Yanying Wang ◽

Xianyin Zeng ◽

Xianxiang Wang ◽

Yong Liu ◽

...

Keyword(s):

Machine Learning ◽

In Silico ◽

Molecular Descriptors ◽

Pregnane X Receptor ◽

Learning Approaches ◽

In Silico Identification

In silico prediction and screening of γ-secretase inhibitors by molecular descriptors and machine learning methods

Journal of Computational Chemistry ◽

10.1002/jcc.21411 ◽

2009 ◽

Cited By ~ 1

Author(s):

Xue-Gang Yang ◽

Wei Lv ◽

Yu-Zong Chen ◽

Ying Xue

Keyword(s):

Machine Learning ◽

Molecular Descriptors ◽

Learning Methods ◽

Machine Learning Methods

Predicting Inhibitors of Acetylcholinesterase by Regression and Classification Machine Learning Approaches with Combinations of Molecular Descriptors

Pharmaceutical Research ◽

10.1007/s11095-009-9937-8 ◽

2009 ◽

Vol 26 (9) ◽

pp. 2216-2224 ◽

Cited By ~ 21

Author(s):

Dmitriy Chekmarev ◽

Vladyslav Kholodovych ◽

Sandhya Kortagere ◽

William J. Welsh ◽

Sean Ekins

Keyword(s):

Machine Learning ◽

Molecular Descriptors ◽

Learning Approaches

Prediction of Alternative Drug-Induced Liver Injury Classifications Using Molecular Descriptors, Gene Expression Perturbation, and Toxicology Reports

Frontiers in Genetics ◽

10.3389/fgene.2021.661075 ◽

2021 ◽

Vol 12 ◽

Author(s):

Wojciech Lesiński ◽

Krzysztof Mnich ◽

Witold R. Rudnicki

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Liver Injury ◽

Cell Lines ◽

Predictive Models ◽

Molecular Descriptors ◽

Chemical Properties ◽

Human Cell Lines ◽

Drug Induced ◽

Drug Induced Liver Injury

Motivation: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI, based on the chemical properties of substances and experiments performed on cell lines, would significantly reduce the cost of clinical trials and speed up drug development. The current study aims to build predictive models of DILI risk for chemical compounds using multiple sources of information. Methods: Using several supervised machine learning algorithms, we built predictive models for several alternative splits of compounds between DILI and non-DILI classes. To this end, we used the chemical properties of the given compounds, their effects on gene expression levels in six human cell lines treated with them, and their toxicological profiles. First, we identified the most informative variables in all data sets. Then, these variables were used to build machine learning models. Finally, composite models were built with the Super Learner approach. All modeling was performed using multiple repeats of cross-validation to obtain unbiased and precise estimates of performance. Results: With one exception, gene expression profiles of human cell lines were non-informative and resulted in random models. Toxicological reports were not useful for prediction of DILI. The best results were obtained for models discerning between harmless compounds and those for which any level of DILI was observed (AUC = 0.75). These models were built with the Random Forest algorithm using molecular descriptors.
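For readers unfamiliar with the AUC metric quoted above, the following minimal sketch computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are toy values chosen for illustration, not data from the study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = DILI observed, 0 = harmless (made-up scores).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the non-informative gene expression models are described as random.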

In Silico Prediction of siRNA Ionizable-Lipid Nanoparticles in vivo Efficacy: Machine Learning Modeling Based on Formulation and Molecular Descriptors

10.20944/preprints202108.0254.v1 ◽

2021 ◽

Author(s):

Abdelkader A Metwally ◽

Amira A Nayel ◽

Rania M Hathout

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

In Silico ◽

Molecular Descriptors ◽

Lipid Nanoparticles ◽

Data Set ◽

In Vivo Efficacy ◽

Artificial Neural

In silico prediction of the in vivo efficacy of siRNA ionizable-lipid nanoparticles is desirable but has not previously been achieved. This study aims to computationally predict the in vivo efficacy of siRNA nanoparticles, which saves time and resources. A data set containing 120 entries was prepared by combining molecular descriptors of the ionizable lipids with two nanoparticle formulation characteristics. Input descriptor combinations were selected by an evolutionary algorithm. Artificial neural networks, support vector machines and partial least squares regression were used for QSAR modeling. The data set was split in two different ways, giving two training sets and two external validation sets. Training and validation sets contained 90 and 30 entries, respectively. The validation set log(dose) was predicted successfully, with R2val = 0.86 – 0.89 and 0.75 – 0.80 for validation sets one and two, respectively. Artificial neural networks gave the best R2val for both validation sets. For predictions with high bias, R2val improved from 0.47 to 0.96 when the training set lipids were restricted to those lying within the applicability domain. In conclusion, the in vivo performance of siRNA nanoparticles was successfully predicted by combining cheminformatics with machine learning techniques.
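The validation statistic reported above can be illustrated with a short sketch. It assumes the common definition R2 = 1 - SS_res / SS_tot evaluated on the held-out set; the abstract does not say which variant of external R2 the authors used, and the observed and predicted values below are made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical log(dose) values for a small validation set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2_val = r_squared(observed, predicted)
print(round(r2_val, 3))
```

Because the statistic is computed on entries the model never saw during training, it reflects predictive rather than fitting performance.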

Physicochemical and Structural Parameters Contributing to the Antibacterial Activity and Efflux Susceptibility of Small Molecule Inhibitors of Escherichia coli

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.01925-20 ◽

2021 ◽

Author(s):

Sara S. El Zahed ◽

Shawn French ◽

Maya A. Farha ◽

Garima Kumar ◽

Eric D. Brown

Keyword(s):

Machine Learning ◽

Escherichia Coli ◽

Antibacterial Activity ◽

Small Molecules ◽

Small Molecule ◽

In Silico ◽

Molecular Descriptors ◽

Structural Parameters ◽

Side Chain ◽

Gram Negative

Discovering new Gram-negative antibiotics has been a challenge for decades. This has been largely attributed to a limited understanding of the molecular descriptors governing Gram-negative permeation and efflux evasion. Herein, we address the contribution of efflux using a novel approach that applies multivariate analysis, machine learning, and structure-based clustering to some 4,500 actives from a small molecule screen in efflux-compromised Escherichia coli. We employed principal-component analysis and trained two decision tree-based machine learning models to investigate descriptors contributing to the antibacterial activity and efflux susceptibility of these actives. This approach revealed that the Gram-negative activity of hydrophobic and planar small molecules with low molecular stability is limited to efflux-compromised E. coli. Further, molecules with reduced branching and compactness showed increased susceptibility to efflux. Given these distinct properties that govern efflux, we developed the first machine learning model, called Susceptibility to Efflux Random Forest (SERF), as a tool to analyze the molecular descriptors of small molecules and predict those that could be susceptible to efflux pumps in silico. Here, SERF demonstrated high accuracy in identifying such molecules. Further, we clustered all 4,500 actives based on their core structures and identified distinct clusters highlighting side chain moieties that cause marked changes in efflux susceptibility. In all, our work reveals a role for physicochemical and structural parameters in governing efflux, presents a machine learning tool for rapid in silico analysis of efflux susceptibility, and provides a proof of principle for the potential of exploiting side chain modification to design novel antimicrobials evading efflux pumps.

In silico prediction of chemical neurotoxicity using machine learning

Toxicology Research ◽

10.1093/toxres/tfaa016 ◽

2020 ◽

Vol 9 (3) ◽

pp. 164-172

Author(s):

Changsheng Jiang ◽

Piaopiao Zhao ◽

Weihua Li ◽

Yun Tang ◽

Guixia Liu

Keyword(s):

Machine Learning ◽

Regression Models ◽

Cross Validation ◽

Prediction Models ◽

Drug Withdrawal ◽

Molecular Descriptors ◽

Computational Prediction ◽

Machine Learning Algorithms ◽

Training Set ◽

Data Set

Neurotoxicity is one of the main causes of drug withdrawal, and the biological experimental methods for detecting neurotoxicity are time-consuming and laborious. In addition, the existing computational prediction models of neurotoxicity still have some shortcomings. In response to these shortcomings, we collected a large neurotoxicity data set and used PyBioMed molecular descriptors and eight machine learning algorithms to construct regression prediction models of chemical neurotoxicity. Cross-validation and test set validation of the models showed that the extra-trees regressor model had the best predictive performance for neurotoxicity (${q}_{\mathrm{test}}^2$ = 0.784). In addition, we obtained the applicability domain of the models by calculating the standard deviation distance and the lever distance of the training set. By calculating the contribution of the molecular descriptors to the models, we also found that some molecular descriptors are closely related to neurotoxicity. Considering the accuracy of the regression models, we recommend using the extra-trees regressor model to predict the chemical autonomic neurotoxicity.
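If the "lever distance" mentioned above refers to the leverage statistic often used to define applicability domains in QSAR work, the idea can be sketched for the simplest case of a single descriptor with an intercept. This is an assumption about the method, not a description of the authors' implementation; the warning threshold 3(p + 1)/n is a commonly used convention, and the descriptor values are made up.

```python
def leverage(x_query, x_train):
    """Leverage of a query point for a one-descriptor linear model with intercept."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    s_xx = sum((x - mean_x) ** 2 for x in x_train)
    return 1.0 / n + (x_query - mean_x) ** 2 / s_xx


def in_domain(x_query, x_train, p=1):
    """True if the leverage is at or below the conventional warning value 3(p + 1)/n."""
    threshold = 3.0 * (p + 1) / len(x_train)
    return leverage(x_query, x_train) <= threshold


# Hypothetical training-set descriptor values.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

inside = in_domain(4.0, x_train)    # close to the training data
outside = in_domain(10.0, x_train)  # far from the training data
print(inside, outside)
```

With many descriptors the same quantity is obtained from the diagonal of the hat matrix, which requires a linear algebra library rather than this closed form.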

Less may be more: an informed reflection on molecular descriptors for drug design and discovery

Molecular Systems Design & Engineering ◽

10.1039/c9me00109c ◽

2020 ◽

Vol 5 (1) ◽

pp. 317-329 ◽

Cited By ~ 1

Author(s):

Trent Barnard ◽

Harry Hagan ◽

Steven Tseng ◽

Gabriele C. Sosso

Keyword(s):

Machine Learning ◽

Drug Design ◽

Molecular Descriptors ◽

Physical Intuition

The phenomenal advances of machine learning in the context of drug design have led to the development of a plethora of molecular descriptors. And yet, there might be value in using just a handful of them – inspired by our physical intuition.

Aqueous Drug Solubility: What Do We Measure, Calculate and QSPR Predict?

Mini-Reviews in Medicinal Chemistry ◽

10.2174/1389557518666180727164417 ◽

2019 ◽

Vol 19 (5) ◽

pp. 362-372 ◽

Cited By ~ 1

Author(s):

Oleg A. Raevsky ◽

Veniamin Y. Grigorev ◽

Daniel E. Polianczyk ◽

Olga E. Raevskaja ◽

John C. Dearden

Keyword(s):

Machine Learning ◽

Quantum Chemical ◽

Critical Analysis ◽

Molecular Descriptors ◽

Aqueous Solubility ◽

Drug Solubility ◽

Learning Methods ◽

Machine Learning Methods ◽

Intrinsic Solubility ◽

Kinetic Solubility

This review presents a detailed critical analysis of publications devoted to QSPR of aqueous solubility. It discusses four types of aqueous solubility (three different thermodynamic solubilities with unknown solute structure, intrinsic solubility, solubility in physiological media at pH = 7.4, and kinetic solubility), the variety of molecular descriptors (from topological to quantum chemical), traditional statistical and machine learning methods, and original QSPR models.
