Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations

Robin Winter; Floriane Montanari; Frank Noé; Djork-Arné Clevert

doi:10.1039/c8sc04175j

Higher-Order and Mixed Discrete Derivatives such as a Novel Graph-Theoretical Invariant for Generating New Molecular Descriptors

Current Topics in Medicinal Chemistry; 2019; Vol 19 (11); pp. 944-956; Cited By ~ 2

doi:10.2174/1568026619666190510093651

Author(s): Oscar Martínez-Santiago; Yovani Marrero-Ponce; Ricardo Vivas-Reyes; Mauricio E.O. Ugarriza; Elízabeth Hurtado-Rodríguez; ...

Keyword(s): Molecular Descriptors; 3D QSAR; Higher Order; Molecular Structures; QSAR Modeling; Statistical Parameters; Molecular Graphs; New Family; Mixed Derivatives; Discrete Derivative

Background: Recently, some authors have defined new molecular descriptors (MDs) based on the Graph Discrete Derivative, known as Graph Derivative Indices (GDI). This approach, which applies discrete derivatives to various elements of a graph, takes the formation of subgraphs as its starting point. Previously, these definitions were extended to the chemical context (N-tuples), interpreted in structural/physicochemical terms, and applied to the description of several endpoints, with good results.

Objective: A generalization of GDIs using the definitions of higher-order and mixed derivatives for molecular graphs is proposed, extending the previous work and allowing the generation of a new family of MDs.

Methods: An extension of the previously defined GDIs is presented, and for this purpose the concepts of higher-order derivatives and mixed derivatives are introduced. These novel approaches to obtaining MDs, based on discrete derivatives (finite differences) of molecular graphs, use the elements of hypermatrices built from 12 different ways (12 events) of fragmenting the molecular structures. Applying the higher-order and mixed GDIs to any molecular structure yields Local Vertex Invariants (LOVIs) for atom pairs, for pairs of atom pairs, and so on. All new families of GDIs are implemented in software named DIVATI (acronym for Discrete DeriVAtive Type Indices), a module of the KeysFinder Framework in the TOMOCOMD-CARDD system.

Results: QSAR modeling of the biological activity (Log 1/K) of 31 steroids reveals that the GDIs obtained using the higher-order and mixed GDI approaches yield slightly higher performance than previously reported approaches based on the duplex, triplex and quadruplex matrices. In fact, the statistical parameters for models obtained with the higher-order and mixed GDI method are superior to those reported in the literature for other 0-3D QSAR methods.

Conclusion: The higher-order and mixed GDIs appear to be a promising tool for QSAR/QSPR, similarity/dissimilarity analysis and virtual screening studies.

ChemSpaX: Exploration of Chemical Space by Automated Functionalization of Molecular Scaffold

2021

doi:10.26434/chemrxiv.14617320

Author(s): Adarsh Kalikadien; Evgeny A. Pidko; Vivek Sinha

Keyword(s): Force Field; High Throughput; High Throughput Screening; Chemical Space; Space Exploration; Molecular Structures; Data Driven; Pincer Complexes; Cobalt Porphyrin; Input Structure

Local chemical space exploration of an experimentally synthesized material can be done by making slight structural variations of the synthesized material. This generation of many molecular structures with reasonable quality, that resemble an existing (chemical) purposeful material, is needed for high-throughput screening purposes in material design. Large databases of geometry and chemical properties of transition metal complexes are not readily available, although these complexes are widely used in homogeneous catalysis. A Python-based workflow, ChemSpaX, that is aimed at automating local chemical space exploration for any type of molecule, is introduced. The overall computational workflow of ChemSpaX is explained in more detail. ChemSpaX uses 3D information to place functional groups on an input structure. For example, the input structure can be a catalyst for which one wants to use high-throughput screening to investigate if the catalytic activity can be improved. The newly placed substituents are optimized using a computationally cheap force-field optimization method. After placement of new substituents, higher level optimizations using xTB or DFT instead of force-field optimization are also possible in the current workflow. In representative applications of ChemSpaX, it is shown that the structures generated by ChemSpaX have a reasonable quality for usage in high-throughput screening applications. Representative applications of ChemSpaX are shown by investigating various adducts on functionalized Mn-based pincer complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes and functionalization of a bipyridyl functionalized cobalt-porphyrin trapped in a M2L4 type cage complex. Descriptors such as the Gibbs free energy of reaction and HOMO-LUMO gap, that can be used in data-driven design and discovery of catalysts, were selected and studied in more detail for the selected use cases. The relatively fast GFN2-xTB method was used to calculate these descriptors and a comparison was done against DFT calculated descriptors. ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards data-driven material discovery.

Some Eccentricity-Based Topological Indices and Polynomials of Poly(EThyleneAmidoAmine) (PETAA) Dendrimers

Processes; 2019; Vol 7 (7); pp. 433; Cited By ~ 9

doi:10.3390/pr7070433

Author(s): Jialin Zheng; Zahid Iqbal; Asfand Fahad; Asim Zafar; Adnan Aslam; ...

Keyword(s): Molecular Descriptors; Molecular Graph; Topological Indices; Molecular Structures; Connectivity Index; Connectivity Indices; Numerical Invariants; Explicit Representations; Pharmaceutical Properties

Topological indices have been computed for various molecular structures over many years. These are numerical invariants associated with molecular structures and are helpful in characterizing many of their properties. Among these molecular descriptors, the eccentric connectivity index plays a prominent role because of its ability to estimate pharmaceutical properties. In this article, the eccentric connectivity, total eccentricity connectivity, augmented eccentric connectivity, first Zagreb eccentricity, modified eccentric connectivity, second Zagreb eccentricity, and edge-version eccentric connectivity indices are computed for the molecular graph of a PolyEThyleneAmidoAmine (PETAA) dendrimer. Moreover, the explicit representations of the polynomials associated with some of these indices are also computed.
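As an illustration of two of the indices named in this abstract, the following Python sketch computes the eccentric connectivity index and the total eccentricity index of a small graph. It assumes the commonly used definitions: the sum over all vertices of degree times eccentricity, and the sum of the vertex eccentricities. The function names and the four-vertex path graph are hypothetical and are not taken from the article, which treats PETAA dendrimers.

```python
from collections import deque


def eccentricities(adj):
    """Return {vertex: eccentricity} for a connected, unweighted graph.

    `adj` maps each vertex to a list of its neighbours.
    """
    ecc = {}
    for source in adj:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        # Eccentricity: distance to the farthest vertex.
        ecc[source] = max(dist.values())
    return ecc


def eccentric_connectivity_index(adj):
    """Sum over vertices of degree(v) * eccentricity(v) (assumed definition)."""
    ecc = eccentricities(adj)
    return sum(len(adj[v]) * ecc[v] for v in adj)


def total_eccentricity_index(adj):
    """Sum of all vertex eccentricities (assumed definition)."""
    return sum(eccentricities(adj).values())


# Hypothetical example: path graph on four vertices, 0-1-2-3.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(eccentric_connectivity_index(path4))  # 1*3 + 2*2 + 2*2 + 1*3 = 14
print(total_eccentricity_index(path4))      # 3 + 2 + 2 + 3 = 10
```

For a dendrimer, the same functions would apply once its molecular graph is encoded as an adjacency mapping; the closed-form expressions reported in the article are not reproduced here.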

Data-Driven Catalyst Optimization for Stereodivergent Asymmetric Synthesis of α-Allyl Carboxylic Acids by Iridium/boron Hybrid Catalysis

2021

doi:10.26434/chemrxiv.14579169.v1

Author(s): Hongyu Chen; Shigeru Yamaguchi; Yuya Morita; Hiroyasu Nakao; Xiangning Zhai; ...

Keyword(s): Asymmetric Synthesis; Carboxylic Acids; Asymmetric Catalysis; Catalyst Design; Molecular Structures; Data Driven; Iridium Catalysts; Structural Modifications; Molecular Catalysis; Catalyst Systems

Asymmetric catalysis enabling divergent control of multiple stereocenters remains challenging in synthetic organic chemistry. While machine learning-based optimization of molecular catalysis is an emerging approach, data-driven catalyst design to achieve stereodivergent asymmetric synthesis producing multiple reaction outcomes, such as constitutional selectivity, diastereoselectivity, and enantioselectivity, is unprecedented. Here, we report the straightforward identification of asymmetric two-component iridium/boron hybrid catalyst systems for α-C-allylation of carboxylic acids. Structural optimization of the chiral ligands for iridium catalysts was driven by molecular field-based regression analysis with a dataset containing 32 molecular structures in total. The catalyst systems enabled selective access to all the possible isomers of chiral carboxylic acids bearing contiguous stereocenters. This stereodivergent asymmetric catalysis is applicable to late-stage structural modifications of drugs and their derivatives.

Prediction of Drug-Induced Liver Toxicity Using SVM and Optimal Descriptor Sets

International Journal of Molecular Sciences; 2021; Vol 22 (15); pp. 8073

doi:10.3390/ijms22158073

Author(s): Keerthana Jaganathan; Hilal Tayara; Kil To Chong

Keyword(s): Feature Selection; Molecular Descriptors; Liver Toxicity; Molecular Descriptor; Quantitative Structure Activity Relationship; Machine Learning Algorithms; Superior Performance; Support Vector; Drug Induced; Internal Validation

Drug-induced liver toxicity is one of the significant safety challenges for patients' health and for the pharmaceutical industry. It causes the termination of drug candidates in clinical trials and the withdrawal of approved drugs from the market. Thus, it is essential to identify hepatotoxic compounds in the initial stages of the drug development process. The purpose of this study is to construct quantitative structure-activity relationship models using machine learning algorithms and systematic feature selection methods for molecular descriptor sets. The models were built from a large and diverse set of 1253 drug compounds and were validated internally with 10-fold cross-validation. In this study, we applied a variety of feature selection techniques to extract the optimal subset of descriptors as modeling features to improve the prediction performance. Experimental results suggested that the support vector machine-based classifier achieved better classification accuracy with a reduced set of molecular descriptors. The final optimal model provides an accuracy of 0.811, a sensitivity of 0.840, a specificity of 0.783 and a Matthews correlation coefficient of 0.623 on an internal validation set. Furthermore, this model outperformed prior studies when evaluated on both the internal and external test sets. The use of distinct optimal molecular descriptors as modeling features produces an in silico model with superior performance.
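The four statistics quoted in this abstract (accuracy, sensitivity, specificity and the Matthews correlation coefficient) all follow from the four cells of a binary confusion matrix. The Python sketch below shows those standard formulas. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the study's own confusion matrix is not given in the abstract.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: toxic compounds predicted toxic / non-toxic
    tn/fp: non-toxic compounds predicted non-toxic / toxic
    """
    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, which is one reason it is often reported alongside sensitivity and specificity for toxicity classifiers.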

Zagreb-Type Indices of R-Vertex Join and R-Edge Join of Graphs

Journal of Chemistry; 2020; Vol 2020; pp. 1-13

doi:10.1155/2020/9767128

Author(s): Jianxin Wei; Muhammad Imran; Muhamamd Azhar Iqbal; Muhammad Asad Zaighum

Keyword(s): Physicochemical Properties; Molecular Descriptors; Molecular Structures; Connectivity Index; Chemical Databases; Randic Index; Zagreb Indices; Mathematical Chemistry; Join Of Graphs; Arbitrary Graphs

Various methods are available for searching large chemical databases and predicting the physicochemical properties of molecular structures. Using molecular descriptors for this purpose is the simplest of these methods. The Zagreb indices are amongst the oldest molecular descriptors, and their properties have been extensively studied and applied in QSAR/QSPR studies. The Zagreb coindices were recently introduced, attracting the attention of researchers in mathematical chemistry. In this paper, we study Zagreb indices and several other Zagreb-type indices, including the general Randić index, sum-connectivity index, F-index, and Zagreb coindices, of the R-vertex join and R-edge join of two arbitrary graphs.
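The degree-based indices listed in this abstract can be evaluated directly from an edge list. The Python sketch below uses the standard definitions (sums over vertices, edges, or non-adjacent vertex pairs of simple functions of the vertex degrees). It works on a single simple graph and does not construct the R-vertex or R-edge join studied in the paper; the function name and the star-graph example are hypothetical.

```python
from itertools import combinations


def zagreb_type_indices(n, edges, alpha=-0.5):
    """Degree-based indices of a simple graph on vertices 0..n-1.

    `edges` is an iterable of vertex pairs; `alpha` is the exponent of the
    general Randic index (alpha = -1/2 gives the classical Randic index).
    """
    edge_set = {tuple(sorted(e)) for e in edges}
    deg = [0] * n
    for u, v in edge_set:
        deg[u] += 1
        deg[v] += 1
    # Pairs of distinct, non-adjacent vertices, used by the coindices.
    non_edges = [p for p in combinations(range(n), 2) if p not in edge_set]
    return {
        "first_zagreb": sum(d ** 2 for d in deg),
        "second_zagreb": sum(deg[u] * deg[v] for u, v in edge_set),
        "f_index": sum(d ** 3 for d in deg),
        "general_randic": sum((deg[u] * deg[v]) ** alpha for u, v in edge_set),
        "sum_connectivity": sum((deg[u] + deg[v]) ** -0.5 for u, v in edge_set),
        "first_zagreb_coindex": sum(deg[u] + deg[v] for u, v in non_edges),
        "second_zagreb_coindex": sum(deg[u] * deg[v] for u, v in non_edges),
    }


# Hypothetical example: star graph with centre 0 and three leaves.
star = zagreb_type_indices(4, [(0, 1), (0, 2), (0, 3)])
print(star["first_zagreb"], star["second_zagreb"], star["f_index"])  # 12 9 30
```

As a quick consistency check, the first Zagreb index and its coindex of a graph with n vertices and m edges satisfy M1 + coindex = 2m(n - 1); for the star above, 12 + 6 = 2 * 3 * 3.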

Retro Drug Design: From Target Properties to Molecular Structures

2021

doi:10.1101/2021.05.11.442656

Author(s): Yuhong Wang; Sam Michael; Ruili Huang; Jinghua Zhao; Katlin Recabo; ...

Keyword(s): Drug Discovery; Drug Design; Molecular Descriptor; Pharmaceutical Research; Monte Carlo Sampling; Molecular Structures; Descriptor System; Chemical Structures; Drug Molecules; Property Space

Generating drug molecules with desired properties by computational methods is the holy grail of pharmaceutical research. Here we describe an AI strategy, retro drug design (RDD), to generate novel small-molecule drugs from scratch that meet predefined requirements, including but not limited to biological activity against a drug target and optimal ranges of physicochemical and ADMET properties. Traditional predictive models were first trained on experimental data for the target properties, using an atom typing-based molecular descriptor system, ATP. A Monte Carlo sampling algorithm was then used to find solutions in the ATP space defined by the target properties, and the Seq2Seq deep learning model was employed to decode molecular structures from the solutions. To test the feasibility of the algorithm, we challenged RDD to generate novel drugs that can activate the μ opioid receptor (MOR) and penetrate the blood-brain barrier (BBB). Starting from vectors of random numbers, RDD generated 180,000 chemical structures, of which 78% were chemically valid. About 42,000 (31%) of the valid structures fell into the property space defined by MOR activity and BBB permeability. Of the 42,000 structures, only 267 were commercially available, indicating a high degree of novelty among the AI-generated compounds. We purchased and assayed 96 compounds, of which 25 were found to be MOR agonists. These compounds also have excellent BBB scores. The results presented in this paper illustrate that RDD has the potential to revolutionize the current drug discovery process and create novel structures with multiple desired properties, including biological functions and ADMET properties. The availability of an AI-enabled fast track in drug discovery is essential for coping with emerging public health threats such as the COVID-19 pandemic.
