Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations

2019 ◽  
Vol 10 (6) ◽  
pp. 1692-1701 ◽  
Author(s):  
Robin Winter ◽  
Floriane Montanari ◽  
Frank Noé ◽  
Djork-Arné Clevert

Translation between semantically equivalent but syntactically different line notations of molecular structures compresses meaningful information into a continuous molecular descriptor.

2019 ◽  
Vol 19 (11) ◽  
pp. 944-956 ◽  
Author(s):  
Oscar Martínez-Santiago ◽  
Yovani Marrero-Ponce ◽  
Ricardo Vivas-Reyes ◽  
Mauricio E.O. Ugarriza ◽  
Elízabeth Hurtado-Rodríguez ◽  
...  

Background: Recently, some authors have defined new molecular descriptors (MDs) based on the Graph Discrete Derivative, known as Graph Derivative Indices (GDIs). This approach, which applies discrete derivatives to various elements of a graph, takes the formation of subgraphs as its starting point. These definitions were previously extended to the chemical context (N-tuples), interpreted in structural and physicochemical terms, and applied to the description of several endpoints, with good results. Objective: A generalization of GDIs using the definitions of Higher Order and Mixed Derivatives for molecular graphs is proposed, extending the previous works and allowing the generation of a new family of MDs. Methods: An extension of the previously defined GDIs is presented, for which the concepts of Higher Order Derivatives and Mixed Derivatives are introduced. These novel approaches to obtaining MDs, based on discrete derivatives (finite differences) of molecular graphs, use the elements of hypermatrices built from 12 different ways (12 events) of fragmenting the molecular structures. Applying the higher-order and mixed GDIs to any molecular structure yields Local Vertex Invariants (LOVIs) for atom pairs, for pairs of atom pairs, and so on. All new families of GDIs are implemented in software named DIVATI (acronym for Discrete DeriVAtive Type Indices), a module of the KeysFinder Framework in the TOMOCOMD-CARDD system. Results: QSAR modeling of the biological activity (Log 1/K) of 31 steroids reveals that the GDIs obtained using the higher-order and mixed GDI approaches yield slightly higher performance than previously reported approaches based on the duplex, triplex and quadruplex matrices. In fact, the statistical parameters for models obtained with the higher-order and mixed GDI method are superior to those reported in the literature for other 0-3D QSAR methods. Conclusion: The higher-order and mixed GDIs appear to be a promising tool for QSAR/QSPR, similarity/dissimilarity analysis and virtual screening studies.


2021 ◽  
Author(s):  
Adarsh Kalikadien ◽  
Evgeny A. Pidko ◽  
Vivek Sinha

Local chemical space exploration of an experimentally synthesized material can be done by making slight structural variations of the synthesized material. Generating many molecular structures of reasonable quality that resemble an existing, purposeful (chemical) material is needed for high-throughput screening in material design. Large databases of the geometry and chemical properties of transition metal complexes are not readily available, although these complexes are widely used in homogeneous catalysis. A Python-based workflow, ChemSpaX, aimed at automating local chemical space exploration for any type of molecule, is introduced, and its overall computational workflow is explained in detail. ChemSpaX uses 3D information to place functional groups on an input structure. For example, the input structure can be a catalyst for which one wants to use high-throughput screening to investigate whether the catalytic activity can be improved. The newly placed substituents are optimized using a computationally cheap force-field optimization method. After placement of new substituents, higher-level optimizations using xTB or DFT instead of force-field optimization are also possible in the current workflow. Representative applications show that the structures generated by ChemSpaX are of reasonable quality for use in high-throughput screening. These applications investigate various adducts on functionalized Mn-based pincer complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes, and functionalization of a bipyridyl-functionalized cobalt porphyrin trapped in an M2L4-type cage complex. Descriptors such as the Gibbs free energy of reaction and the HOMO-LUMO gap, which can be used in data-driven design and discovery of catalysts, were selected and studied in more detail for the selected use cases. The relatively fast GFN2-xTB method was used to calculate these descriptors, and the results were compared against DFT-calculated descriptors. ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards data-driven material discovery.


Processes ◽  
2019 ◽  
Vol 7 (7) ◽  
pp. 433 ◽  
Author(s):  
Jialin Zheng ◽  
Zahid Iqbal ◽  
Asfand Fahad ◽  
Asim Zafar ◽  
Adnan Aslam ◽  
...  

Topological indices have been computed for various molecular structures over many years. These are numerical invariants associated with molecular structures and are helpful in characterizing many of their properties. Among these molecular descriptors, the eccentric connectivity index plays a prominent role because of its ability to estimate pharmaceutical properties. In this article, the eccentric connectivity, total eccentricity connectivity, augmented eccentric connectivity, first Zagreb eccentricity, modified eccentric connectivity, second Zagreb eccentricity, and edge version of the eccentric connectivity indices are computed for the molecular graph of a PolyEThyleneAmidoAmine (PETAA) dendrimer. Moreover, explicit representations of the polynomials associated with some of these indices are also computed.
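
As an illustration of how such eccentricity-based descriptors can be evaluated, the following minimal sketch computes two of them for a small graph. It assumes the commonly used definitions, which the abstract does not spell out: the eccentric connectivity index as the sum over vertices of degree times eccentricity, and the total eccentricity as the sum of eccentricities. The function names and the toy graphs are hypothetical and are not taken from the article; the PETAA dendrimer itself is not modeled here.

```python
from collections import deque


def eccentricities(adj):
    """Return {vertex: eccentricity} for a connected, unweighted graph.

    `adj` maps each vertex to a list of its neighbours. The eccentricity
    of a vertex is its largest shortest-path distance to any other vertex,
    found here by a breadth-first search from every vertex.
    """
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        ecc[source] = max(dist.values())
    return ecc


def eccentric_connectivity_index(adj):
    """Sum of degree(v) * eccentricity(v) over all vertices (assumed definition)."""
    ecc = eccentricities(adj)
    return sum(len(adj[v]) * ecc[v] for v in adj)


def total_eccentricity(adj):
    """Sum of eccentricity(v) over all vertices (assumed definition)."""
    return sum(eccentricities(adj).values())


# Hypothetical toy graphs: a three-vertex path and a four-vertex cycle.
path3 = {0: [1], 1: [0, 2], 2: [1]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

if __name__ == "__main__":
    print(eccentric_connectivity_index(path3), total_eccentricity(path3))
    print(eccentric_connectivity_index(cycle4), total_eccentricity(cycle4))
```

For a hydrogen-suppressed molecular graph, the adjacency dictionary would list heavy atoms as vertices and bonds as edges; the all-pairs breadth-first search is adequate for small graphs but scales quadratically with the number of atoms.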


2021 ◽  
Author(s):  
Hongyu Chen ◽  
Shigeru Yamaguchi ◽  
Yuya Morita ◽  
Hiroyasu Nakao ◽  
Xiangning Zhai ◽  
...  

Asymmetric catalysis enabling divergent control of multiple stereocenters remains challenging in synthetic organic chemistry. While machine learning-based optimization of molecular catalysis is an emerging approach, data-driven catalyst design to achieve stereodivergent asymmetric synthesis producing multiple reaction outcomes, such as constitutional selectivity, diastereoselectivity, and enantioselectivity, is unprecedented. Here, we report the straightforward identification of asymmetric two-component iridium/boron hybrid catalyst systems for α-C-allylation of carboxylic acids. Structural optimization of the chiral ligands for iridium catalysts was driven by molecular field-based regression analysis with a dataset containing 32 molecular structures in total. The catalyst systems enabled selective access to all the possible isomers of chiral carboxylic acids bearing contiguous stereocenters. This stereodivergent asymmetric catalysis is applicable to late-stage structural modifications of drugs and their derivatives.


2021 ◽  
Vol 22 (15) ◽  
pp. 8073
Author(s):  
Keerthana Jaganathan ◽  
Hilal Tayara ◽  
Kil To Chong

Drug-induced liver toxicity is a significant safety challenge for patients' health and for the pharmaceutical industry. It causes the termination of drug candidates in clinical trials and the withdrawal of approved drugs from the market. Thus, it is essential to identify hepatotoxic compounds in the initial stages of the drug development process. The purpose of this study is to construct quantitative structure-activity relationship models using machine learning algorithms and systematic feature selection methods for molecular descriptor sets. The models were built from a large and diverse set of 1253 drug compounds and were validated internally with 10-fold cross-validation. We applied a variety of feature selection techniques to extract the optimal subset of descriptors as modeling features to improve prediction performance. Experimental results suggested that the support vector machine-based classifier achieved better classification accuracy with a reduced set of molecular descriptors. The final optimal model provides an accuracy of 0.811, a sensitivity of 0.840, a specificity of 0.783 and a Matthews correlation coefficient of 0.623 on an internal validation set. Furthermore, this model outperformed prior studies when evaluated on both the internal and external test sets. The use of distinct optimal molecular descriptors as modeling features produces an in silico model with superior performance.
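
The four figures of merit quoted above all derive from a binary confusion matrix. The sketch below shows how they are typically computed, using the standard textbook formulas; the function name and the example counts are hypothetical and do not reproduce the study's data or its reported values.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts.

    tp / fn: toxic compounds predicted toxic / non-toxic
    tn / fp: non-toxic compounds predicted non-toxic / toxic
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the Matthews correlation coefficient is set to 0
    # when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=40, fn=10, tn=30, fp=20).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity when the two classes are not equally represented.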


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Jianxin Wei ◽  
Muhammad Imran ◽  
Muhamamd Azhar Iqbal ◽  
Muhammad Asad Zaighum

Various methods are available for searching large chemical databases and predicting the physicochemical properties of molecular structures. Using molecular descriptors for this purpose is the simplest of these methods. The Zagreb indices are amongst the oldest molecular descriptors, and their properties have been extensively studied and applied in QSAR/QSPR studies. The Zagreb coindices were recently introduced, attracting the attention of researchers in mathematical chemistry. In this paper, we study Zagreb indices and several other Zagreb-type indices, including the general Randić index, sum-connectivity index, F-index, and Zagreb coindices, of the R-vertex and edge join of two arbitrary graphs.
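
To make the descriptors named above concrete, here is a minimal sketch that evaluates the first and second Zagreb indices and the first Zagreb coindex for a small graph. It assumes the usual definitions (sum of squared degrees; sum over edges of the degree product; sum over non-adjacent vertex pairs of the degree sum), which the abstract does not restate. The function names and the example graph are hypothetical, and the R-vertex and edge join constructions studied in the paper are not implemented here.

```python
from itertools import combinations


def degrees(n, edges):
    """Return the degree of each vertex 0..n-1 for a simple undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def first_zagreb(n, edges):
    """M1: sum of squared vertex degrees (assumed definition)."""
    return sum(d * d for d in degrees(n, edges))


def second_zagreb(n, edges):
    """M2: sum over edges uv of deg(u) * deg(v) (assumed definition)."""
    deg = degrees(n, edges)
    return sum(deg[u] * deg[v] for u, v in edges)


def first_zagreb_coindex(n, edges):
    """Sum over non-adjacent pairs {u, v} of deg(u) + deg(v) (assumed definition)."""
    deg = degrees(n, edges)
    edge_set = {frozenset(e) for e in edges}
    return sum(
        deg[u] + deg[v]
        for u, v in combinations(range(n), 2)
        if frozenset((u, v)) not in edge_set
    )


# Hypothetical example: the path on four vertices 0-1-2-3.
path4_edges = [(0, 1), (1, 2), (2, 3)]

if __name__ == "__main__":
    print(first_zagreb(4, path4_edges))
    print(second_zagreb(4, path4_edges))
    print(first_zagreb_coindex(4, path4_edges))
```

A useful sanity check under these definitions is that, for a simple graph with n vertices and m edges, the first Zagreb index and the first Zagreb coindex add up to 2m(n - 1), because each vertex degree is counted once for every other vertex.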


2021 ◽  
Author(s):  
Yuhong Wang ◽  
Sam Michael ◽  
Ruili Huang ◽  
Jinghua Zhao ◽  
Katlin Recabo ◽  
...  

Generating drug molecules with desired properties by computational methods is the holy grail of pharmaceutical research. Here we describe an AI strategy, retro drug design (RDD), to generate novel small-molecule drugs from scratch that meet predefined requirements, including but not limited to biological activity against a drug target and an optimal range of physicochemical and ADMET properties. Traditional predictive models were first trained on experimental data for the target properties, using an atom typing-based molecular descriptor system, ATP. A Monte Carlo sampling algorithm was then used to find solutions in the ATP space defined by the target properties, and the Seq2Seq deep learning model was employed to decode molecular structures from the solutions. To test the feasibility of the algorithm, we challenged RDD to generate novel drugs that can activate the μ opioid receptor (MOR) and penetrate the blood-brain barrier (BBB). Starting from vectors of random numbers, RDD generated 180,000 chemical structures, of which 78% were chemically valid. About 42,000 (31%) of the valid structures fell into the property space defined by MOR activity and BBB permeability. Of the 42,000 structures, only 267 were commercially available, indicating a high degree of novelty in the AI-generated compounds. We purchased and assayed 96 compounds, of which 25 were found to be MOR agonists. These compounds also have excellent BBB scores. The results presented in this paper illustrate that RDD has the potential to revolutionize the current drug discovery process and create novel structures with multiple desired properties, including biological functions and ADMET properties. The availability of an AI-enabled fast track in drug discovery is essential for coping with emergent public health threats such as the COVID-19 pandemic.

