Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning

Jianing Lu; Song Xia; Jieyu Lu; Yingkai Zhang

doi:10.1021/acs.jcim.1c00007

Diversity oriented Deep Reinforcement Learning for targeted molecule generation

Journal of Cheminformatics; doi:10.1186/s13321-021-00498-z; 2021; Vol 13 (1)

Author(s): Tiago Pereira; Maryam Abbasi; Bernardete Ribeiro; Joel P. Arrais

Keyword(s): Neural Networks; Deep Learning; Reinforcement Learning; Deep Neural Networks; Chemical Space; Biological Properties; Training Process; Training Strategy; Inhibitory Power; Exploratory Strategy

In this work, we explore the potential of deep learning to streamline the identification of new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules using SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. The Generator is then optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process, which seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model, which remains fixed, and a copy of it that is updated during training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules using the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized partition coefficient and also high inhibitory power against the adenosine A2A and κ-opioid receptors. The results reveal that the model can effectively steer the newly generated molecules in the desired direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
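The two-Generator sampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only states that the evolution of the reward determines how often each Generator picks the next token, so the update rule in `update_p_fixed`, its `step`, `lo`, and `hi` parameters, and the toy generator callables are all assumptions made for illustration.

```python
import random


def update_p_fixed(p_fixed, prev_reward, curr_reward, step=0.05, lo=0.05, hi=0.5):
    """Hypothetical rule: adjust how often the fixed Generator is used.

    If the reward improved, rely more on the updated Generator (exploit);
    otherwise rely more on the fixed Generator (explore).
    """
    if curr_reward > prev_reward:
        p_fixed -= step
    else:
        p_fixed += step
    # Keep the probability inside the assumed bounds [lo, hi].
    return min(hi, max(lo, p_fixed))


def sample_tokens(generators, p_fixed, length, rng):
    """Build a token sequence, choosing a Generator independently per token.

    `generators` maps "fixed" and "updated" to callables that take the
    tokens sampled so far plus a random generator and return the next token.
    """
    tokens = []
    for _ in range(length):
        name = "fixed" if rng.random() < p_fixed else "updated"
        tokens.append(generators[name](tokens, rng))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for the two neural Generators (assumption).
    toy = {
        "fixed": lambda prefix, r: r.choice(["C", "N", "O"]),
        "updated": lambda prefix, r: "C",
    }
    p = 0.2
    p = update_p_fixed(p, prev_reward=0.4, curr_reward=0.3)  # reward dropped
    print(p, "".join(sample_tokens(toy, p, 10, rng)))
```

Bounding the probability keeps both Generators in play throughout training, which is one simple way to balance exploration against exploitation.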


DEELIG: A Deep Learning Approach to Predict Protein-Ligand Binding Affinity

Bioinformatics and Biology Insights; doi:10.1177/11779322211030364; 2021; Vol 15; pp. 117793222110303

Author(s): Asad Ahmed; Bhavika Mam; Ramanathan Sowdhamini

Keyword(s): Deep Learning; Ligand Binding; Binding Affinity; Chemical Space; Biological Significance; Protein Crystal; Ligand Docking; Complex Data; Binding Prediction; Data Set

Protein-ligand binding prediction has extensive biological significance. Binding affinity helps in understanding the degree of protein-ligand interactions and is a useful measure in drug design. Protein-ligand docking using virtual screening and molecular dynamics simulations are required to predict the binding affinity of a ligand to its cognate receptor. Performing such analyses to cover the entire chemical space of small molecules requires intense computational power. Recent developments in deep learning have enabled us to make sense of massive, complex data sets, where the strength of the approach is the ability of the model to “learn” intrinsic patterns in the data. Here, we have used convolutional neural networks to find spatial relationships among data and predict the binding affinity of proteins in whole superfamilies toward a diverse set of ligands, without the need for a docked pose or complex as user input. The models were trained and validated using a stringent methodology for feature extraction. Our model performs better than some widely used existing methods and is suitable for predictions on high-resolution protein crystal structures (≤2.5 Å) and nonpeptide ligands as individual inputs. Our approach to network construction and training on a protein-ligand data set prepared in-house has yielded significant insights. We have also tested DEELIG on a few COVID-19 main protease-inhibitor complexes relevant to the current public health scenario. DEELIG-based predictions can be incorporated into existing databases, including RCSB PDB, PDBMoad, and PDBbind, to fill in missing binding affinity data for protein-ligand complexes.


Beyond Generative Models: Superfast Traversal, Optimization, Novelty, Exploration and Discovery (STONED) Algorithm for Molecules using SELFIES

doi:10.26434/chemrxiv.13383266.v2; 2021

Author(s): AkshatKumar Nigam; Robert Pollice; Mario Krenn; Gabriel dos Passos Gomes; Alan Aspuru-Guzik

Keyword(s): Deep Learning; Virtual Screening; Chemical Space; Generative Models; Inverse Design; Learning Models; Structure Modification; Design Models; Comparable Performance; And Training

Inverse design allows the design of molecules with desirable properties using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. We achieve comparable performance on typical benchmarks without any training. We demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. We anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wide adoption.
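The string-modification idea can be sketched in a few lines of Python. This is an illustrative toy, not the STONED implementation: the token alphabet below is a small hand-picked subset, the choice of replace, insert, and delete operations is an assumption, and the code only manipulates bracketed tokens as text. Converting the result back to a molecule would require a SELFIES decoder, which is not used here.

```python
import random
import re

# Illustrative subset of SELFIES-style tokens (assumption, not a full alphabet).
ALPHABET = ["[C]", "[N]", "[O]", "[F]", "[=C]", "[=O]", "[Branch1]", "[Ring1]"]


def split_tokens(selfies_string):
    """Split a bracketed string such as '[C][C][O]' into its tokens."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def mutate(selfies_string, rng):
    """Apply one random token-level edit: replace, insert, or delete."""
    tokens = split_tokens(selfies_string)
    op = rng.choice(["replace", "insert", "delete"])
    if not tokens:
        op = "insert"  # nothing to replace or delete
    elif op == "delete" and len(tokens) == 1:
        op = "replace"  # avoid producing an empty string
    if op == "insert":
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    elif op == "delete":
        del tokens[rng.randrange(len(tokens))]
    else:
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    return "".join(tokens)


if __name__ == "__main__":
    rng = random.Random(42)
    seed = "[C][C][O]"
    # Generate a small neighbourhood of the seed string without any training.
    for _ in range(5):
        print(mutate(seed, rng))
```

Because every edit is a cheap string operation, a neighbourhood of candidate strings can be produced without any model training, which is the property the abstract emphasizes.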


DrugEx v3: Scaffold-Constrained Drug Design with Graph Transformer-based Reinforcement Learning

doi:10.26434/chemrxiv-2021-px6kz; 2021

Author(s): Xuhan Liu; Kai Ye; Herman W. T. van Vlijmen; Adriaan P. IJzerman; Gerard J. P. van Westen

Keyword(s): Deep Learning; Reinforcement Learning; Drug Design; De Novo; Chemical Space; Molecular Structures; Rational Drug Design; Graph Representation; General Applicability; De Novo Drug Design

Because the drug-like chemical space available to search for feasible molecules is so large, rational drug design often starts from specific scaffolds to which side chains or substituents are added or modified. With the rapid growth of deep learning applications in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, like other known methods, the previous version is trained under fixed objectives and does not allow users to input any prior information (e.g., a desired scaffold). To improve its general applicability, we updated DrugEx to design drug molecules based on scaffolds that consist of multiple fragments provided by users. In this work, the Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder that receives scaffolds as input and a decoder that generates molecules as output. To deal with the graph representation of molecules, we proposed a novel positional encoding for each atom and bond, based on an adjacency matrix, to extend the architecture of the Transformer. Each molecule was generated by growing and connecting procedures for the fragments in the given scaffold, unified into one model. Moreover, we trained this generator under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the proposed method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results demonstrated the effectiveness of our method: 100% of the generated molecules are valid, and most of them had a high predicted affinity value towards A2AAR with the given scaffolds.


Sanitize It Yourself: Web-based molecular sanitization for machine-generated chemical structures

doi:10.33774/chemrxiv-2021-9nhm1-v2; 2021

Author(s): Naruki Yoshikawa; Kentaro Rikimaru; Kazuki Yamamoto

Keyword(s): Deep Learning; Web Service; Chemical Space; Computer Aided Drug Design; Web Based; Chemical Structures; Structural Formulas; Computer Aided; Novel Structures; The Web

Many computer-aided drug design (CADD) methods using deep learning have recently been proposed to efficiently explore the chemical space for novel scaffolds. However, there is a tradeoff between the ease of generating novel structures and the chemical feasibility of the resulting structural formulas. To overcome the limitations of computational filtering, we have implemented web-based software in which users can share and evaluate computer-generated compounds. The web service is available at https://sanitizer.chemical.space/.


A De Novo Molecular Generation Method Using Latent Vector Based Generative Adversarial Network

doi:10.26434/chemrxiv.8299544.v1; 2019; Cited By ~ 1

Author(s): Oleksii Prykhodko; Simon Viet Johansson; Panagiotis-Christos Kotsias; Esben Jannik Bjerrum; Ola Engkvist; ...

Keyword(s): Deep Learning; De Novo; Chemical Space; Learning Method; Training Set; Generative Adversarial Network; Structure Generation; Adversarial Network; Molecule Design; Novel Structures

Recently, deep learning methods have been used for generating novel structures. In the current study, we propose a new deep learning method, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecule design. We applied the method for structure generation in two scenarios: one to generate random drug-like compounds and the other to generate target-biased compounds. Our results show that the method works well in both cases: compounds sampled from the trained model largely occupy the same chemical space as the training set, and a substantial fraction of the generated compounds are novel. The distribution of drug-likeness scores for compounds sampled from LatentGAN is also similar to that of the training set.


A De Novo Molecular Generation Method Using Latent Vector Based Generative Adversarial Network

doi:10.26434/chemrxiv.8299544.v3; 2019; Cited By ~ 1

Author(s): Oleksii Prykhodko; Simon Viet Johansson; Panagiotis-Christos Kotsias; Josep Arús-Pous; Esben Jannik Bjerrum; ...

Keyword(s): Neural Network; Deep Learning; De Novo; Molecular Design; Chemical Space; Training Set; Generative Adversarial Network; Adversarial Network; De Novo Molecular Design; Novel Structures

Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases: sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.


Transformer Neural Network-Based Molecular Optimization Using General Transformations

doi:10.26434/chemrxiv-2021-z8rk6; 2021

Author(s): Jiazhen He; Eva Nittinger; Christian Tyrchan; Werngard Czechtizky; Atanas Patronov; ...

Keyword(s): Neural Network; Deep Learning; Drug Discovery; Chemical Space; Fundamental Problem; Simultaneous Optimization; Drug Profile; Tanimoto Similarity; Matched Molecular Pairs; Different Types

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the use of MMPs is a typical and widely used strategy among medicinal chemists, it offers limited capability for exploring the space of solutions. There are more options for modifying a starting molecule to achieve desirable properties; for example, one can modify the molecule at several places simultaneously, including changing the scaffold. This study trains the same Transformer architecture on different datasets. Each dataset consists of a set of molecular pairs that reflect a different type of transformation. Beyond MMP transformations, datasets reflecting general transformations are constructed from ChEMBL using two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant). We investigate how the model behavior can be altered by tailoring the dataset while keeping the same model architecture. Our results show that models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training. These models could complement each other and give chemists different options for improving a starting molecule.
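As a minimal illustration of the Tanimoto-based pairing mentioned above, the following Python sketch computes the Tanimoto coefficient |A ∩ B| / |A ∪ B| on fingerprints represented as sets of "on" bit indices and keeps the pairs above a threshold. The set representation, the molecule names, the bit values, and the 0.5 threshold are assumptions chosen for illustration; the abstract does not state which fingerprints or cutoff were used.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def similar_pairs(fingerprints, threshold=0.5):
    """Return name pairs whose Tanimoto similarity meets the threshold.

    `fingerprints` maps a molecule name to its set of on-bits.
    """
    pairs = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        if tanimoto(fp_a, fp_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
    fps = {"mol1": {1, 2, 3}, "mol2": {2, 3, 4}, "mol3": {7, 8}}
    print(tanimoto(fps["mol1"], fps["mol2"]))
    print(similar_pairs(fps))
```

A similarity cutoff of this kind admits pairs that differ at several positions at once, which is why the abstract describes this approach as allowing multiple modifications.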


Transformer Neural Network-Based Molecular Optimization Using General Transformations

doi:10.21203/rs.3.rs-1097104/v1; 2021

Author(s): Jiazhen He; Eva Nittinger; Christian Tyrchan; Werngard Czechtizky; Atanas Patronov; ...

Keyword(s): Neural Network; Deep Learning; Drug Discovery; Chemical Space; Fundamental Problem; Simultaneous Optimization; Drug Profile; Tanimoto Similarity; Matched Molecular Pairs; Different Types

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the use of MMPs is a typical and widely used strategy among medicinal chemists, it offers limited capability for exploring the space of solutions. There are more options for modifying a starting molecule to achieve desirable properties; for example, one can modify the molecule at several places simultaneously, including changing the scaffold. This study trains the same Transformer architecture on different datasets. Each dataset consists of a set of molecular pairs that reflect a different type of transformation. Beyond MMP transformations, datasets reflecting general transformations are constructed from ChEMBL using two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant). We investigate how the model behavior can be altered by tailoring the dataset while keeping the same model architecture. Our results show that models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training. These models could complement each other and give chemists different options for improving a starting molecule.


A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery

Journal of Chemical Information and Modeling; doi:10.1021/acs.jcim.0c00321; 2020; Vol 60 (10); pp. 4487-4496

Author(s): Paul Maragakis; Hunter Nisonoff; Brian Cole; David E. Shaw

Keyword(s): Deep Learning; Drug Discovery; Chemical Space
