Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex

Pascal Friederich; Gabriel dos Passos Gomes; Riccardo De Bin; Alán Aspuru-Guzik; David Balcells

doi:10.1039/d0sc00445f

Machine Learning Reactivity in the Chemical Space Surrounding Vaska's Complex

10.26434/chemrxiv.10347566.v1 ◽

2019 ◽

Author(s):

Pascal Friederich ◽

Gabriel dos Passos Gomes ◽

Riccardo De Bin ◽

Alan Aspuru-Guzik ◽

David Balcells

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Chemical Space ◽

Bayesian Optimization ◽

Gradient Boosting ◽

Learning Models ◽

Hydrogen Activation ◽

Activation Barriers ◽

Machine Learning Models ◽

Vaska's Complex

Machine learning models, including neural networks, Bayesian optimization, gradient boosting and Gaussian processes, were trained with DFT data for the accurate, affordable and explainable prediction of hydrogen activation barriers in the chemical space surrounding Vaska's complex.

Towards a Design of Active Oxygen Evolution Catalysts: Insights from Automated Density Functional Theory Calculations and Machine Learning

10.26434/chemrxiv.7926869 ◽

2019 ◽

Author(s):

Seoin Back ◽

Kevin Tran ◽

Zachary Ulissi

Keyword(s):

Machine Learning ◽

Oxygen Evolution ◽

Active Sites ◽

Density Functional ◽

Chemical Space ◽

Density Functional Theory Calculations ◽

Catalyst Design ◽

Transition Metal Catalysts ◽

Oxide Materials ◽

Design Strategies

Developing active and stable oxygen evolution catalysts is key to enabling various future energy technologies, and the state-of-the-art catalysts are Ir-containing oxide materials. Understanding oxygen chemistry on oxide materials is significantly more complicated than studying transition metal catalysts for two reasons: the most stable surface coverage under reaction conditions is extremely important but difficult to understand without many detailed calculations, and there are many possible active sites and configurations on O* or OH* covered surfaces. We have developed an automated and high-throughput approach to solve this problem and predict OER overpotentials for arbitrary oxide surfaces. We demonstrate this for a number of previously unstudied IrO2 and IrO3 polymorphs and their facets. We discovered that low-index surfaces of IrO2 other than rutile (110) are more active than the most stable rutile (110), and we identified promising active sites of IrO2 and IrO3 that outperform rutile (110) by 0.2 V in theoretical overpotential. Based on findings from DFT calculations, we provide catalyst design strategies to improve the catalytic activity of Ir-based catalysts and demonstrate a machine learning model capable of predicting surface coverages and site activity. This work highlights the importance of investigating unexplored chemical space to design promising catalysts.

Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review

Mini-Reviews in Medicinal Chemistry ◽

10.2174/1389557520666200429102334 ◽

2020 ◽

Vol 20 (14) ◽

pp. 1375-1388 ◽

Cited By ~ 2

Author(s):

Patnala Ganga Raju Achary

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Virtual Screening ◽

Model Building ◽

Chemical Space ◽

Qsar Model ◽

Quantitative Structure ◽

Efficient Manner ◽

Qsar Analysis ◽

Structure Activity

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in the Chemical Abstracts Service. According to a recent study, there are at present around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry’. To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge that must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) collection of chemo-genomics data from a database or the literature, (ii) calculation of the right descriptors from the molecular representation, (iii) establishment of a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights the web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
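The four QSAR model-building steps listed in this abstract can be sketched in a few lines of plain Python. This is a toy illustration only, not the review's method: the molecule strings, the character-count descriptors, the function names (`descriptors`, `fit_qsar`, `predict`) and the activity values are all assumptions made up for the example, and the activities are generated from a known rule rather than measured. Real workflows would compute descriptors with a cheminformatics toolkit and use curated bioactivity data.

```python
from statistics import mean

# (i) Data collection: a toy, made-up set of SMILES-like strings.
# The activities below are SYNTHETIC, generated from a known rule so the
# fit can be checked; they are not experimental measurements.
TRAIN_MOLECULES = ["C", "CC", "CCO", "CCCC", "CCCCCO"]


def synthetic_activity(smiles: str) -> float:
    """Made-up ground truth used only to generate illustrative labels."""
    return 0.5 * smiles.count("C") + 1.0


def descriptors(smiles: str) -> dict[str, float]:
    """(ii) Hypothetical descriptors: simple character counts."""
    return {
        "n_carbon": float(smiles.count("C")),
        "n_oxygen": float(smiles.count("O")),
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def fit_qsar(molecules, activities):
    """(iii) Pick the best-correlated descriptor and fit y = a*x + b by least squares."""
    table = [descriptors(m) for m in molecules]
    best = max(
        table[0].keys(),
        key=lambda name: abs(pearson([row[name] for row in table], activities)),
    )
    xs = [row[best] for row in table]
    mx, my = mean(xs), mean(activities)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, activities)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return best, slope, my - slope * mx


def predict(model, smiles):
    """(iv) Apply the fitted model to a new molecule."""
    name, slope, intercept = model
    return slope * descriptors(smiles)[name] + intercept


activities = [synthetic_activity(m) for m in TRAIN_MOLECULES]
model = fit_qsar(TRAIN_MOLECULES, activities)
print(model[0], round(predict(model, "CCC"), 3))
```

Because the labels come from a rule that depends only on the carbon count, the sketch selects `n_carbon` and recovers that rule; with real data, descriptor selection and validation on held-out molecules would matter far more than the fitting step itself.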

Metabolomics-Guided Elucidation of Plant Abiotic Stress Responses in the 4IR Era: An Overview

Metabolites ◽

10.3390/metabo11070445 ◽

2021 ◽

Vol 11 (7) ◽

pp. 445

Author(s):

Morena M. Tinte ◽

Kekeletso H. Chele ◽

Justin J. J. van der Hooft ◽

Fidele Tugizimana

Keyword(s):

Machine Learning ◽

Abiotic Stress ◽

Stress Responses ◽

Abiotic Stresses ◽

Industrial Revolution ◽

Chemical Space ◽

Big Data Analytics ◽

Plant Responses ◽

Next Generation ◽

Computational Tools

Plants are constantly challenged by changing environmental conditions that include abiotic stresses. These stresses limit plant development and productivity and subsequently threaten our food security, especially when considering the pressure of the increasing global population. Thus, there is an urgent need for the next generation of crops with high productivity and resilience to climate change. The dawn of a new era characterized by the emergence of fourth industrial revolution (4IR) technologies has redefined the ideological boundaries of research and applications in plant sciences. Recent technological advances and machine learning (ML)-based computational tools and omics data analysis approaches are allowing scientists to derive comprehensive metabolic descriptions and models for the target plant species under specific conditions. Such accurate metabolic descriptions are essential for devising a roadmap for the next generation of crops that are resilient to environmental deterioration. By synthesizing the recent literature and collating data on metabolomics studies on plant responses to abiotic stresses, in the context of the 4IR era, we point out the opportunities and challenges offered by omics science, analytical intelligence, computational tools and big data analytics. Specifically, we highlight technological advancements in (plant) metabolomics workflows and the use of machine learning and computational tools to decipher the dynamics in the chemical space that define plant responses to abiotic stress conditions.

Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification

Journal of Cheminformatics ◽

10.1186/s13321-021-00500-8 ◽

2021 ◽

Vol 13 (1) ◽

Cited By ~ 1

Author(s):

Janna Hastings ◽

Martin Glauer ◽

Adel Memariani ◽

Fabian Neuhaus ◽

Till Mossakowski

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Short Term Memory ◽

Chemical Space ◽

Chemical Data ◽

Learning Approaches ◽

Class Prediction ◽

Chemical Structures ◽

Chemical Ontology ◽

Chemical Ontologies

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
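To make the "character-based encoding from chemical line notation" mentioned in this abstract concrete, here is a minimal sketch of turning SMILES strings into fixed-length integer sequences of the kind a sequence model could consume. It is not the authors' pipeline: the function names (`build_vocab`, `encode`), the padding index `PAD = 0`, and the one-character-per-token choice are assumptions for illustration. In particular, practical tokenizers usually treat multi-character atoms such as `Cl` or `Br` as single tokens, which this sketch does not.

```python
PAD = 0  # assumed padding index; real character ids start at 1


def build_vocab(smiles_list):
    """Map every distinct character in the corpus to an integer id (1-based)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Encode one string as ids, truncated to max_len and right-padded with PAD.

    Raises KeyError for characters that were not seen when building the vocab.
    """
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Illustrative corpus: ethanol, benzene, acetic acid.
corpus = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(corpus)
encoded = [encode(s, vocab, max_len=8) for s in corpus]

for smiles, ids in zip(corpus, encoded):
    print(smiles, ids)
```

A fingerprint-based encoding, the other option named in the abstract, would instead produce a fixed-length bit vector per molecule and needs a cheminformatics toolkit, so it is not shown here.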

Prioritizing Small Molecule as Candidates for Drug Repositioning using Machine Learning

10.1101/331975 ◽

2018 ◽

Author(s):

Khader Shameer ◽

Kipp W. Johnson ◽

Benjamin S. Glicksberg ◽

Rachel Hodos ◽

Ben Readhead ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Small Molecule ◽

Chemical Space ◽

Drug Repositioning ◽

Chemical Properties ◽

Support Vector ◽

Feature Engineering ◽

Connectivity Map ◽

Molecular Features

Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that is continuing to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec was trained through a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines and an ensemble model combining the three algorithms). Models were created using various combinations of datasets: a Connectivity Map based model, a DrugBank approved-compounds model, and a model based on the full set of DrugBank compounds. Of these, the Random Forest trained on Connectivity Map based data performed best (AUC=0.674). Briefly, our study represents a novel approach to evaluating a small molecule for drug repositioning opportunity and may further improve discovery of pleiotropic drugs, or those that treat multiple indications.
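The AUC figure quoted in this abstract is the area under the ROC curve, which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below computes it directly from that definition; the function name `roc_auc` and the four example labels and scores are made up for illustration and have nothing to do with the study's data.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise definition.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy, made-up example: two negatives and two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of examples; for large datasets a rank-based implementation from an established library is the usual choice.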

Controlled exploration of chemical space by machine learning of coarse-grained representations

Physical Review E ◽

10.1103/physreve.100.033302 ◽

2019 ◽

Vol 100 (3) ◽

Cited By ~ 5

Author(s):

Christian Hoffmann ◽

Roberto Menichetti ◽

Kiran H. Kanekal ◽

Tristan Bereau

Keyword(s):

Machine Learning ◽

Chemical Space ◽

Coarse Grained

A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space

Chemical Science ◽

10.1039/c8sc05372c ◽

2019 ◽

Vol 10 (12) ◽

pp. 3567-3572 ◽

Cited By ~ 24

Author(s):

Jan H. Jensen

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Monte Carlo ◽

Chemical Space ◽

Generative Model ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Synthetic Accessibility ◽

Better Than

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimization of log P values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property.

Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure–Property Relationships

The Journal of Physical Chemistry A ◽

10.1021/acs.jpca.7b08750 ◽

2017 ◽

Vol 121 (46) ◽

pp. 8939-8954 ◽

Cited By ~ 61

Author(s):

Jon Paul Janet ◽

Heather J. Kulik

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Transition Metal ◽

Chemical Space ◽

Structure Property ◽

Structure Property Relationships ◽

Selection For
