Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

Biomolecules ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 1385
Author(s):  
Alice Capecchi ◽  
Jean-Louis Reymond

Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules of very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two groups of compounds are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
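The key property of a MinHashed fingerprint such as MAP4 is that the Jaccard similarity between two sets of molecular "shingles" (in MAP4, strings describing atom pairs and their environments) can be estimated from short MinHash signatures. The sketch below illustrates only that MinHash step. It uses hand-written placeholder shingle strings and a salted SHA-1 hash family as assumptions; it is not the MAP4 reference implementation, which derives its shingles from molecular structures with RDKit.

```python
import hashlib

def minhash_signature(shingles, num_perm=128):
    """For each of num_perm salted hash functions, keep the minimum hash over all shingles."""
    signature = []
    for seed in range(num_perm):
        min_val = min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles
        )
        signature.append(min_val)
    return signature

def estimated_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)

# Hypothetical atom-pair shingles (placeholders, not real MAP4 output)
mol_a = {"C|O|2", "C|N|3", "C|C|1", "O|N|4", "C|C|2"}
mol_b = {"C|O|2", "C|N|3", "C|C|1", "N|N|2"}

sig_a = minhash_signature(mol_a)
sig_b = minhash_signature(mol_b)
print(exact_jaccard(mol_a, mol_b))    # 0.5
print(estimated_jaccard(sig_a, sig_b))
```

The estimate converges on the exact value as num_perm grows. Because a signature has a fixed length regardless of how many shingles a molecule has, small molecules and large peptides can be compared on the same footing.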

2020 ◽  
Author(s):  
Alice Capecchi ◽  
Jean-Louis Reymond

Microbial natural products (NPs) are an important source of drugs. However, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules of very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin downloaded from https://www.npatlas.org/joomla/. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP) (http://tmap.gdb.tools). The resulting interactive map (https://tm.gdb.tools/map4/npatlas_map_tmap/) organizes molecules by physico-chemical properties and compound families such as peptides, glycosides, polyphenols or terpenoids. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two groups of compounds are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hossein Khabbaz ◽  
Mohammad Hossein Karimi-Jafari ◽  
Ali Akbar Saboury ◽  
Bagher BabaAli

Background: Antimicrobial peptides (AMPs) are promising tools in the fight against ever-growing antibiotic resistance. However, despite their many advantages, their toxicity to mammalian cells is a critical obstacle to clinical application and needs to be addressed. Results: In this study, a machine learning model was trained on an up-to-date dataset to predict the toxicity of AMPs. A comprehensive set of physico-chemical and linguistic features, capturing both local and global properties, underwent feature selection to identify the key properties behind AMP toxicity. After feature selection, the hybrid model showed the best performance, with a recall of 0.876 and an F1 score of 0.849. Conclusions: The resulting model can be useful for extracting AMPs with low toxicity from AMP libraries for clinical applications. In addition, the presence among the final selected features of several local properties, including the positions of strand-forming and hydrophobic residues, shows that these properties are critical determinants of peptide behavior and should be considered when developing models that predict peptide activity. The executable code is available at https://git.io/JRZaT.
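The reported recall and F1 score are linked through the definition of F1 as the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short sketch below inverts that relation to obtain the precision implied by the two reported numbers. It assumes that both metrics refer to the same test set and the same positive (toxic) class. The resulting precision is derived arithmetic, not a value reported by the authors.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)

recall = 0.876
f1 = 0.849
p = implied_precision(f1, recall)
print(round(p, 3))                    # 0.824
print(round(f1_score(p, recall), 3))  # 0.849
```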


2021 ◽  
Author(s):  
Jiawang Liu ◽  
Anan Liu ◽  
Youcai Hu

Cytochrome P450s, laccases, intermolecular [4 + 2] cyclases, and other enzymes are used to catalyze diverse dimerizations of mature natural products, thereby creating structural diversity and complexity in microorganisms.


2018 ◽  
Author(s):  
Khader Shameer ◽  
Kipp W. Johnson ◽  
Benjamin S. Glicksberg ◽  
Rachel Hodos ◽  
Ben Readhead ◽  
...  

Drug repositioning, i.e., identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties from those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is a chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec features were used to train a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines, and an ensemble model combining the three algorithms). Models were built using various combinations of datasets, yielding a Connectivity Map based model, a DrugBank approved compounds based model, and a DrugBank full compound set model. Of these, the Random Forest trained on Connectivity Map based data performed best (AUC = 0.674). In summary, our study represents a novel approach to evaluating small molecules for drug repositioning opportunities and may further improve the discovery of pleiotropic drugs, i.e., those that treat multiple indications.
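An AUC of 0.674 can be read through the rank (Mann-Whitney) interpretation of the ROC AUC. It is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 is chance level. The sketch below computes the AUC in that form. The scores and labels are hypothetical and are not taken from the ChemVec models.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores (label 1 = repositioned drug, 0 = not repositioned)
scores = [0.9, 0.8, 0.55, 0.4, 0.7, 0.3, 0.2, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in dataset size. It is kept here because it matches the definition directly, whereas library implementations usually sort the scores instead.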


Biomolecules ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 31 ◽  
Author(s):  
B. Pilón-Jiménez ◽  
Fernanda Saldívar-González ◽  
Bárbara Díaz-Eufracio ◽  
José Medina-Franco

Compound databases of natural products have a major impact on drug discovery projects and other areas of research. The number of public-domain databases of compounds of natural origin is increasing. Several countries, namely Brazil, France, Panama and, recently, Vietnam, have initiatives in place to construct and maintain compound databases that are representative of their diversity. In this proof-of-concept study, we discuss the first version of BIOFACQUIM, a novel compound database of natural products isolated and characterized in Mexico. We discuss its construction and curation, and present a complete chemoinformatic characterization of its content and its coverage of chemical space. We report the profile of physicochemical properties, scaffold content and diversity, and structural diversity based on molecular fingerprints. BIOFACQUIM is freely available.
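Fingerprint-based structural diversity is commonly summarized through pairwise Tanimoto similarities, where a lower average similarity indicates a more diverse collection. The sketch below represents each fingerprint as the set of its "on" bits and computes the mean pairwise Tanimoto similarity. The bit sets are hypothetical, and the specific fingerprints and summary statistics used for BIOFACQUIM are not stated in this abstract.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto similarity over all unique pairs (needs at least two fingerprints)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical on-bit sets for three compounds (not real BIOFACQUIM fingerprints)
fps = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3, 5, 9}]
print(round(mean_pairwise_similarity(fps), 3))  # 0.181
```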


2020 ◽  
Author(s):  
Suhad A.A. Al-Salihi ◽  
Ian Bull ◽  
Raghad A. Al-Salhi ◽  
Paul J. Gates ◽  
Kifah Salih ◽  
...  

There is a pressing need to continue the search for natural products with novel mechanisms of action to combat the constant increase in microbial drug resistance. Mushroom-forming fungi were previously neglected as a source of novel antibiotics because of the difficulties associated with their culture preparation and genetic tractability. However, modern fungal molecular and synthetic biology tools have renewed interest in exploring mushroom-forming fungi for novel therapeutics. The aim of this study was to obtain a comprehensive picture of the secondary metabolites (SMs) of nine basidiomycetes, screening their biological and chemical properties to describe the genetic pathways associated with their production. H. fasciculare proved to be a highly active antagonistic species, with antimicrobial activity against three different microorganisms: Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae. Extensive genomic comparison and chemical analysis using analytical chromatography led to the characterisation of more than 15 variant biosynthetic gene clusters and to the first identification in this species of a potent antibacterial metabolite, 3,5-dichloromethoxy benzoic acid (3,5-D), for which a biosynthetic gene cluster was predicted. This work demonstrates the great potential of mushroom-forming fungi as a currently unexplored reservoir of bioactive natural products. It also shows that accessing their genomic data and structurally diverse natural products through modern computational analysis and efficient chemical methods could accelerate the development and application of such distinct molecules in both the pharmaceutical and agrochemical industries.


2018 ◽  
Vol 20 (47) ◽  
pp. 29661-29668 ◽  
Author(s):  
Michael J. Willatt ◽  
Félix Musil ◽  
Michele Ceriotti

By representing elements as points in a low-dimensional chemical space, it is possible to improve the performance of a machine-learning model for a chemically diverse dataset. The resulting coordinates are reminiscent of the main groups of the periodic table.
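One way to picture "elements as points in a low-dimensional chemical space" is as a linear compression of element-resolved features. Each element a is assigned a coordinate vector u_a with d components, and the per-element feature channels x_a are combined as feature_j = sum over a of u_a[j] * x_a. With this compression, the feature size no longer grows with the number of elements. In the sketch below, the 2-D coordinates are hypothetical illustrative values rather than fitted ones, and the per-element neighbor count is a toy stand-in for the richer density-based descriptors used in practice.

```python
# Hypothetical 2-D element coordinates (illustrative, not optimized values)
ELEMENT_COORDS = {
    "H": (1.0, 0.0),
    "C": (0.0, 1.0),
    "N": (0.2, 0.9),
    "O": (0.3, 0.8),
}

def element_resolved_counts(neighbors):
    """Toy element-resolved feature: number of neighbors of each element."""
    counts = {el: 0 for el in ELEMENT_COORDS}
    for el in neighbors:
        counts[el] += 1
    return counts

def compressed_features(neighbors):
    """Project per-element channels onto the low-dimensional element space:
    feature_j = sum over elements a of ELEMENT_COORDS[a][j] * count[a]."""
    counts = element_resolved_counts(neighbors)
    dim = len(next(iter(ELEMENT_COORDS.values())))
    return [
        sum(ELEMENT_COORDS[el][j] * n for el, n in counts.items())
        for j in range(dim)
    ]

# Neighbors of a hypothetical carbon atom bonded to three H atoms and one O atom
print(compressed_features(["H", "H", "H", "O"]))
```

N and O were deliberately given nearby coordinates here, so swapping the O neighbor for an N changes the compressed features only slightly. This is how placing chemically similar elements close together lets a model share information between them.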


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
K. T. Schütt ◽  
M. Gastegger ◽  
A. Tkatchenko ◽  
K.-R. Müller ◽  
R. J. Maurer

Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability to reactive chemistry and chemical analysis. Here we present a deep learning framework for predicting the quantum mechanical wavefunction in a local basis of atomic orbitals, from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. Using several examples, we demonstrate that this opens promising avenues for the inverse design of molecular structures targeting electronic property optimisation, and a clear path towards increased synergy between machine learning and quantum chemistry.
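To illustrate how further properties follow from a representation in an atomic-orbital basis, assume that a model outputs a Hamiltonian matrix H and an overlap matrix S in that basis (an assumption made for this sketch). The orbital energies ε then follow from the generalized eigenvalue problem H c = ε S c. For a two-orbital toy system this reduces to a quadratic in ε. The matrix elements below are illustrative Hückel-like values, not predictions of the framework described above.

```python
import math

def two_orbital_energies(h11, h22, h12, s12, s11=1.0, s22=1.0):
    """Solve det(H - eps*S) = 0 for a symmetric 2x2 Hamiltonian H and overlap S.

    Expanding the determinant gives a*eps^2 + b*eps + c = 0 with the
    coefficients below; the two roots are the orbital energies.
    """
    a = s11 * s22 - s12 ** 2
    b = -(h11 * s22 + h22 * s11 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))

# Illustrative homonuclear diatomic: on-site energy alpha, coupling beta, overlap s
alpha, beta, s = -0.5, -0.25, 0.25
print([round(e, 4) for e in two_orbital_energies(alpha, alpha, beta, s)])  # [-0.6, -0.3333]
```

For identical sites, the roots reproduce the textbook closed forms (α + β)/(1 + s) for the bonding orbital and (α − β)/(1 − s) for the antibonding orbital. Larger bases need a general eigensolver rather than a quadratic.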


2022 ◽  
Author(s):  
Tao Zeng ◽  
B. Andes Hess ◽  
Fan Zhang ◽  
Ruibo Wu

Many computational methods are used to push the open-ended boundaries of chemical space. Natural products and their derivatives are an important source for drug discovery, and some algorithms are devoted to rapidly generating pseudo-natural products. However, the synthetic accessibility and chemical interpretability of the generated structures are often ignored or underestimated, which hampers experimental synthesis in practice. Herein, a bio-inspired strategy named TeroGen is proposed to explore the chemical space of terpenoids, in which the cyclization and decoration stages of terpenoid biosynthesis are mimicked by metadynamics simulations and deep learning models, respectively. In the TeroGen protocol, synthetic accessibility is validated by reaction energetics (reaction barrier and reaction heat) computed with the GFN2-xTB method. Chemical interpretability is an intrinsic feature, because the reaction pathway is bio-inspired and is triggered by the RMSD-PP method in conjunction with an encoder-decoder architecture. This is quite distinct from conventional library/fragment-based or rule-based strategies: with TeroGen, new reaction routes can be explored feasibly to increase structural diversity. For example, although only a rather limited number of sesterterpenoids is included in our training set, TeroGen predicts more than 30,000 sesterterpenoids, over ten times the number of known sesterterpenoids (fewer than 2,500), and maps out the reaction network with high efficiency. In sum, TeroGen not only greatly expands the chemical space of terpenoids but also provides various plausible biosynthetic pathways, which are crucial clues for the heterologous biosynthesis, biomimetic synthesis, and chemical synthesis of complicated terpenoids.
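Screening candidate steps by reaction energetics can be pictured as a simple filter on the computed reaction barrier and reaction heat. The sketch below converts each barrier into an Eyring rate constant at 298.15 K and keeps only steps whose barrier and reaction heat fall below cut-offs. The cut-off values and the step data are hypothetical, not TeroGen's actual criteria. Treating the computed barrier as a free energy of activation is also an assumption.

```python
import math
from dataclasses import dataclass

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)

@dataclass
class Step:
    name: str
    barrier: float        # kcal/mol, treated here as a free energy of activation
    reaction_heat: float  # kcal/mol, negative = exothermic

def eyring_rate(barrier_kcal, temperature=298.15):
    """Eyring equation: k = (kB*T/h) * exp(-dG_act / (R*T)), in s^-1."""
    return (KB * temperature / H) * math.exp(-barrier_kcal / (R_KCAL * temperature))

def accessible(step, max_barrier=25.0, max_heat=10.0):
    """Hypothetical feasibility rule: modest barrier and not strongly endothermic."""
    return step.barrier <= max_barrier and step.reaction_heat <= max_heat

# Hypothetical candidate steps with made-up energetics
steps = [
    Step("cyclization A", barrier=12.0, reaction_heat=-18.0),
    Step("hydride shift B", barrier=31.0, reaction_heat=-2.0),
    Step("ring opening C", barrier=20.0, reaction_heat=14.0),
]
for st in steps:
    print(f"{st.name}: k = {eyring_rate(st.barrier):.2e} s^-1, kept = {accessible(st)}")
```

Because the rate depends exponentially on the barrier, a few kcal/mol separate steps that are fast at room temperature from steps that are effectively inaccessible. A barrier cut-off is therefore a compact proxy for kinetic feasibility.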
