PROTACs and Building Blocks: The 2D Chemical Space in Very Early Drug Discovery

Giuseppe Ermondi; Diego Garcia-Jimenez; Giulia Caron

doi:10.3390/molecules26030672

Molecules ◽

10.3390/molecules26030672 ◽

2021 ◽

Vol 26 (3) ◽

pp. 672

Author(s):

Giuseppe Ermondi ◽

Diego Garcia-Jimenez ◽

Giulia Caron

Keyword(s):

Ad Hoc ◽

Chemical Space ◽

Flexible Structures ◽

Building Blocks ◽

Correlation Matrices ◽

Chemical Structures ◽

Box Plots ◽

Physicochemical Descriptors ◽

Degradation Activity ◽

Approved Drugs

Targeted protein degradation by PROTACs has emerged as a new modality for the knockdown of a range of proteins, and it has recently become clear that the PROTAC chemical space requires characterization through a pool of ad hoc physicochemical descriptors. In this study, PROTAC-DB, a new database that provides extensive information about PROTACs and their building blocks, was used to obtain the 2D chemical structures of about 1600 PROTACs, 60 E3 ligands, 800 linkers, and 202 warheads. For every structure, we calculated a pool of seven 2D descriptors carefully identified as informative for large and flexible structures. For comparison, the same procedure was applied to a dataset of about 50 bRo5 approved drugs reported in the literature. Correlation matrices, PCAs, box plots, and other graphical tools were used to define and understand the chemical space covered by PROTACs and building blocks in relation to other compounds. Results show that linkers have different properties from E3 ligands and warheads, and that polar descriptors are not additive when passing from building blocks to degraders. Moreover, a very preliminary analysis based on three PROTACs with high, intermediate, and low permeability showed that the most permeable compounds seem to occupy a region closer to bRo5 drugs and, thus, exhibit different properties from impermeable compounds. Finally, a second database, PROTACpedia, was used to discuss the relevance of physicochemical descriptors to degradation activity.
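The additivity statement in this abstract can be made concrete with a small check: compare a descriptor computed on the full degrader with the sum of the same descriptor over its warhead, linker, and E3 ligand. The sketch below is only an illustration in Python. The function name and all TPSA numbers are hypothetical and are not taken from PROTAC-DB or from the paper.

```python
def additivity_gap(protac_value, block_values):
    """Return descriptor(PROTAC) minus the sum of the descriptor over its building blocks."""
    return protac_value - sum(block_values)


# Hypothetical TPSA values in Å² (illustrative only, not taken from PROTAC-DB).
blocks_tpsa = {"warhead": 95.0, "linker": 55.0, "e3_ligand": 110.0}
protac_tpsa = 240.0

block_sum = sum(blocks_tpsa.values())
gap = additivity_gap(protac_tpsa, blocks_tpsa.values())
relative_gap = gap / block_sum  # negative: the degrader is less polar than the sum of its parts

print(f"sum over blocks = {block_sum:.1f}, PROTAC = {protac_tpsa:.1f}, gap = {gap:+.1f}")
```

A gap of zero would mean the descriptor is perfectly additive for that degrader. Repeating the calculation over many degraders would show how far, and in which direction, additivity breaks down.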

Are we ready to design oral PROTACs®?

ADMET & DMPK ◽

10.5599/admet.1037 ◽

2021 ◽

Author(s):

Diego Garcia Jimenez ◽

Matteo Rossi Sebastiano ◽

Giulia Caron ◽

Giuseppe Ermondi

Keyword(s):

Open Access ◽

Ad Hoc ◽

Statistical Study ◽

Chemical Space ◽

E3 Ligases ◽

Creative Commons ◽

Chemical Structures ◽

Set Up ◽

Approved Drugs ◽

Open Access Article

PROTACs® are expected to strongly impact the future of drug discovery. Therefore, in this work we first performed a statistical study to highlight the distribution of E3 ligases and POIs collected in PROTAC-DB, the main online database focused on degraders. Moreover, since the emerging technology of protein degradation deals with large and complex chemical structures, the second part of the paper focuses on how to set up a property-based design strategy to obtain oral degraders. For this purpose, we calculated a pool of seven 2D descriptors, previously selected ad hoc, for the 2258 publicly available degraders in PROTAC-DB (average values: MW = 972.9 Da, nC = 49.5, NAR = 4.5, PHI = 17.3, nHDon = 4.5, nHAcc = 17.7 and TPSA = 240 Å²) and compared them to a dataset of 50 bRo5 orally approved drugs. Then, a chemical space based on nC, PHI and TPSA was built, and subregions with optimal permeability and bioavailability were identified. Bioavailable degraders (ARV-110 and ARV-471) tend to be closer to the Ro5 region, using mainly semi-rigid linkers. Permeable degraders, on the other hand, are placed in an average central region of the chemical space, but chameleonicity could allow them to be located closer to the two Arvinas compounds. ©2021 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
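To put the average values quoted in this abstract in context, the sketch below compares them with the usual upper limits from Lipinski's rule of five (MW ≤ 500 Da, at most 5 hydrogen-bond donors, at most 10 acceptors) and with Veber's TPSA cutoff of 140 Å². It is a rough illustration only. The rules are meant for individual molecules rather than dataset means, the descriptor software used in the paper may count donors and acceptors differently from Lipinski's definition, and logP and the rotatable-bond count are left out because the abstract does not report them.

```python
# Mean descriptor values for the 2258 PROTAC-DB degraders, as quoted in the abstract.
PROTAC_DB_MEANS = {"MW": 972.9, "nHDon": 4.5, "nHAcc": 17.7, "TPSA": 240.0}

# Commonly cited upper limits: Lipinski (MW, donors, acceptors) and Veber (TPSA).
LIMITS = {"MW": 500.0, "nHDon": 5.0, "nHAcc": 10.0, "TPSA": 140.0}


def exceeded(values, limits):
    """Map each descriptor that is above its limit to the ratio value / limit."""
    return {
        name: values[name] / limit
        for name, limit in limits.items()
        if values[name] > limit
    }


violations = exceeded(PROTAC_DB_MEANS, LIMITS)
for name, ratio in violations.items():
    print(f"{name}: {ratio:.2f}x the limit")
```

Under these assumptions, the mean molecular weight, acceptor count, and TPSA all exceed their limits, while the mean donor count does not.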

Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra

10.1101/491506 ◽

2018 ◽

Author(s):

Simon Rogers ◽

Cher Wei Ong ◽

Joe Wandy ◽

Madeleine Ernst ◽

Lars Ridder ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Structural Information ◽

Chemical Space ◽

Building Blocks ◽

Complex Mixtures ◽

Machine Learning Classification ◽

Chemical Structures ◽

Automated Annotation ◽

Classification Prediction

Complex metabolite mixtures are challenging to unravel. Mass spectrometry (MS) is a widely used and sensitive technique to obtain structural information on complex mixtures. However, just knowing the molecular masses of the mixture's constituents is almost always insufficient for confident assignment of the associated chemical structures. Structural information can be augmented through MS fragmentation experiments whereby detected metabolites are fragmented giving rise to MS/MS spectra. However, how can we maximize the structural information we gain from fragmentation spectra? We recently proposed a substructure-based strategy to enhance metabolite annotation for complex mixtures by considering metabolites as the sum of (bio)chemically relevant moieties that we can detect through mass spectrometry fragmentation approaches. Our MS2LDA tool allows us to discover - unsupervised - groups of mass fragments and/or neutral losses termed Mass2Motifs that often correspond to substructures. After manual annotation, these Mass2Motifs can be used in subsequent MS2LDA analyses of new datasets, thereby providing structural annotations for many molecules that are not present in spectral databases. Here, we describe how additional strategies, taking advantage of i) combinatorial in-silico matching of experimental mass features to substructures of candidate molecules, and ii) automated machine learning classification of molecules, can facilitate semi-automated annotation of substructures. We show how our approach accelerates the Mass2Motif annotation process and therefore broadens the chemical space spanned by characterized motifs. Our machine learning model used to classify fragmentation spectra learns the relationships between fragment spectra and chemical features. Classification prediction on these features can be aggregated for all molecules that contribute to a particular Mass2Motif and guide Mass2Motif annotations. 
To make annotated Mass2Motifs available to the community, we also present motifDB: an open database of Mass2Motifs that can be browsed and accessed programmatically through an API. MotifDB is integrated within ms2lda.org, allowing users to efficiently search for characterized motifs in their own experiments. We expect that, with an increasing number of Mass2Motif annotations available through a growing database, we can more quickly gain insight into the constituents of complex mixtures. That will allow prioritization towards novel or unexpected chemistries and faster recognition of known biochemical building blocks.
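The abstract above describes Mass2Motifs as groups of mass fragments and/or neutral losses. A neutral loss is the difference between the precursor m/z and a fragment m/z. The sketch below shows how such features might be derived from a single spectrum. The function name, the `min_loss` cutoff, and the spectrum values are hypothetical, and this is not code from MS2LDA.

```python
def neutral_losses(precursor_mz, fragment_mzs, min_loss=1.0):
    """Return precursor-minus-fragment differences, sorted ascending.

    Differences smaller than min_loss are dropped, which excludes the
    precursor peak itself and any fragment at or above the precursor m/z.
    """
    losses = [
        round(precursor_mz - mz, 4)
        for mz in fragment_mzs
        if precursor_mz - mz >= min_loss
    ]
    return sorted(losses)


# Hypothetical singly charged spectrum with nominal masses (illustrative only).
precursor = 300.0
fragments = [300.0, 282.0, 256.0, 150.0]

losses = neutral_losses(precursor, fragments)
print(losses)
```

In this made-up spectrum, the losses of 18 and 44 correspond to the nominal masses of water and carbon dioxide. Recurring differences of this kind are what can point to a shared substructure.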

Language, procedures, and the non-perceptual origin of natural number concepts

10.31234/osf.io/3q4mf ◽

2016 ◽

Author(s):

David Barner

Keyword(s):

General Framework ◽

Ad Hoc ◽

Building Blocks ◽

Number Words ◽

Linguistic Markers ◽

Logical Foundations ◽

Large Numbers ◽

Perceptual Representations ◽

Positive Integers ◽

Perceptual Systems

Perceptual representations – e.g., of objects or approximate magnitudes – are often invoked as building blocks that children combine with linguistic symbols when they acquire the positive integers. Systems of numerical perception are either assumed to contain the logical foundations of arithmetic innately, or to supply the basis for their induction. Here I propose an alternative to this general framework, and argue that the integers are not learned from perceptual systems, but instead arise to explain perception as part of language acquisition. Drawing on cross-linguistic data and developmental data, I show that small numbers (1-4) and large numbers (~5+) arise both historically and in individual children via entirely distinct mechanisms, constituting independent learning problems, neither of which begins with perceptual building blocks. Specifically, I propose that children begin by learning small numbers (i.e., *one, two, three*) using the same logical resources that support other linguistic markers of number (e.g., singular, plural). Several years later, children discover the logic of counting by inferring the logical relations between larger number words from their roles in blind counting procedures, and only incidentally associate number words with perception of approximate magnitudes, in an *ad hoc* and highly malleable fashion. Counting provides a form of explanation for perception but is not causally derived from perceptual systems.

Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification

Journal of Cheminformatics ◽

10.1186/s13321-021-00500-8 ◽

2021 ◽

Vol 13 (1) ◽

Cited By ~ 1

Author(s):

Janna Hastings ◽

Martin Glauer ◽

Adel Memariani ◽

Fabian Neuhaus ◽

Till Mossakowski

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Short Term Memory ◽

Chemical Space ◽

Chemical Data ◽

Learning Approaches ◽

Class Prediction ◽

Chemical Structures ◽

Chemical Ontology ◽

Chemical Ontologies

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
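As a rough illustration of the character-based encoding of chemical line notations mentioned in this abstract, the sketch below turns SMILES strings into fixed-length integer sequences of the kind a recurrent network could take as input. The helper names are hypothetical and this is not the authors' pipeline. It also splits two-letter element symbols such as Cl or Br into separate characters, which a real tokenizer would normally avoid.

```python
def build_vocab(smiles_list):
    """Assign each distinct character an integer id, reserving 0 for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, length):
    """Convert a SMILES string to a list of ids, truncated or zero-padded to length."""
    ids = [vocab[ch] for ch in smiles[:length]]
    return ids + [0] * (length - len(ids))


# Ethanol, benzene, and acetic acid.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles)
max_len = max(len(s) for s in smiles)
encoded = [encode(s, vocab, max_len) for s in smiles]

for s, ids in zip(smiles, encoded):
    print(f"{s:10s} {ids}")
```

Every sequence has the same length, so the encoded molecules can be stacked into a single array for batch training.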

Identifying home locations in human mobility data: an open-source R package for comparison and reproducibility

10.31235/osf.io/k3jp2 ◽

2021 ◽

Author(s):

Qingqing Chen ◽

Ate Poorthuis

Keyword(s):

Software Package ◽

Ad Hoc ◽

Human Mobility ◽

Building Blocks ◽

R Package ◽

Location Based Services ◽

R Software ◽

Mobility Data ◽

Residential Population ◽

Research Goal

Identifying meaningful locations, such as home or work, from human mobility data has become an increasingly common prerequisite for geographic research. Although location-based services (LBS) and other mobile technology have rapidly grown in recent years, it can be challenging to infer meaningful places from such data, which, compared to conventional datasets, can be devoid of context. Existing approaches are often developed ad hoc and can lack transparency and reproducibility. To address this, we introduce an R software package for inferring home locations from LBS data. The package implements pre-existing algorithms and provides building blocks to make writing algorithmic ‘recipes’ more convenient. We evaluate this approach by analyzing a de-identified LBS dataset from Singapore that aims to balance ethics and privacy with the research goal of identifying meaningful locations. We show that ensemble approaches, combining multiple algorithms, can be especially valuable in this regard, as the resulting patterns of inferred home locations closely correlate with the distribution of residential population. We hope this package, and others like it, will contribute to an increase in use and sharing of comparable algorithms, research code and data. This will increase transparency and reproducibility in mobility analyses and further the ongoing discourse around ethical big data research.
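To give a sense of what a home-location 'recipe' can look like, the sketch below implements one simple and commonly used heuristic: take the location a user visits most often during night hours. It is written in Python for illustration and is not the interface of the R package described above. The function name, the night window, and the sample points are all assumptions.

```python
from collections import Counter
from datetime import datetime


def infer_home(points, night_start=22, night_end=6):
    """Return the most frequent location among night-time points, or None.

    points is an iterable of (timestamp, location_id) pairs for one user.
    A point counts as night-time if its hour is >= night_start or < night_end.
    """
    counts = Counter(
        loc for ts, loc in points
        if ts.hour >= night_start or ts.hour < night_end
    )
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical de-identified points for a single user.
points = [
    (datetime(2021, 1, 4, 23, 10), "cell_A"),
    (datetime(2021, 1, 5, 2, 30), "cell_A"),
    (datetime(2021, 1, 5, 13, 0), "cell_B"),
    (datetime(2021, 1, 5, 14, 0), "cell_B"),
    (datetime(2021, 1, 5, 15, 0), "cell_B"),
    (datetime(2021, 1, 5, 22, 45), "cell_C"),
]

home = infer_home(points)
print(home)
```

Here cell_B has the most points overall, but they are all in the daytime, so the heuristic picks cell_A. An ensemble in the sense of the abstract would combine the outputs of several such rules.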

Analysis of a large food chemical database: chemical space, diversity, and complexity

F1000Research ◽

10.12688/f1000research.15440.2 ◽

2018 ◽

Vol 7 ◽

pp. 993 ◽

Cited By ~ 14

Author(s):

J. Jesús Naveja ◽

Mariel P. Rico-Hidalgo ◽

José L. Medina-Franco

Keyword(s):

Natural Products ◽

Chemical Space ◽

Structural Complexity ◽

Chemical Diversity ◽

Chemical Structures ◽

Novel Approach ◽

Chemical Database ◽

Protein Interactome ◽

Limited Basis ◽

Future Direction

Background: Food chemicals are a cornerstone of the food industry. However, their chemical diversity has been explored only on a limited basis; for instance, previous analyses of food-related databases covered up to 2,200 molecules. The goal of this work was to quantify the chemical diversity of the compounds stored in FooDB, a database with nearly 24,000 food chemicals. Methods: The visual representation of the chemical space of FooDB was done with ChemMaps, a novel approach based on the concept of chemical satellites. The large food chemical database was profiled based on physicochemical properties, molecular complexity and scaffold content. The global diversity of FooDB was characterized using Consensus Diversity Plots. Results: It was found that compounds in FooDB are very diverse in terms of properties and structure, with a large structural complexity. It was also found that one third of the food chemicals are acyclic molecules and that ring-containing molecules are mostly monocyclic, with several scaffolds common to natural products in other databases. Conclusions: To the best of our knowledge, this is the first analysis of the chemical diversity and complexity of FooDB. This study represents a further step toward the emerging field of “Food Informatics”. Future studies should directly compare the chemical structures of the molecules in FooDB with other compound databases, for instance, drug-like databases and natural product collections. An additional future direction of this work is to use the list of 3,228 polyphenolic compounds identified in this work to enhance the ongoing polyphenol-protein interactome studies.

MeFSAT: A curated natural product database specific to secondary metabolites of medicinal fungi

10.1101/2020.12.04.412502 ◽

2020 ◽

Author(s):

R.P. Vivek-Ananth ◽

Ajaya Kumar Sahoo ◽

Kavyaa Kumaravel ◽

Karthikeyan Mohanraj ◽

Areejit Samal

Keyword(s):

Secondary Metabolites ◽

Physicochemical Properties ◽

Natural Product ◽

Chemical Space ◽

Chemical Structures ◽

Therapeutic Uses ◽

Natural Product Library ◽

Medicinal Fungi ◽

Similarity Networks ◽

Shape Complexity

Fungi are a rich source of secondary metabolites, which constitute a valuable and diverse chemical space of natural products. Medicinal fungi have been used in traditional medicine to treat human ailments for centuries. To date, there is no dedicated resource on secondary metabolites and therapeutic uses of medicinal fungi. Such a resource, compiling information on medicinal fungi dispersed across the published literature, will facilitate ongoing efforts towards natural product based drug discovery. Here, we present the first comprehensive manually curated database on Medicinal Fungi Secondary metabolites And Therapeutics (MeFSAT), which compiles information on 184 medicinal fungi, 1830 secondary metabolites and 149 therapeutic uses. Importantly, MeFSAT contains a non-redundant in silico natural product library of 1830 secondary metabolites along with information on their chemical structures, computed physicochemical properties, drug-likeness properties, predicted ADMET properties, molecular descriptors and predicted human target proteins. By comparing the physicochemical properties of secondary metabolites in MeFSAT with other small molecule collections, we find that fungal secondary metabolites have high stereochemical complexity and shape complexity similar to other natural product libraries. Based on multiple scoring schemes, we have filtered a subset of 228 drug-like secondary metabolites in the MeFSAT database. By constructing and analyzing chemical similarity networks, we show that the chemical space of secondary metabolites in MeFSAT is highly diverse. The compiled information in the MeFSAT database is openly accessible at: https://cb.imsc.res.in/mefsat/.
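A chemical similarity network of the kind mentioned in this abstract is often built by linking two molecules when the Tanimoto coefficient of their fingerprints reaches a chosen threshold. The sketch below illustrates that idea, with each fingerprint represented as a set of on-bit indices. The abstract does not state which fingerprints or threshold were used, so the names, bit sets, and the 0.5 cutoff are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, threshold):
    """Return (name_1, name_2, similarity) for every pair at or above threshold."""
    edges = []
    for m, n in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[m], fingerprints[n])
        if sim >= threshold:
            edges.append((m, n, sim))
    return edges


# Hypothetical fingerprints (illustrative only, not derived from MeFSAT structures).
fingerprints = {
    "met_1": {1, 2, 3, 4},
    "met_2": {2, 3, 4, 5},
    "met_3": {7, 8, 9},
}

edges = similarity_edges(fingerprints, threshold=0.5)
print(edges)
```

In a diverse library most pairs fall below the threshold, so the network breaks into many small components or isolated nodes, as met_3 does here.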

The Brazilian Compound Library (BraCoLi) Database: A Brazilian Repository of Chemical and Biological Information for Drug Design

10.26434/chemrxiv.14569356.v1 ◽

2021 ◽

Author(s):

Gabriel Corrêa Veríssimo ◽

Valtair Severino dos Santos Junior ◽

Ingrid Ariela do Rosário de Almeida ◽

Marina Sant'Anna Mitraud Ruas ◽

Lukas Galuppo Coutinho ◽

...

Keyword(s):

Drug Design ◽

Chemical Space ◽

Principal Component ◽

Chemical Diversity ◽

Biological Information ◽

Virtual Library ◽

Compound Library ◽

Computer Aided Drug Design ◽

Approved Drugs ◽

Fda Approved Drugs

The Brazilian Compound Library (BraCoLi) is a novel virtual library of manually curated compounds developed by Brazilian research groups to support further computer-aided drug design work. Herein, the first version of the database, comprising 1,176 compounds, is described. The chemical diversity and drug-like profile of BraCoLi were also defined to analyze its chemical space. A significant number of the compounds fitted Lipinski's and Veber's rules, alongside other drug-likeness properties. Principal component analysis showed that BraCoLi is similar to other databases (FDA-approved drugs and NuBBEDB) regarding structural and physicochemical patterns. Finally, a scaffold analysis showed that BraCoLi presents several privileged chemical skeletons with great diversity.

Combinatorial drug discovery in nanoliter droplets

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1802233115 ◽

2018 ◽

Vol 115 (26) ◽

pp. 6685-6690 ◽

Cited By ~ 18

Author(s):

Anthony Kulesa ◽

Jared Kehe ◽

Juan E. Hurtado ◽

Prianca Tawde ◽

Paul C. Blainey

Keyword(s):

Biological Networks ◽

Chemical Space ◽

Treatment Strategies ◽

Therapeutic Effects ◽

Phenotypic Screening ◽

Chemical Library ◽

Gram Negative ◽

Gram Positive Bacteria ◽

Approved Drugs ◽

Advanced Treatments

Combinatorial drug treatment strategies perturb biological networks synergistically to achieve therapeutic effects and represent major opportunities to develop advanced treatments across a variety of human disease areas. However, the discovery of new combinatorial treatments is challenged by the sheer scale of combinatorial chemical space. Here, we report a high-throughput system for nanoliter-scale phenotypic screening that formulates a chemical library in nanoliter droplet emulsions and automates the construction of chemical combinations en masse using parallel droplet processing. We applied this system to predict synergy between more than 4,000 investigational and approved drugs and a panel of 10 antibiotics against Escherichia coli, a model gram-negative pathogen. We found a range of drugs not previously indicated for infectious disease that synergize with antibiotics. Our validated hits include drugs that synergize with the antibiotics vancomycin, erythromycin, and novobiocin, which are used against gram-positive bacteria but are not effective by themselves to resolve gram-negative infections.
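The scale problem described in this abstract is easy to quantify. The sketch below counts the drug-antibiotic pairs for the screen as described, taking 4,000 as a lower bound for "more than 4,000", and compares that with the number of unordered pairs among all compounds. It also includes a Bliss independence excess as one common reference model for synergy. The abstract does not say which synergy model the authors used, so that function is an assumption added purely for illustration.

```python
from math import comb

n_drugs = 4000        # lower bound: the abstract says "more than 4,000"
n_antibiotics = 10

# Each drug paired with each antibiotic, as in the screen described above.
drug_antibiotic_pairs = n_drugs * n_antibiotics

# All unordered pairs if every compound were tested against every other.
all_pairs = comb(n_drugs + n_antibiotics, 2)


def bliss_excess(fa, fb, fab):
    """Observed combined effect minus the Bliss independence expectation.

    fa, fb, fab are fractional inhibitions in [0, 1] for drug A alone,
    drug B alone, and the combination. Positive values suggest synergy.
    """
    expected = fa + fb - fa * fb
    return fab - expected


print(drug_antibiotic_pairs, all_pairs)
print(round(bliss_excess(0.2, 0.3, 0.6), 2))
```

Even the restricted drug-antibiotic design involves tens of thousands of pairs before replicates or dose levels are considered, and an unrestricted pairwise screen would be about two hundred times larger.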

Computational Analytical Framework for Affective Modeling

Handbook of Research on Synthesizing Human Emotion in Intelligent Systems and Robotics - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-7278-9.ch001 ◽

2015 ◽

pp. 1-62 ◽

Cited By ~ 4

Author(s):

Eva Hudlicka

Keyword(s):

Ad Hoc ◽

Building Blocks ◽

Analytical Framework ◽

Design Guidelines ◽

Model Design ◽

Systematic Analysis ◽

Alternative Means ◽

Synthetic Agents ◽

Emotion Generation ◽

Affective Modeling

Computational affective models are being developed both to elucidate affective mechanisms and to enhance the believability of synthetic agents and robots. Yet in spite of the rapid growth of computational affective modeling, no systematic guidelines exist for model design and analysis. Lack of systematic guidelines contributes to ad hoc design practices, hinders model sharing and re-use, and makes systematic comparison of existing models and theories challenging. Lack of a common computational terminology also hinders cross-disciplinary communication that is essential to advance our understanding of emotions. In this chapter the author proposes a computational analytical framework to provide a basis for systematizing affective model design by: (1) viewing emotion models in terms of two core types: emotion generation and emotion effects, and (2) identifying the generic computational tasks necessary to implement these processes. The chapter then discusses how these computational ‘building blocks’ can support the development of design guidelines, and a systematic analysis of distinct emotion theories and alternative means of their implementation.
