Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra

2018 ◽  
Author(s):  
Simon Rogers ◽  
Cher Wei Ong ◽  
Joe Wandy ◽  
Madeleine Ernst ◽  
Lars Ridder ◽  
...  

Complex metabolite mixtures are challenging to unravel. Mass spectrometry (MS) is a widely used and sensitive technique to obtain structural information on complex mixtures. However, knowing only the molecular masses of the mixture's constituents is almost always insufficient for confident assignment of the associated chemical structures. Structural information can be augmented through MS fragmentation experiments, whereby detected metabolites are fragmented, giving rise to MS/MS spectra. But how can we maximize the structural information we gain from fragmentation spectra? We recently proposed a substructure-based strategy to enhance metabolite annotation for complex mixtures by considering metabolites as the sum of (bio)chemically relevant moieties that we can detect through mass spectrometry fragmentation approaches. Our MS2LDA tool allows us to discover, in an unsupervised manner, groups of mass fragments and/or neutral losses, termed Mass2Motifs, that often correspond to substructures. After manual annotation, these Mass2Motifs can be used in subsequent MS2LDA analyses of new datasets, thereby providing structural annotations for many molecules that are not present in spectral databases. Here, we describe how two additional strategies, taking advantage of i) combinatorial in-silico matching of experimental mass features to substructures of candidate molecules, and ii) automated machine learning classification of molecules, can facilitate semi-automated annotation of substructures. We show how our approach accelerates the Mass2Motif annotation process and therefore broadens the chemical space spanned by characterized motifs. Our machine learning model used to classify fragmentation spectra learns the relationships between fragment spectra and chemical features. Classification predictions for these features can be aggregated over all molecules that contribute to a particular Mass2Motif to guide Mass2Motif annotation.
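MS2LDA treats each fragmentation spectrum like a text document whose "words" are mass fragments and neutral losses, so that words that repeatedly co-occur can be grouped into Mass2Motifs by a topic model (Latent Dirichlet Allocation). The sketch below illustrates only the document-building step, under stated assumptions: the bin width, the relative-intensity cut-off and the word-naming scheme are hypothetical choices, not MS2LDA's documented preprocessing, and the LDA inference itself is left out.

```python
from collections import Counter

def spectrum_to_words(precursor_mz, peaks, bin_width=0.005, min_rel_intensity=0.01):
    """Convert one MS/MS spectrum into a bag of fragment and neutral-loss words.

    peaks: list of (mz, intensity) pairs.
    ASSUMPTION: bin_width (Da), the relative-intensity cut-off and the use of
    relative intensity as word weight are illustrative choices, not MS2LDA's
    documented preprocessing settings.
    """
    if not peaks:
        return Counter()
    max_intensity = max(intensity for _, intensity in peaks)
    words = Counter()
    for mz, intensity in peaks:
        rel = intensity / max_intensity
        if rel < min_rel_intensity:
            continue  # drop low-abundance noise peaks
        # Fragment word: the product-ion m/z snapped to the nearest bin.
        fragment = round(mz / bin_width) * bin_width
        words[f"fragment_{fragment:.3f}"] += rel
        # Loss word: the neutral loss from the precursor, binned the same way.
        loss = precursor_mz - mz
        if loss > 0:
            loss_bin = round(loss / bin_width) * bin_width
            words[f"loss_{loss_bin:.3f}"] += rel
    return words
```

For example, `spectrum_to_words(100.0, [(60.0, 1000.0), (40.0, 5.0)])` keeps only the intense peak and yields one `fragment_60.000` word and one `loss_40.000` word. Encoding losses as separate words is what lets a motif capture a substructure that is lost as a neutral rather than observed as an ion.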
To make annotated Mass2Motifs available to the community, we also present motifDB: an open database of Mass2Motifs that can be browsed and accessed programmatically through an API. MotifDB is integrated within ms2lda.org, allowing users to efficiently search for characterized motifs in their own experiments. We expect that, with an increasing number of Mass2Motif annotations available through a growing database, we can more quickly gain insight into the constituents of complex mixtures. This will allow prioritization of novel or unexpected chemistries and faster recognition of known biochemical building blocks.
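The abstract states that classifier predictions can be aggregated over all molecules contributing to a Mass2Motif to guide its annotation. A minimal sketch of one way to do that pooling, assuming each molecule has per-class probabilities and an overlap score with the motif; the overlap weighting and the `min_overlap` threshold are hypothetical and not presented as the published aggregation rule.

```python
def aggregate_motif_predictions(predictions, motif_overlap, min_overlap=0.3):
    """Pool per-molecule class probabilities into one score per class for a motif.

    predictions: {molecule_id: {class_name: probability}}
    motif_overlap: {molecule_id: how strongly the molecule's spectrum loads on the motif}
    HYPOTHETICAL: the overlap weighting and min_overlap threshold are
    illustrative, not the published aggregation rule.
    """
    totals = {}
    weight_sum = 0.0
    for mol, overlap in motif_overlap.items():
        if overlap < min_overlap or mol not in predictions:
            continue  # weakly associated molecules do not vote
        weight_sum += overlap
        for cls, prob in predictions[mol].items():
            # A class missing from a molecule's predictions counts as probability 0.
            totals[cls] = totals.get(cls, 0.0) + overlap * prob
    if weight_sum == 0.0:
        return {}
    scores = {cls: total / weight_sum for cls, total in totals.items()}
    # Highest-scoring chemical classes first: candidates for the motif annotation.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Weighting by overlap means molecules that explain more of the motif dominate the vote, so a handful of strongly associated spectra can outweigh many marginal ones.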

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
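To make the "logistic regression on fingerprints" baseline concrete, here is a minimal one-vs-rest sketch in plain Python: one binary logistic regression per ontology class, trained by batch gradient descent on bit-vector fingerprints. The four-bit fingerprints, class names, learning rate and epoch count are hypothetical; a real setup would use much longer fingerprints and a library implementation. Because each class has its own classifier, a molecule can receive several classes at once.

```python
import math

def train_one_vs_rest(fingerprints, labels, classes, epochs=500, lr=0.5):
    """Train one logistic regression per class on 0/1 fingerprint vectors.

    fingerprints: list of equal-length lists of 0/1 bits.
    labels: list of sets of class names, one set per molecule.
    HYPOTHETICAL hyperparameters (epochs, lr) chosen for the toy example.
    """
    n_bits = len(fingerprints[0])
    m = len(fingerprints)
    models = {}
    for cls in classes:
        w = [0.0] * n_bits
        b = 0.0
        for _ in range(epochs):
            grad_w = [0.0] * n_bits
            grad_b = 0.0
            for x, label_set in zip(fingerprints, labels):
                y = 1.0 if cls in label_set else 0.0
                z = b + sum(wi * xi for wi, xi in zip(w, x))
                p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
                err = p - y  # gradient of log-loss w.r.t. z
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Batch gradient-descent step on the mean log-loss.
            w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / m
        models[cls] = (w, b)
    return models

def predict(models, x, threshold=0.5):
    """Return the set of classes whose predicted probability reaches the threshold."""
    out = set()
    for cls, (w, b) in models.items():
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        if 1.0 / (1.0 + math.exp(-z)) >= threshold:
            out.add(cls)
    return out
```

The independence of the per-class models is also a limitation: nothing forces predictions to respect the ontology's subclass structure, which is one reason overlapping classes are harder for this kind of baseline.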


Metabolites ◽  
2018 ◽  
Vol 8 (3) ◽  
pp. 46 ◽  
Author(s):  
Ville Mikael Koistinen ◽  
Andreia Bento da Silva ◽  
László Abrankó ◽  
Dorrain Low ◽  
Rocio Garcia Villalba ◽  
...  

Bioactive compounds present in plant-based foods, and their metabolites derived from gut microbiota and endogenous metabolism, represent thousands of chemical structures of potential interest for human nutrition and health. State-of-the-art analytical methodologies, including untargeted metabolomics based on high-resolution mass spectrometry, are required for the profiling of these compounds in complex matrices, including plant food materials and biofluids. The aim of this project was to compare the analytical coverage of untargeted metabolomics methods independently developed and employed in various European platforms. In total, 56 chemical standards representing the most common classes of bioactive compounds spread over a wide chemical space were selected and analyzed by the participating platforms (n = 13) using their preferred untargeted method. The results were used to define analytical criteria for a successful analysis of plant food bioactives. Furthermore, they will serve as a basis for an optimized consensus method.
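The coverage comparison described here comes down to counting, per platform, how many of the shared chemical standards were detected. A minimal sketch of that bookkeeping, assuming each platform reports a set of detected standards; the platform names and the four-compound panel used below are hypothetical stand-ins for the 56 standards and 13 platforms, and no results from the study are reproduced.

```python
def coverage_report(detected, standards):
    """Summarise analytical coverage of a shared panel of chemical standards.

    detected: {platform_name: set of standards that platform detected}
    standards: set of all standards in the panel.
    HYPOTHETICAL data layout; the study's own criteria may differ.
    """
    n = len(standards)
    found = {p: s & standards for p, s in detected.items()}  # ignore off-panel hits
    per_platform = {p: len(s) / n for p, s in found.items()}
    union = set().union(*found.values())
    common = set.intersection(*found.values()) if found else set()
    return {
        "per_platform": per_platform,        # fraction of the panel each platform saw
        "union": len(union) / n,             # fraction seen by at least one platform
        "detected_by_all": common,           # robustly detectable standards
        "missed_by_all": standards - union,  # blind spots of every method
    }
```

With `detected = {"A": {"quercetin", "catechin", "ferulic acid"}, "B": {"quercetin", "genistein"}}` over those four compounds, platform A covers 0.75 of the panel, B covers 0.5, and only quercetin is seen by both. The "missed by all" set is the part most relevant to designing a consensus method.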


2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Mohamed Dehamchia ◽  
Zine Regainia

Herein, we describe an efficient one-step synthesis of new fused benzothiadiazepine-1,1-dioxides and macrocyclic sulfamides. These compounds were obtained in moderate yields starting from previously described N,N′-disubstituted symmetric sulfamides and N-tert-butoxycarbonyl, N′-alkyl sulfamide. The chemical structures of all new compounds reported in this work were confirmed by NMR, IR, and mass spectrometry. These compounds are useful building blocks from which new chemical entities with a wide spectrum of pharmacological activities can be derived.


2020 ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Chemical data is increasingly openly available in databases such as PubChem, which contains more than 110 million compound entries as of October 2020. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
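Because ChEBI classes are organised in an is_a hierarchy, a flat set of predicted classes can be made hierarchy-consistent by adding every ancestor of each predicted class. A minimal sketch of that post-processing step, using a small toy hierarchy that is hypothetical (not actual ChEBI relationships); it is not presented as the authors' method.

```python
def ancestors(cls, parents):
    """All transitive is_a ancestors of cls in a DAG given as {child: [parents]}."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # classes with several parents are visited once
                seen.add(parent)
                stack.append(parent)
    return seen

def close_under_is_a(predicted, parents):
    """Extend a set of predicted classes with all of their ancestors."""
    closed = set(predicted)
    for cls in predicted:
        closed |= ancestors(cls, parents)
    return closed
```

With the toy hierarchy `{"flavone": ["flavonoid"], "flavonoid": ["polyphenol", "aromatic compound"], "polyphenol": ["aromatic compound"]}`, predicting only `flavone` expands to all four classes. Closing predictions this way guarantees that a molecule assigned a specific class is never left without its more general superclasses.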


Molecules ◽  
2021 ◽  
Vol 26 (3) ◽  
pp. 672
Author(s):  
Giuseppe Ermondi ◽  
Diego Garcia-Jimenez ◽  
Giulia Caron

Targeted protein degradation by PROTACs has emerged as a new modality for the knockdown of a range of proteins, and, more recently, it has become increasingly clear that the PROTAC chemical space requires characterization through a pool of ad hoc physicochemical descriptors. In this study, a new database named PROTAC-DB, which provides extensive information about PROTACs and their building blocks, was used to obtain the 2D chemical structures of about 1600 PROTACs, 60 E3 ligands, 800 linkers, and 202 warheads. For every structure, we calculated a pool of seven 2D descriptors carefully selected as informative for large and flexible structures. For comparison, the same procedure was applied to a dataset of about 50 approved bRo5 drugs reported in the literature. Correlation matrices, PCAs, box plots, and other graphical tools were used to define and understand the chemical space covered by PROTACs and their building blocks relative to other compounds. Results show that linkers have different properties from E3 ligands and warheads. The additivity of polar descriptors is not preserved when passing from building blocks to degraders. Moreover, a very preliminary analysis of three PROTACs with high, intermediate, and low permeability suggested that the most permeable compounds occupy a region closer to bRo5 drugs and thus exhibit different properties from impermeable compounds. Finally, a second database, PROTACpedia, was used to discuss the relevance of physicochemical descriptors to degradation activity.
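A correlation matrix over the descriptor table is the first of the tools listed for mapping the chemical space. A minimal plain-Python sketch of a Pearson correlation matrix, assuming each structure is a row of named descriptor values; the descriptor names and values used below are hypothetical, and the study's seven descriptors are not reproduced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Undefined (ZeroDivisionError) for a descriptor with zero variance.
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(table, names):
    """Pairwise correlations between descriptor columns of a list of row dicts."""
    columns = {name: [row[name] for row in table] for name in names}
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Highly correlated descriptor pairs carry largely redundant information, which is why such a matrix is typically inspected before PCA when choosing a compact descriptor pool.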


Data mining techniques are used in many fields, one of which is healthcare analysis. The present research aims to experimentally analyze multiple data mining classification/prediction techniques using three different machine learning classification and prediction tools on online healthcare datasets. The classification and prediction techniques were tested on four different online healthcare datasets. The evaluation criteria are the percentage accuracy and the error rate of each applied classifier. The experiments are performed using 10-fold cross-validation. The most suitable classification technique for a particular online dataset is selected based on the highest classification accuracy and the lowest error rate as performance indicators.
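The evaluation protocol (10-fold cross-validation, with accuracy and error rate as indicators) can be sketched as follows. The abstract does not name the classifiers or tools, so a 1-nearest-neighbour classifier is swapped in purely as a placeholder, and error rate is assumed to be 1 minus accuracy, pooled over all held-out predictions.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train_X, train_y, x):
    """PLACEHOLDER classifier: label of the closest training point (squared Euclidean)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def cross_validate(X, y, k=10, seed=0):
    """Return (accuracy, error_rate) pooled over all k held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        # Train on every sample outside the current fold.
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            if nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]:
                correct += 1
    accuracy = correct / len(X)
    return accuracy, 1.0 - accuracy
```

Each sample is predicted exactly once by a model that never saw it, so the pooled accuracy estimates performance on unseen records; comparing techniques then amounts to calling `cross_validate` with each candidate classifier on the same folds (same `seed`).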


Author(s):  
Bruno Schueler ◽  
Robert W. Odom

Time-of-flight secondary ion mass spectrometry (TOF-SIMS) provides unique capabilities for elemental and molecular compositional analysis of a wide variety of surfaces. This relatively new technique is finding increasing applications in determining the chemical composition of polymer surfaces, identifying organic and inorganic residues on surfaces, and localizing molecular or structurally significant secondary ion signals from biological tissues. TOF-SIMS analyses are typically performed under low primary ion dose (static SIMS) conditions, and hence the secondary ions formed often contain significant structural information. This paper will present an overview of current TOF-SIMS instrumentation, with particular emphasis on the stigmatic imaging ion microscope developed in the authors' laboratory. This discussion will be followed by a presentation of several useful applications of the technique for the characterization of polymer surfaces and biological tissue specimens. In these applications, particular attention will be paid to how the analytical problem affects the performance requirements of the mass spectrometer, and vice versa.
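The link between the analytical problem and the spectrometer's performance requirements can be made concrete with the textbook relation for an ideal linear time-of-flight analyser: an ion of mass m and charge q accelerated through a potential U gains kinetic energy qU = mv²/2, so its flight time over a drift length L is t = L·sqrt(m / (2qU)). Since m is proportional to t², the mass resolution is m/Δm = t/(2Δt). The sketch below uses that idealised model only (no initial energy spread, no acceleration-region time, no reflectron); the voltage and drift length are illustrative assumptions, not parameters of the authors' instrument.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C (exact SI value)
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg (CODATA 2018)

def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Ideal linear TOF flight time in seconds: t = L / v with qU = m v^2 / 2."""
    m = mass_da * DALTON
    q = charge * E_CHARGE
    v = math.sqrt(2.0 * q * accel_voltage / m)
    return drift_length / v

def mass_resolution(t, delta_t):
    """m / delta_m for a peak of time width delta_t, using m proportional to t^2."""
    return t / (2.0 * delta_t)
```

Under these assumptions a singly charged 100 Da ion accelerated through 3 kV needs roughly 26 µs to cross a 2 m drift tube, and a 1 ns timing spread at that flight time limits m/Δm to about 13,000. Quadrupling the mass doubles the flight time, which is why heavier molecular ions ask less of the timing electronics for the same resolution.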


2018 ◽  
Author(s):  
Sherif Tawfik ◽  
Olexandr Isayev ◽  
Catherine Stampfl ◽  
Joseph Shapter ◽  
David Winkler ◽  
...  

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques to rapidly determine the interlayer distance between two different 2D materials stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.
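Why lattice mismatch inflates DFT cells can be seen with a simple search for a commensurate supercell: find the smallest integers n1, n2 such that n1 repetitions of lattice constant a1 match n2 repetitions of a2 within a tolerated strain. This is a sketch under a strong simplification (same lattice type, no relative rotation, one in-plane constant per layer), not the procedure used in the paper; the lattice constants used below (about 2.46 Å for graphene and 2.50 Å for hexagonal boron nitride) are approximate literature values chosen for illustration.

```python
def commensurate_supercell(a1, a2, max_strain=0.01, max_n=200):
    """Smallest n2 (with n1 = round(n2 * a2 / a1)) whose supercells match within max_strain.

    SIMPLIFICATION: aligned lattices of the same type, so matching one in-plane
    lattice constant is enough; rotated stackings are ignored.
    Returns (n1, n2, strain) or None if no match is found up to max_n.
    """
    for n2 in range(1, max_n + 1):
        n1 = round(n2 * a2 / a1)
        if n1 == 0:
            continue
        # Relative strain needed to force layer 1 onto layer 2's supercell.
        strain = abs(n1 * a1 - n2 * a2) / (n2 * a2)
        if strain <= max_strain:
            return n1, n2, strain
    return None
```

With a1 = 2.46 and a2 = 2.50, accepting 2% strain allows a 1×1 cell, but limiting strain to 0.5% requires 48×48 cells of one layer on 47×47 of the other; for two-atom hexagonal unit cells that is 2·(48² + 47²) = 9,026 atoms, far beyond routine DFT. This is the cost that motivates predicting interlayer distances and band gaps with regression models instead.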

