Molecular structures of cyclic mono- and di-(phosphoranylidene)aminophosphazenes: small-molecule models for high polymers

Author(s):  
Harry R. Allcock ◽  
Susan E. Kuharcik ◽  
Karyn B. Visscher ◽  
Dennis C. Ngo

Metabolites ◽  
2019 ◽  
Vol 9 (8) ◽  
pp. 160 ◽  
Author(s):  
Céline Brouard ◽  
Antoine Bassé ◽  
Florence d’Alché-Buc ◽  
Juho Rousu

In small molecule identification from tandem mass (MS/MS) spectra, input–output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework. First, we formulate the IOKRreverse model, which can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is closest to the input MS/MS spectrum. Second, we introduce an approach, called IOKRfusion, that combines several IOKR and IOKRreverse models computed from different input and output kernels. The method minimizes the structured hinge loss of the combined model using mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement in top-k accuracy on both positive and negative ionization mode data.
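The pre-image step described in this abstract (finding the candidate molecule whose fingerprint is most similar to the predicted one) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regression step is skipped and replaced by a hard-coded `predicted_fp` vector, the min/max Tanimoto similarity is an assumed choice of similarity measure, and the candidate names and fingerprints are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Generalized (min/max) Tanimoto similarity for non-negative vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


def rank_candidates(
    predicted: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate molecules by similarity to the predicted fingerprint."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical output of a spectrum-to-fingerprint regression (assumption).
predicted_fp = [0.9, 0.1, 0.8, 0.0, 0.7]

# Hypothetical candidate set with binary fingerprints (assumption).
candidates = {
    "cand_A": [1, 0, 1, 0, 1],
    "cand_B": [0, 1, 0, 1, 0],
    "cand_C": [1, 1, 1, 0, 0],
}

ranking = rank_candidates(predicted_fp, candidates)
best_name, best_score = ranking[0]
```

Here `ranking` orders the candidates from most to least similar, so a top-k evaluation would check whether the true molecule appears among the first k entries.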


Author(s):  
M.R. Halvagar ◽  
D.J. Salmon ◽  
W.B. Tolman

2020 ◽  
Vol 74 (10) ◽  
pp. 803-807
Author(s):  
Thomas C. Fessard ◽  
Kristina Goncharenko ◽  
Quentin Lefebvre ◽  
Christophe Salomé

In highly competitive research environments, the ability to access more complex structural spaces efficiently is a predictor of a company's ability to generate novel IP-protected small molecule candidates with adequate properties, hence filling their development pipelines. SpiroChem is consistently developing new synthetic methodologies and strategies to access complex molecular structures, thereby facilitating and accelerating small molecule drug discovery. Pushing the limits of what are perceived as complex molecular structures allows SpiroChem and its clients to unleash creativity and explore meaningful chemical spaces, which are under-exploited sources of novel active molecules. In this article, we explain how we differentiated ourselves in a globalized R&D environment and we provide several snapshots of how efficient methodologies can rapidly generate complex structures.


2020 ◽  
Author(s):  
A Patrícia Bento ◽  
Anne Hersey ◽  
Eloy Felix ◽  
Greg Landrum ◽  
Anna Gaulton ◽  
...  

Background: The ChEMBL database is one of a number of public databases that contain bioactivity data on small molecule compounds curated from diverse sources. Incoming compounds are typically not standardised according to consistent rules. To maintain the quality of the final database, and to easily compare and integrate data on the same compound from different sources, the chemical structures in the database must be appropriately standardised. Results: A chemical curation pipeline has been developed using the open source toolkit RDKit. It comprises three components: a Checker, which tests the validity of chemical structures and flags any serious errors; a Standardizer, which formats compounds according to defined rules and conventions; and a GetParent component, which removes any salts and solvents from the compound to create its parent. This pipeline has been applied to the latest version of the ChEMBL database, as well as to uncurated datasets from other sources, to test the robustness of the process and to identify common issues in database molecular structures. Conclusion: All the components of the structure pipeline have been made freely available for other researchers to use and adapt. The code is available in a GitHub repository and can also be accessed via the ChEMBL Beaker webservices. It has been used successfully to standardise the nearly 2 million compounds in the ChEMBL database, and the compound validity checker has been used to identify compounds with the most serious issues so that they can be prioritised for manual curation.
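The three-stage layout described in this abstract (check, standardize, derive the parent) can be sketched as a toy Python pipeline over SMILES strings. This is not the ChEMBL pipeline and does not use RDKit: the function names `check`, `standardize`, `get_parent`, and `curate` are hypothetical, the checks are simple string tests, and "largest dot-separated fragment" is a crude assumed stand-in for real salt and solvent stripping.

```python
from typing import List, Optional, Tuple


def check(smiles: str) -> List[str]:
    """Toy checker: flag obviously broken input (assumed rules)."""
    issues = []
    if not smiles.strip():
        issues.append("empty structure")
    if smiles.count("(") != smiles.count(")"):
        issues.append("unbalanced parentheses")
    return issues


def standardize(smiles: str) -> str:
    """Toy standardizer: only trims whitespace (placeholder for real rules)."""
    return smiles.strip()


def get_parent(smiles: str) -> str:
    """Toy parent extraction: keep the longest dot-separated fragment."""
    return max(smiles.split("."), key=len)


def curate(smiles: str) -> Tuple[Optional[str], List[str]]:
    """Run check -> standardize -> get_parent; stop early on flagged input."""
    issues = check(smiles)
    if issues:
        return None, issues
    return get_parent(standardize(smiles)), []


# Sodium acetate written as two fragments; the counter-ion is dropped.
parent, problems = curate(" CC(=O)[O-].[Na+] ")

# A structure with a missing closing parenthesis is flagged, not processed.
rejected, reasons = curate("CC(=O")
```

Keeping the checker separate from the transforming stages mirrors the abstract's point that flagged compounds can be set aside for manual curation rather than silently altered.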


2017 ◽  
Vol 334 ◽  
pp. 54-66 ◽  
Author(s):  
Jessica Nadine Hamann ◽  
Benjamin Herzigkeit ◽  
Ramona Jurgeleit ◽  
Felix Tuczek

2020 ◽  
Author(s):  
Liu Cao ◽  
Mustafa Guler ◽  
Azat Tagirdzhanov ◽  
Yiyuan Lee ◽  
Alexey Gurevich ◽  
...  

Identification of small molecules is a critical task in various areas of life science. Recent advances in mass spectrometry have enabled the collection of tandem mass spectra of small molecules from hundreds of thousands of environments. To identify which molecules are present in a sample, one can search mass spectra collected from the sample against millions of molecular structures in small molecule databases. This is a challenging task as currently it is not clear how small molecules are fragmented in mass spectrometry. The existing approaches use the domain knowledge from chemistry to predict fragmentation of molecules. However, these rule-based methods fail to explain many of the peaks in mass spectra of small molecules. Recently, spectral libraries with tens of thousands of labelled mass spectra of small molecules have emerged, paving the path for learning more accurate fragmentation models for mass spectral database search. We present molDiscovery, a mass spectral database search method that improves both efficiency and accuracy of small molecule identification by (i) utilizing an efficient algorithm to generate mass spectrometry fragmentations, and (ii) learning a probabilistic model to match small molecules with their mass spectra. We show our database search is an order of magnitude more efficient than the state-of-the-art methods, which enables searching against databases with millions of molecules. A search of over 8 million spectra from the Global Natural Product Social molecular networking infrastructure shows that our probabilistic model can correctly identify nearly six times more unique small molecules than previous methods. Moreover, by applying molDiscovery on microbial datasets with both mass spectral and genomics data we successfully discovered the novel biosynthetic gene clusters of three families of small molecules. Availability: The command-line version of molDiscovery and its online web service through the GNPS infrastructure are available at https://github.com/mohimanilab/molDiscovery.
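The general idea of searching a spectrum against a structure database can be sketched in Python as follows. This is a simple shared-peak count, not molDiscovery's fragmentation algorithm or probabilistic model: the function names, the 0.01 m/z tolerance, and all masses and molecule names are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def matched_peaks(
    observed: Sequence[float],
    theoretical: Sequence[float],
    tol: float = 0.01,
) -> int:
    """Count observed peaks that lie within `tol` of any theoretical fragment mass."""
    return sum(
        1 for mz in observed if any(abs(mz - t) <= tol for t in theoretical)
    )


def search(
    observed: Sequence[float],
    database: Dict[str, Sequence[float]],
    tol: float = 0.01,
) -> List[Tuple[str, int]]:
    """Score every database entry and return entries sorted by score, best first."""
    scored = [
        (name, matched_peaks(observed, fragments, tol))
        for name, fragments in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical observed peak list (m/z values only, intensities ignored).
observed_spectrum = [57.07, 85.03, 113.06, 141.09]

# Hypothetical database: molecule name -> precomputed fragment masses.
database = {
    "mol_X": [57.07, 85.03, 113.065, 200.00],
    "mol_Y": [57.07, 99.00],
    "mol_Z": [10.00],
}

hits = search(observed_spectrum, database)
```

A learned scoring model, as the abstract describes, would replace the raw count with a probability that weighs which fragments are likely to be observed; the ranking loop itself would stay the same.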

