small molecule databases Latest Research Papers

MolDiscovery: learning mass spectrometry fragmentation of small molecules

Nature Communications ◽

10.1038/s41467-021-23986-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Liu Cao ◽

Mustafa Guler ◽

Azat Tagirdzhanov ◽

Yi-Yuan Lee ◽

Alexey Gurevich ◽

...

Keyword(s):

Mass Spectrometry ◽

Small Molecules ◽

Small Molecule ◽

Domain Knowledge ◽

Mass Spectra ◽

Molecular Structures ◽

Mass Spectral Database ◽

Mass Spectral ◽

Tandem Mass Spectra ◽

Small Molecule Databases

Identification of small molecules is a critical task in various areas of life science. Recent advances in mass spectrometry have enabled the collection of tandem mass spectra of small molecules from hundreds of thousands of environments. To identify which molecules are present in a sample, one can search mass spectra collected from the sample against millions of molecular structures in small molecule databases. The existing approaches are based on chemistry domain knowledge, and they fail to explain many of the peaks in mass spectra of small molecules. Here, we present molDiscovery, a mass spectral database search method that improves both efficiency and accuracy of small molecule identification by learning a probabilistic model to match small molecules with their mass spectra. A search of over 8 million spectra from the Global Natural Product Social molecular networking infrastructure shows that molDiscovery correctly identifies six times more unique small molecules than previous methods.
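The database search idea in this abstract can be illustrated with a deliberately naive sketch. This is not molDiscovery's learned probabilistic model: it simply counts how many spectrum peaks fall within a mass tolerance of a candidate's fragment masses. The function names, the 0.01 Da tolerance, and the assumption that each database entry already comes with a precomputed list of fragment masses are all hypothetical.

```python
from bisect import bisect_left


def count_matched_peaks(peaks, fragment_masses, tol=0.01):
    """Count spectrum peaks lying within `tol` of any candidate fragment mass.

    `tol` (in Da) is an illustrative assumption, not a value from the paper.
    """
    frags = sorted(fragment_masses)
    matched = 0
    for mz in peaks:
        # First fragment mass that is >= mz - tol; it matches if it is also <= mz + tol.
        i = bisect_left(frags, mz - tol)
        if i < len(frags) and frags[i] <= mz + tol:
            matched += 1
    return matched


def rank_candidates(peaks, database, tol=0.01):
    """Return (name, score) pairs, best score first, ties broken by name.

    `database` maps a candidate name to its (hypothetical, precomputed)
    list of fragment masses.
    """
    scores = [
        (name, count_matched_peaks(peaks, frags, tol))
        for name, frags in database.items()
    ]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented purely for illustration.
toy_database = {
    "candidate_A": [50.0, 75.5, 120.2],
    "candidate_B": [50.0, 99.9],
}
toy_peaks = [50.005, 75.495, 200.0]
ranking = rank_candidates(toy_peaks, toy_database)
```

A peak-counting score like this treats every match as equally informative; a learned model of the kind the abstract describes would instead weight matches by how likely each fragmentation is.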

Tautomeric Conflicts in Forty Small-Molecule Databases

10.26434/chemrxiv.14779254 ◽

2021 ◽

Author(s):

Devendra Kumar Dhaked ◽

Marc Nicklaus

Keyword(s):

Small Molecule ◽

Data Set ◽

Almost All ◽

Small Molecule Databases

We have analyzed forty different databases ranging in size from a few thousand to nearly 100 million molecules, comprising a total of over 200 million structures, for their tautomeric conflicts. A tautomeric conflict is defined as an occurrence of two or more structures within a data set identified by the tautomeric rules applied as being tautomers of each other. We tested a total of 119 detailed tautomeric transform rules expressed as SMIRKS, out of which 79 yielded at least one conflict. The databases analyzed spanned a wide variety of types including large aggregating databases, drug collections, and experimentally based structure collections. Almost all databases analyzed showed intra-database tautomeric conflicts. The conflict rates as percentage of the database were typically in the few tenths of a percent range, which for the largest databases amounts to more than 100,000 cases per database.
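The conflict definition above can be sketched as a grouping step. The sketch assumes that some upstream process (for example, applying the tautomeric transform rules) has already assigned each structure a key shared by all of its tautomers; that `tautomer_key`, the function names, and the toy records are hypothetical, and no SMIRKS rules are applied here. The second helper only reproduces the scale argument: one tenth of a percent of 100 million structures is 100,000.

```python
from collections import defaultdict


def find_tautomeric_conflicts(records):
    """Return {tautomer_key: [structure ids]} for keys shared by 2+ structures.

    `records` is an iterable of (structure_id, tautomer_key) pairs; the key
    is assumed to be computed elsewhere and is identical for tautomers.
    """
    groups = defaultdict(list)
    for structure_id, tautomer_key in records:
        groups[tautomer_key].append(structure_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


def cases_at_rate(total_structures, tenths_of_percent):
    """Number of cases at a rate given in tenths of a percent (integer math)."""
    return total_structures * tenths_of_percent // 1000


# Toy records, invented for illustration: s1 and s2 share a key, s3 does not.
toy_records = [("s1", "k1"), ("s2", "k1"), ("s3", "k2")]
toy_conflicts = find_tautomeric_conflicts(toy_records)
```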

Tautomeric Conflicts in Forty Small-Molecule Databases

10.26434/chemrxiv.14779254.v1 ◽

2021 ◽

Author(s):

Devendra Kumar Dhaked ◽

Marc Nicklaus

Keyword(s):

Small Molecule ◽

Data Set ◽

Almost All ◽

Small Molecule Databases

Small molecule databases: A collection of promising bioactive molecules

Concepts and Experimental Protocols of Modelling and Informatics in Drug Design ◽

10.1016/b978-0-12-820546-4.00003-9 ◽

2021 ◽

pp. 65-88

Author(s):

Om Silakari ◽

Pankaj Kumar Singh

Keyword(s):

Small Molecule ◽

Bioactive Molecules ◽

Small Molecule Databases

Food bioactive small molecule databases: Deep boosting for the study of food molecular behaviors

Innovative Food Science & Emerging Technologies ◽

10.1016/j.ifset.2020.102499 ◽

2020 ◽

Vol 66 ◽

pp. 102499

Author(s):

Zheng-Fei Yang ◽

Ran Xiao ◽

Fei-Jun Luo ◽

Qin-Lu Lin ◽

Defang Ouyang ◽

...

Keyword(s):

Small Molecule ◽

Small Molecule Databases

Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2

10.26434/chemrxiv.10794962.v1 ◽

2019 ◽

Author(s):

Devendra K. Dhaked ◽

Wolf Ihlenfeldt ◽

Hitesh Patel ◽

Marc Nicklaus

Keyword(s):

Small Molecule ◽

Comprehensive Treatment ◽

Experimental Literature ◽

Web Tool ◽

Valence Tautomerism ◽

Standard Version ◽

Small Molecule Databases

We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism; 21 for ring-chain tautomerism; and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules – covering the most well-known types of tautomerism such as keto-enol tautomerism – were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules’ enumerated tautomer sets by InChI V.1.05, both in InChI’s Standard and a Non-Standard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.

Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2

10.26434/chemrxiv.10794962 ◽

2019 ◽

Author(s):

Devendra K. Dhaked ◽

Wolf Ihlenfeldt ◽

Hitesh Patel ◽

Marc Nicklaus

Keyword(s):

Small Molecule ◽

Comprehensive Treatment ◽

Experimental Literature ◽

Web Tool ◽

Valence Tautomerism ◽

Standard Version ◽

Small Molecule Databases

Interoperable chemical structure search service

Journal of Cheminformatics ◽

10.1186/s13321-019-0367-2 ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Miroslav Kratochvíl ◽

Jiří Vondrášek ◽

Jakub Galgonek

Keyword(s):

Chemical Structure ◽

Query Language ◽

Search Space ◽

Chemical Databases ◽

Search Terms ◽

Structure Search ◽

Search Service ◽

Large Databases ◽

Similarity Searches ◽

Small Molecule Databases

Motivation: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to efficiently utilize these resources. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of the chemical databases still lack the functionality of structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space. Results: We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies writing of heterogeneous queries that include chemical-structure search terms. Availability: The service is freely available and accessible using a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick explorative querying of datasets are available at https://idsm.elixir-czech.cz.
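To make the idea of a structure search through a standard SPARQL endpoint more concrete, here is a minimal sketch that builds a query and the corresponding GET request URL (the SPARQL protocol passes the query text in a `query` parameter). The `ex:` prefix, the `ex:hasSubstructure` predicate, and the `https://example.org/sparql` endpoint are placeholders invented for illustration; the actual vocabulary and endpoint address are described in the service documentation and are not reproduced here.

```python
from urllib.parse import urlencode

# Placeholder vocabulary: the ex: prefix and predicate are hypothetical and
# stand in for whatever the real service defines for substructure search.
QUERY_TEMPLATE = """PREFIX ex: <http://example.org/chem#>
SELECT ?compound WHERE {{
  ?compound ex:hasSubstructure "{smiles}" .
}}
LIMIT {limit}"""


def escape_sparql_string(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(smiles, limit=10):
    """Fill the template with an escaped SMILES pattern and a result limit."""
    return QUERY_TEMPLATE.format(smiles=escape_sparql_string(smiles), limit=int(limit))


def build_get_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint; no network request is made in this sketch.
benzene_query = build_query("c1ccccc1", limit=5)
benzene_url = build_get_url("https://example.org/sparql", benzene_query)
```

Escaping matters because SMILES strings can legitimately contain backslashes (used for double-bond stereochemistry), which would otherwise be misread inside a quoted literal.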

Recent Advancements in Docking Methodologies

Oncology ◽

10.4018/978-1-5225-0549-5.ch033 ◽

2017 ◽

pp. 848-875

Author(s):

Vijay Kumar Srivastav ◽

Vineet Singh ◽

Meena Tiwari

Keyword(s):

Molecular Docking ◽

Binding Free Energy ◽

Binding Mode ◽

Docking Studies ◽

Protein Docking ◽

Scoring Functions ◽

Discovery Process ◽

Ligand Complex ◽

Small Molecule Databases ◽

Homology Models

Molecular docking has become an important methodology in the computer-aided drug design (CADD) discovery process. It is a computational tool widely used to predict the binding mode, binding affinity and binding free energy of a protein-ligand complex. Accurate results in docking studies depend on several factors, including correct binding-site prediction, use of suitable small-molecule databases, a consistent docking pose, a high docking score with good MD (molecular dynamics), and clarity about whether the compound is an inhibitor or an agonist. However, several limitations still make it difficult to obtain accurate results from docking studies. This chapter focuses on recent advancements in various aspects of molecular docking, such as ligand sampling, protein flexibility, scoring functions, fragment docking, post-processing, docking into homology models and protein-protein docking.

Advanced SPARQL querying in small molecule databases

Journal of Cheminformatics ◽

10.1186/s13321-016-0144-4 ◽

2016 ◽

Vol 8 (1) ◽

Cited By ~ 1

Author(s):

Jakub Galgonek ◽

Tomáš Hurt ◽

Vendula Michlíková ◽

Petr Onderka ◽

Jan Schwarz ◽

...

Keyword(s):

Small Molecule ◽

Small Molecule Databases
