A Similarity Search Algorithm to Predict Protein Structures

Distributed similarity search algorithm in distributed heterogeneous multimedia databases

Information Processing Letters ◽

10.1016/s0020-0190(00)00068-5 ◽

2000 ◽

Vol 75 (1-2) ◽

pp. 35-42 ◽

Cited By ~ 7

Author(s):

Ju-Hong Lee ◽

Deok-Hwan Kim ◽

Seok-Lyong Lee ◽

Chin-Wan Chung ◽

Guang-Ho Cha

Keyword(s):

Similarity Search ◽

Search Algorithm ◽

Multimedia Databases

Probability Forecasting of Wind Power Ramp Events Using a Time Series Similarity Search Algorithm

2018 IEEE Energy Conversion Congress and Exposition (ECCE) ◽

10.1109/ecce.2018.8557610 ◽

2018 ◽

Cited By ~ 1

Author(s):

Bo Cao ◽

Liuchen Chang ◽

Xun Gong ◽

Julian L Cardenas Barrera ◽

Thomas Levy ◽

...

Keyword(s):

Time Series ◽

Wind Power ◽

Similarity Search ◽

Search Algorithm ◽

Probability Forecasting ◽

Ramp Events ◽

Wind Power Ramp Events

Similarity Search Algorithm over Data Supply Chain Based on Key Points

Big Data Computing and Communications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-42553-5_1 ◽

2016 ◽

pp. 3-12

Author(s):

Peng Li ◽

Hong Luo ◽

Yan Sun ◽

Xin-Ming Li

Keyword(s):

Supply Chain ◽

Similarity Search ◽

Search Algorithm ◽

Key Points

Accelerating Drug Discovery by Early Protein Drug Target Prediction Based on a Multi-Fingerprint Similarity Search

Molecules ◽

10.3390/molecules24122233 ◽

2019 ◽

Vol 24 (12) ◽

pp. 2233 ◽

Cited By ~ 11

Author(s):

Michele Montaruli ◽

Domenico Alberga ◽

Fulvio Ciriaco ◽

Daniela Trisciuzzi ◽

Anna Rita Tondo ◽

...

Keyword(s):

Drug Discovery ◽

Small Molecules ◽

Similarity Search ◽

Drug Targets ◽

Search Algorithm ◽

Confidence Score ◽

Database Management System ◽

Therapeutic Drug ◽

Protein Drug ◽

Early Protein

In this continuing work, we have updated our recently proposed Multi-fingerprint Similarity Search algorithm (MuSSel) by enabling the generation of dominant ionized species at a physiological pH and the exploration of a larger data domain, which included more than half a million high-quality small molecules extracted from the latest release of ChEMBL (version 24.1, at the time of writing). Provided with a high biological assay confidence score, these selected compounds explored up to 2822 protein drug targets. To improve the data accuracy, samples marked as prodrugs or with equivocal biological annotations were not considered. Notably, MuSSel performances were overall improved by using an object-relational database management system based on PostgreSQL. In order to challenge the real effectiveness of MuSSel in predicting relevant therapeutic drug targets, we analyzed a pool of 36 external bioactive compounds published in the Journal of Medicinal Chemistry from October to December 2018. This study demonstrates that the use of highly curated chemical and biological experimental data on one side, and a powerful multi-fingerprint search algorithm on the other, can be of the utmost importance in addressing the fate of newly conceived small molecules, by strongly reducing the attrition of early phases of drug discovery programs.
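
The abstract above does not say how MuSSel scores or combines its fingerprints. As a rough illustration of the general idea behind a multi-fingerprint similarity search, the following Python sketch ranks candidate targets by the mean Tanimoto coefficient across several fingerprint types. The names `tanimoto` and `rank_targets`, the representation of a fingerprint as a set of "on" bit positions, and the mean/max aggregation are all assumptions made for this example, not MuSSel's actual method.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: shared bits / bits set in either."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_targets(
    query: Dict[str, Fingerprint],
    references: List[Tuple[str, Dict[str, Fingerprint]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank targets by their most similar annotated reference compound.

    Hypothetical aggregation: a reference compound's score is the mean
    Tanimoto over the fingerprint types it shares with the query, and a
    target's score is the best score among its reference compounds.
    """
    best: Dict[str, float] = {}
    for target, fps in references:
        shared_types = [name for name in query if name in fps]
        if not shared_types:
            continue
        score = sum(tanimoto(query[n], fps[n]) for n in shared_types) / len(shared_types)
        best[target] = max(score, best.get(target, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data: two fingerprint types ("fpA", "fpB") and two annotated targets.
query = {"fpA": frozenset({1, 2, 3}), "fpB": frozenset({10, 11})}
references = [
    ("T1", {"fpA": frozenset({1, 2, 3}), "fpB": frozenset({10, 11})}),
    ("T2", {"fpA": frozenset({2, 3, 4}), "fpB": frozenset({10, 12})}),
]
ranking = rank_targets(query, references)
```

In this toy example `T1` ranks first because its reference compound matches the query on both fingerprint types. A real system would also need the data curation steps the abstract mentions, such as filtering by assay confidence, which this sketch leaves out.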

An efficient similarity search algorithm for web video

2009 IEEE International Conference on Intelligent Computing and Intelligent Systems ◽

10.1109/icicisys.2009.5357706 ◽

2009 ◽

Author(s):

Zheng Cao ◽

Ming Zhu

Keyword(s):

Similarity Search ◽

Search Algorithm ◽

Web Video

Geometricus Represents Protein Structures as Shape-mers Derived from Moment Invariants

10.1101/2020.09.07.285569 ◽

2020 ◽

Author(s):

Janani Durairaj ◽

Mehmet Akdel ◽

Dick de Ridder ◽

Aalt DJ van Dijk

Keyword(s):

Similarity Search ◽

Structural Information ◽

Dimensional Space ◽

Protein Structures ◽

Machine Learning Algorithms ◽

The Other ◽

Moment Invariants ◽

Full Structure ◽

Structure Similarity ◽

Related Proteins

Abstract

Motivation: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds, and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment-based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well.

Results: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering, and structure classification across proteins from different superfamilies as well as within the same family.

Availability: Python code available at https://git.wur.nl/durai001/[email protected], [email protected]
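
The pipeline described in the abstract (fragment the structure, compute invariants, discretize them into shape-mers, count) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Geometricus implementation: it uses a single invariant (the mean squared distance of a fragment's points to their centroid) where the abstract refers to a set of 3D moment invariants, and the function names, the sliding-window fragmenting, and the `resolution` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def second_moment(fragment: Sequence[Point]) -> float:
    """Mean squared distance to the centroid.

    This value does not change when the fragment is rotated or translated.
    Assumption: it stands in for the richer set of invariants in the paper.
    """
    n = len(fragment)
    cx = sum(p[0] for p in fragment) / n
    cy = sum(p[1] for p in fragment) / n
    cz = sum(p[2] for p in fragment) / n
    return sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in fragment
    ) / n


def shape_mers(coords: Sequence[Point], k: int = 4, resolution: float = 1.0) -> List[int]:
    """Slide a window of k points along the chain and discretize each window.

    The invariant is scaled by `resolution` and truncated to an integer, so
    nearby values fall into the same bin (the "shape-mer").
    """
    return [
        int(second_moment(coords[i:i + k]) * resolution)
        for i in range(len(coords) - k + 1)
    ]


def count_vector(mers: Sequence[int], vocabulary: Sequence[int]) -> List[int]:
    """Describe a structure as counts of each shape-mer in a fixed vocabulary."""
    counts = Counter(mers)
    return [counts[m] for m in vocabulary]


# Toy chain of four points on the x axis, and the same chain rotated onto y.
chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 4.0, 0.0), (0.0, 8.0, 0.0)]

mers = shape_mers(chain, k=2)
embedding = count_vector(mers, vocabulary=[1, 4, 9])
```

Because every structure is mapped onto the same vocabulary, the resulting count vectors have a fixed length regardless of chain size, which is what makes them usable for similarity search or as machine learning features.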
