A Similarity Search Algorithm to Predict Protein Structures

Author(s):  
Jiyuan An ◽  
Yi-Ping Phoebe Chen
2000 ◽  
Vol 75 (1-2) ◽  
pp. 35-42 ◽  
Author(s):  
Ju-Hong Lee ◽  
Deok-Hwan Kim ◽  
Seok-Lyong Lee ◽  
Chin-Wan Chung ◽  
Guang-Ho Cha

Molecules ◽  
2019 ◽  
Vol 24 (12) ◽  
pp. 2233 ◽  
Author(s):  
Michele Montaruli ◽  
Domenico Alberga ◽  
Fulvio Ciriaco ◽  
Daniela Trisciuzzi ◽  
Anna Rita Tondo ◽  
...  

In this continuing work, we have updated our recently proposed Multi-fingerprint Similarity Search algorithm (MuSSel) by enabling the generation of dominant ionized species at physiological pH and the exploration of a larger data domain, which includes more than half a million high-quality small molecules extracted from the latest release of ChEMBL (version 24.1, at the time of writing). These selected compounds, each carrying a high biological assay confidence score, cover up to 2822 protein drug targets. To improve data accuracy, samples marked as prodrugs or carrying equivocal biological annotations were excluded. Notably, the overall performance of MuSSel was improved by using an object-relational database management system based on PostgreSQL. To test the real effectiveness of MuSSel in predicting relevant therapeutic drug targets, we analyzed a pool of 36 external bioactive compounds published in the Journal of Medicinal Chemistry from October to December 2018. This study demonstrates that combining highly curated chemical and biological experimental data with a powerful multi-fingerprint search algorithm can be of the utmost importance in addressing the fate of newly conceived small molecules, strongly reducing attrition in the early phases of drug discovery programs.
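The abstract does not spell out how MuSSel combines its fingerprints, so the following is only a minimal sketch of the general idea of a multi-fingerprint similarity search, not the authors' implementation. It assumes each compound is described by several fingerprints stored as sets of "on" bit indices, scores each fingerprint type with the Tanimoto coefficient, and keeps the best score per target above a cutoff. The function names, the fingerprint labels, and the 0.5 threshold are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_targets(query, library, threshold=0.5):
    """Rank targets by the best fingerprint similarity to any annotated ligand.

    query     -- {fingerprint_name: set of on-bits} for the new compound
    library   -- list of (target_name, {fingerprint_name: set of on-bits})
    threshold -- hypothetical similarity cutoff, not taken from the paper
    """
    best = {}
    for target, fingerprints in library:
        for name, bits in query.items():
            if name not in fingerprints:
                continue  # this fingerprint type is missing for the ligand
            score = tanimoto(bits, fingerprints[name])
            if score >= threshold and score > best.get(target, 0.0):
                best[target] = score
    # Highest similarity first
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: fingerprint names "fpA"/"fpB" and targets "T1"/"T2" are made up.
    query = {"fpA": {1, 2, 3, 4}, "fpB": {10, 11}}
    library = [
        ("T1", {"fpA": {1, 2, 3, 5}, "fpB": {10, 11}}),
        ("T2", {"fpA": {7, 8}, "fpB": {10, 12}}),
    ]
    print(predict_targets(query, library))
```

Taking the maximum over fingerprint types is one possible design choice; averaging the scores or voting across fingerprints would be equally plausible readings of "multi-fingerprint".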


2020 ◽  
Author(s):  
Janani Durairaj ◽  
Mehmet Akdel ◽  
Dick de Ridder ◽  
Aalt DJ van Dijk

Abstract

Motivation: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds, and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment-based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well.

Results: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering, and structure classification across proteins from different superfamilies as well as within the same family.

Availability: Python code available at https://git.wur.nl/durai001/[email protected], [email protected]
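The shape-mer idea described in this abstract can be illustrated with a small sketch. This is not the Geometricus API or its actual set of invariants, which the abstract does not list. As a stand-in, it assumes fragments are sliding windows of k consecutive 3D coordinates and uses two second-order rotation invariants of each fragment's scatter matrix (its trace and the sum of its principal 2x2 minors). The function names, window size, and `resolution` parameter are hypothetical.

```python
import math
from collections import Counter


def fragment_invariants(points):
    """Two rotation- and translation-invariant second-order moments of a fragment."""
    n = len(points)
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    centered = [tuple(p[axis] - centroid[axis] for axis in range(3)) for p in points]

    def mu(i, j):
        # Central second-order moment along axes i and j
        return sum(c[i] * c[j] for c in centered) / n

    xx, yy, zz = mu(0, 0), mu(1, 1), mu(2, 2)
    xy, xz, yz = mu(0, 1), mu(0, 2), mu(1, 2)
    trace = xx + yy + zz
    minors = xx * yy + xx * zz + yy * zz - xy ** 2 - xz ** 2 - yz ** 2
    return trace, minors


def shape_mer(points, resolution=1.0):
    """Discretize the invariants into a tuple of integers (one shape-mer)."""
    return tuple(
        int(math.log1p(max(value, 0.0)) * resolution)
        for value in fragment_invariants(points)
    )


def shape_mer_counts(coords, k=4, resolution=1.0):
    """Count shape-mers over all sliding windows of k consecutive coordinates."""
    return Counter(
        shape_mer(coords[i:i + k], resolution) for i in range(len(coords) - k + 1)
    )


def embed(counts, vocabulary):
    """Fixed-length count vector over a chosen shape-mer vocabulary."""
    return [counts.get(mer, 0) for mer in vocabulary]


if __name__ == "__main__":
    # Toy coordinates, not real protein data
    coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 1, 1), (2, 2, 1)]
    counts = shape_mer_counts(coords, k=4)
    vocabulary = sorted(counts)
    print(vocabulary, embed(counts, vocabulary))
```

Because the invariants do not change under rotation or translation, two copies of the same structure in different orientations map to the same count vector without any alignment step, which is the property the abstract contrasts with alignment-based embeddings.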


2018 ◽  
Vol 59 (1) ◽  
pp. 586-596 ◽  
Author(s):  
Domenico Alberga ◽  
Daniela Trisciuzzi ◽  
Michele Montaruli ◽  
Francesco Leonetti ◽  
Giuseppe Felice Mangiatordi ◽  
...  
