Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

BioMed Research International ◽

10.1155/2015/563674 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Anuj Sharma ◽

Elias S. Manolakos

Keyword(s):

Protein Structure ◽

Large Scale ◽

Protein Structures ◽

Structural Proteomics ◽

Single Chip ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Processor Architectures ◽

Comparison Algorithms ◽

Many Core

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel’s experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel’s Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
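
The efficiency quoted in the abstract above can be checked with the standard definition of parallel efficiency, speedup divided by core count. The sketch below applies that textbook formula to the reported numbers; the function name is illustrative and is not taken from the authors' software.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Textbook parallel efficiency: achieved speedup per core used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures reported in the abstract: speedup of 42 on the 48-core SCC.
eff = parallel_efficiency(42, 48)
print(f"efficiency = {eff:.3f}")  # 42 / 48 = 0.875, reported rounded as 0.9
```

An efficiency near 1.0 means that almost every added core contributes useful work, which is the sense in which the abstract calls the SCC efficient.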

Protein Structure Comparison: Algorithms and Applications

Lecture Notes in Computer Science - Mathematical Methods for Protein Structure Analysis and Design ◽

10.1007/978-3-540-44827-3_1 ◽

2003 ◽

pp. 1-33 ◽

Cited By ~ 21

Author(s):

Giuseppe Lancia ◽

Sorin Istrail

Keyword(s):

Protein Structure ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Comparison Algorithms

Evaluation of Novel Protein Structure Comparison Algorithms Based on Objective Function Rankings

2009 2nd International Conference on Biomedical Engineering and Informatics ◽

10.1109/bmei.2009.5304822 ◽

2009 ◽

Author(s):

Hitomi Hasegawa ◽

Liisa Holm

Keyword(s):

Protein Structure ◽

Objective Function ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Comparison Algorithms ◽

Novel Protein

Efficient algorithms and architectures for protein 3-D structure comparison

10.12681/eadd/44985 ◽

2018 ◽

Author(s):

Σάρμα Ανούτζ

Keyword(s):

Protein Structure ◽

Nearest Neighbor ◽

Network On Chip ◽

Consensus Methods ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Class D ◽

On Chip ◽

Many Core ◽

F Measure

Protein structure comparison (PSC) is an active area of computational proteomics because it is widely used in structural biology and drug discovery. The rapid growth in computational demand for protein structure comparison results from three main factors: the rapid expansion of databases with new protein structures, the high computational complexity of pairwise protein comparison algorithms, and the trend in the field towards using multiple comparison methods and combining their results (multicriteria PSC, MCPSC) into a consensus score (consensus methods). Despite great progress, open challenges remain in applying MCPSC techniques at large scale. First, accelerating MCPSC with modern many-core processor architectures remains largely unexplored. Second, the application of MCPSC methods to the classification of new protein structures is limited by the computational cost and the need for supercomputing facilities. Finally, there is a lack of freely available bioinformatics tools that support systematic comparative analysis and classification of large sets of proteins, based on their structure, on commodity computers. To address these important challenges, in this thesis we developed a software framework that exploits modern processors (CPUs) to efficiently implement parallel MCPSC techniques based on three popular PSC methods: TMalign, CE, and USM. We compare and evaluate the performance and efficiency of two parallel implementations, one for the many-core Intel Single-Chip Cloud Computer (SCC) processor, with 48 cores organized in a mesh network (Network on Chip), and one for the well-known multicore Intel Core i7 processor.
In addition, we developed a Python application, called pyMCPSC, that lets users easily run MCPSC-based computational experiments on large datasets, exploiting the parallelism offered by the multicore processors of today's desktop computers. We show how pyMCPSC, which combines five popular PSC methods to generate five different consensus scores, significantly speeds up and facilitates the comparative analysis of large protein structure datasets. It can also be easily extended so that the consensus algorithms incorporate new PSC methods that may be proposed in the future as the field evolves. The results of the comparative analysis show that the 48-core Intel SCC processor (Network on Chip) is more efficient than the latest-generation Core i7 CPU, achieving a speedup factor of 42 (efficiency of 0.9) and making many-core processors a technology of choice for large-scale computational structural proteomics. Furthermore, we show that MCPSC outperforms the PSC methods on which it is based in grouping new proteins, achieving an F-measure of 0.91 on the CK34 benchmark dataset. We also show, using the Proteus300 dataset, that the MCPSC techniques developed improve protein classification, as demonstrated by both ROC analysis and Nearest-Neighbor analysis. In addition, the "phylogenetic trees" obtained with MCPSC provide useful information about the possible function of new proteins. Finally, the comparative analysis reveals a strong correlation among protein structures of SCOP class C and a loose correlation among those of SCOP class D (Proteus300).
Such thorough data analyses, and the visualizations that accompany them, help users explore and extract knowledge from the datasets they analyze, however large these may be. We show that even on very large datasets with thousands of domains (such as SCOPCATH), MCPSC processing can be applied efficiently to investigate their internal structure, exploiting the multicore processors found in today's personal computers. pyMCPSC, which implements in parallel the entire computational pipeline based on the MCPSC methods developed in this doctoral thesis, is freely available to the scientific community at https://github.com/xulesc/pymcpsc.

TOWARDS SCALEable PROTEIN STRUCTURE COMPARISON AND DATABASE SEARCH

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002417 ◽

2005 ◽

Vol 14 (05) ◽

pp. 827-848 ◽

Cited By ~ 2

Author(s):

CHERN-HOOI CHIONH ◽

ZHIYONG HUANG ◽

KIAN-LEE TAN ◽

ZHEN YAO

Keyword(s):

Protein Structure ◽

Structural Properties ◽

Protein Structures ◽

Query Protein ◽

Similarity Score ◽

Three Dimensions ◽

Dimensional Structure ◽

Performance Study ◽

Structure Comparison ◽

Protein Structure Comparison

Comparing protein structures in three dimensions is a computationally expensive process that makes a full scan of a protein against a library of known protein structures impractical. To reduce the cost, we can use an approximation of the three dimensional structure that allows protein comparison to be performed quickly to filter away dissimilar proteins. In this paper, we present a new algorithm, called SCALE, for protein structure comparison. In SCALE, a protein is represented as a sequence of secondary structure elements (SSEs) augmented with 3D structural properties such as the distances and angles between the SSEs. As such, the comparison between two proteins is reduced to a sequence alignment problem between their corresponding sequences of SSEs. The 3-D structural properties of the proteins contribute to the similarity score between the two sequences. We have implemented SCALE, and compared its performance against existing schemes. Our performance study shows that SCALE outperforms existing methods in terms of both efficiency and effectiveness (measured in terms of precision and recall). To avoid exhaustive search, an index based on the structural properties is also proposed. The index prunes away a considerable amount of dissimilar proteins given a query protein.
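
The idea described above, reducing structure comparison to aligning two sequences of SSEs whose 3D properties feed the similarity score, can be illustrated with a small dynamic-programming sketch. This is not the SCALE algorithm itself: the `SSE` record, the single `angle` property, the scoring function, and the gap penalty are all hypothetical choices made only for illustration, and the alignment is a plain Needleman-Wunsch global alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    kind: str     # "H" = helix, "E" = strand (hypothetical encoding)
    angle: float  # angle to the next SSE in degrees (hypothetical property)


def sse_score(a: SSE, b: SSE) -> float:
    """Illustrative similarity: penalize type mismatch, reward close angles."""
    if a.kind != b.kind:
        return -1.0
    # Same type: 1.0 for equal angles, falling linearly to 0.0 at 180 degrees.
    return 1.0 - min(abs(a.angle - b.angle), 180.0) / 180.0


def align_score(p: list, q: list, gap: float = -0.5) -> float:
    """Global alignment score of two SSE sequences (Needleman-Wunsch)."""
    rows, cols = len(p) + 1, len(q) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # prefix of p aligned against gaps only
    for j in range(1, cols):
        dp[0][j] = j * gap  # prefix of q aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sse_score(p[i - 1], q[j - 1]),  # pair SSEs
                dp[i - 1][j] + gap,                                # gap in q
                dp[i][j - 1] + gap,                                # gap in p
            )
    return dp[-1][-1]


protein_a = [SSE("H", 30.0), SSE("E", 90.0), SSE("H", 45.0)]
protein_b = protein_a[:2]  # same fold with the last helix missing
print(align_score(protein_a, protein_a), align_score(protein_a, protein_b))
```

Because each protein collapses to a short sequence of SSEs, the quadratic alignment table stays small, which is what makes this kind of approximation attractive as a fast filter before a full 3D comparison.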

Performance Evaluation of Protein Structure Comparison Algorithms Under Integrated Resource Management Environment for MPI Jobs

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications ◽

10.1109/ispa.2008.41 ◽

2008 ◽

Author(s):

Azhar A. Shah ◽

Daniel Barthel ◽

Gianluigi Folino ◽

Natalio Krasnogor

Keyword(s):

Performance Evaluation ◽

Protein Structure ◽

Resource Management ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Integrated Resource Management ◽

Comparison Algorithms

Structural Learning of Proteins Using Graph Convolutional Neural Networks

10.1101/610444 ◽

2019 ◽

Cited By ~ 9

Author(s):

Rafael Zamora-Resendiz ◽

Silvia Crivelli

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Data Storage ◽

Convolutional Neural Networks ◽

Network Architecture ◽

Structure Prediction ◽

Large Scale ◽

Protein Structures ◽

Data Representation ◽

Structural Proteomics

The exponential growth of protein structure databases has motivated the development of efficient deep learning methods that perform structural analysis tasks at large scale, ranging from the classification of experimentally determined proteins to the quality assessment and ranking of computationally generated protein models in the context of protein structure prediction. Yet, the literature discussing these methods does not usually interpret what the models learned from the training or identify specific data attributes that contribute to the classification or regression task. While 3D and 2D CNNs have been widely used to deal with structural data, they have several limitations when applied to structural proteomics data. We pose that graph-based convolutional neural networks (GCNNs) are an efficient alternative while producing results that are interpretable. In this work, we demonstrate the applicability of GCNNs to protein structure classification problems. We define a novel spatial graph convolution network architecture which employs graph reduction methods to reduce the total number of trainable parameters and promote abstraction in intermediate representations. We show that GCNNs are able to learn effectively from simplistic graph representations of protein structures while providing the ability to interpret what the network learns during the training and how it applies it to perform its task. GCNNs perform comparably to their 2D CNN counterparts in predictive performance and they are outperformed by them in training speeds. The graph-based data representation allows GCNNs to be a more efficient option over 3D CNNs when working with large-scale datasets as preprocessing costs and data storage requirements are negligible in comparison.
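
To make the notion of a simple graph representation of a protein concrete, the sketch below builds a residue contact graph from 3D coordinates and applies one neighbor-averaging step, the basic operation that graph convolutions build on. It is not the architecture from the paper: the 8 Å distance cutoff, the use of one point per residue, and the unweighted mean aggregation (with no learned weights) are all assumptions made for illustration.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Adjacency sets: residues i and j are linked if within `cutoff` (assumed 8 A)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_aggregate(adj, features):
    """One message-passing step: average each node's features with its neighbors'."""
    out = []
    for i, own in enumerate(features):
        group = [i, *sorted(adj[i])]  # the node itself plus its contacts
        out.append(
            [sum(features[k][d] for k in group) / len(group) for d in range(len(own))]
        )
    return out


# Four hypothetical residue positions along a line; the last one is far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [[1.0], [2.0], [3.0], [10.0]]  # one illustrative feature per residue

adj = contact_graph(coords)
smoothed = mean_aggregate(adj, features)
print(adj, smoothed)
```

A trainable graph convolution layer would additionally multiply the aggregated features by a learned weight matrix and apply a nonlinearity; those parts are omitted here to keep the example dependency-free.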

Research on highly parallel embedded control system design and implementation method

Impact ◽

10.21820/23987073.2019.10.44 ◽

2019 ◽

Vol 2019 (10) ◽

pp. 44-46

Author(s):

Masato Edahiro ◽

Masaki Gondo

Keyword(s):

Computer Architecture ◽

Intelligent Systems ◽

Large Scale ◽

General Purpose ◽

Heterogeneous Structure ◽

Single Chip ◽

Powertrain Control ◽

Processing Power ◽

Hardware Description ◽

Many Core

The pace of technology's advancements is ever-increasing and intelligent systems, such as those found in robots and vehicles, have become larger and more complex. These intelligent systems have a heterogeneous structure, comprising a mixture of modules such as artificial intelligence (AI) and powertrain control modules that facilitate large-scale numerical calculation and real-time periodic processing functions. Information technology expert Professor Masato Edahiro, from the Graduate School of Informatics at the Nagoya University in Japan, explains that concurrent advances in semiconductor research have led to the miniaturisation of semiconductors, allowing a greater number of processors to be mounted on a single chip, increasing potential processing power. 'In addition to general-purpose processors such as CPUs, a mixture of multiple types of accelerators such as GPGPU and FPGA has evolved, producing a more complex and heterogeneous computer architecture,' he says. Edahiro and his partners have been working on the eMBP, a model-based parallelizer (MBP) that offers a mapping system as an efficient way of automatically generating parallel code for multi- and many-core systems. This ensures that once the hardware description is written, eMBP can bridge the gap between software and hardware to ensure that not only is an efficient ecosystem achieved for hardware vendors, but the need for different software vendors to adapt code for their particular platforms is also eliminated.

ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information

BMC Bioinformatics ◽

10.1186/1471-2105-8-416 ◽

2007 ◽

Vol 8 (1) ◽

pp. 416 ◽

Cited By ~ 40

Author(s):

Daniel Barthel ◽

Jonathan D Hirst ◽

Jacek Błażewicz ◽

Edmund K Burke ◽

Natalio Krasnogor

Keyword(s):

Decision Support ◽

Protein Structure ◽

Decision Support System ◽

Support System ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Knowledge Similarity

Algorithmic re-structuring and data replication for protein structure comparison on a GRID

Future Generation Computer Systems ◽

10.1016/j.future.2006.03.029 ◽

2007 ◽

Vol 23 (3) ◽

pp. 391-397

Author(s):

G. Ciriello ◽

M. Comin ◽

C. Guerra

Keyword(s):

Protein Structure ◽

Data Replication ◽

Structure Comparison ◽

Protein Structure Comparison

Multi-criteria protein structure comparison and structural similarities analysis using pyMCPSC

PLoS ONE ◽

10.1371/journal.pone.0204587 ◽

2018 ◽

Vol 13 (10) ◽

pp. e0204587 ◽

Cited By ~ 1

Author(s):

Anuj Sharma ◽

Elias S. Manolakos

Keyword(s):

Protein Structure ◽

Structure Comparison ◽

Protein Structure Comparison
