GeoMine: interactive pattern mining of protein–ligand interfaces in the Protein Data Bank

Author(s):  
Konrad Diedrich ◽  
Joel Graef ◽  
Katrin Schöning-Stierand ◽  
Matthias Rarey

Summary: Searching molecular interfaces for user-defined 3D queries is a computationally challenging problem that has not yet been solved satisfactorily. Most of the few existing tools for this purpose are desktop-based and not openly available. In addition, they lack query versatility, search efficiency and user-friendliness. We address this issue with GeoMine, a publicly available web application that provides textual, numerical and geometrical search functionality for protein–ligand binding sites derived from structural data contained in the Protein Data Bank (PDB). Query generation is supported by a 3D representation of a start structure that provides interactively selectable elements such as atoms, bonds and interactions. GeoMine gives full control over geometric variability in the query while performing a deterministic, precise search. Reasonably selective queries are processed on the entire set of protein–ligand complexes in the PDB within a few minutes. GeoMine offers an interactive and iterative search process of successive result analyses and query adaptations. From the numerous potential applications, we picked two from the field of side-effect analysis to showcase the usefulness of GeoMine. Availability and implementation: GeoMine is part of the ProteinsPlus web application suite and freely available at https://proteins.plus. Supplementary information: Supplementary data are available at Bioinformatics online.

Author(s):  
Krzysztof Szczepaniak ◽  
Adriana Bukala ◽  
Antonio Marinho da Silva Neto ◽  
Jan Ludwiczak ◽  
Stanislaw Dunin-Horkawicz

Motivation: Coiled coils are widespread protein domains involved in diverse processes ranging from providing structural rigidity to the transduction of conformational changes. They comprise two or more α-helices that are wound around each other to form a regular supercoiled bundle. Owing to this regularity, coiled-coil structures can be described with parametric equations, thus enabling the numerical representation of their properties, such as the degree and handedness of supercoiling, the rotational state of the helices, and the offset between them. These descriptors are invaluable in understanding the function of coiled coils and designing new structures of this type. The existing tools for such calculations require manual preparation of input and are therefore not suitable for high-throughput analyses. Results: To address this problem, we developed SamCC-Turbo, software for fully automated, per-residue measurement of coiled coils. By surveying the Protein Data Bank with SamCC-Turbo, we generated a comprehensive atlas of ∼50,000 coiled-coil regions. This machine learning-ready data set features precise measurements and decomposes coiled-coil structures into fragments characterized by various degrees of supercoiling. The potential applications of SamCC-Turbo are exemplified by analyses in which we reveal general structural features of coiled coils involved in functions requiring conformational plasticity. Finally, we discuss further directions in the prediction and modeling of coiled coils. Availability: SamCC-Turbo is available as a web server (https://lbs.cent.uw.edu.pl/samcc_turbo) and as a Python library (https://github.com/labstructbioinf/samcc_turbo), whereas the results of the Protein Data Bank scan can be browsed and downloaded at https://lbs.cent.uw.edu.pl/ccdb. Supplementary information: Supplementary data are available at Bioinformatics online.
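
As an illustration of what such parametric equations can look like, the sketch below generates an idealized Cα trace of one helix using a Crick-style parameterization. It is not taken from SamCC-Turbo and does not use its API; the function name `crick_ca_trace`, the parameter values and the sign conventions are assumptions chosen for illustration, and other formulations use different conventions.

```python
import math


def crick_ca_trace(n_residues, r0, r1, omega0, omega1, alpha, phi1=0.0):
    """Idealized C-alpha coordinates of one helix in a coiled coil.

    r0     : superhelix radius (angstroms)
    r1     : radius of the alpha-helix itself (angstroms)
    omega0 : superhelical rotation per residue (radians)
    omega1 : helical rotation per residue in the supercoil frame (radians)
    alpha  : pitch angle of the supercoil (radians, must be non-zero)
    phi1   : starting rotational phase of the helix (radians)
    """
    if alpha == 0:
        raise ValueError("alpha must be non-zero in this parameterization")
    coords = []
    for t in range(n_residues):
        a0 = omega0 * t          # position along the supercoil
        a1 = omega1 * t + phi1   # position around the helix axis
        x = (r0 * math.cos(a0) + r1 * math.cos(a0) * math.cos(a1)
             - r1 * math.cos(alpha) * math.sin(a0) * math.sin(a1))
        y = (r0 * math.sin(a0) + r1 * math.sin(a0) * math.cos(a1)
             + r1 * math.cos(alpha) * math.cos(a0) * math.sin(a1))
        z = (r0 * omega0 / math.tan(alpha)) * t - r1 * math.sin(alpha) * math.sin(a1)
        coords.append((x, y, z))
    return coords


# Illustrative values only: a 28-residue helix with 3.5 residues per turn.
trace = crick_ca_trace(
    n_residues=28,
    r0=5.0,
    r1=2.26,
    omega0=math.radians(-3.6),
    omega1=math.radians(360.0 / 3.5),
    alpha=math.radians(-12.0),
)
```

Measuring a real structure amounts to the inverse problem: estimating parameters like these per residue from observed coordinates, which is the kind of task the abstract describes.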


Author(s):  
Joan Segura ◽  
Yana Rose ◽  
John Westbrook ◽  
Stephen K Burley ◽  
Jose M Duarte

Motivation: Interoperability between polymer sequences and structural data is essential for providing a complete picture of protein and gene features and helping to understand biomolecular function. Results: Herein, we present two resources designed to improve interoperability between the RCSB Protein Data Bank, the NCBI and the UniProtKB data resources and to visualize the integrated data. The underlying tools provide a flexible means of mapping between the different coordinate spaces, and an interactive tool allows convenient visualization of the 1-dimensional data over the web. Availability and implementation: https://1d-coordinates.rcsb.org and https://rcsb.github.io/rcsb-saguaro. Supplementary information: Supplementary data are available at Bioinformatics online.
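
Mapping between coordinate spaces can be pictured as translating a residue position through a list of aligned blocks. The sketch below shows that idea in isolation; it does not call the RCSB services, and the function name `map_position`, the block layout and the numbers are hypothetical.

```python
def map_position(pos, aligned_blocks):
    """Translate a 1-based position from a source to a target sequence.

    aligned_blocks : list of (src_start, src_end, dst_start) tuples, where
                     src_start..src_end (inclusive) aligns without gaps to
                     the target starting at dst_start.
    Returns None when the position lies outside every aligned block.
    """
    for src_start, src_end, dst_start in aligned_blocks:
        if src_start <= pos <= src_end:
            return dst_start + (pos - src_start)
    return None


# Hypothetical alignment: source residues 1-50 map to target 101-150,
# source residues 60-120 map to target 151-211; 51-59 are unaligned.
blocks = [(1, 50, 101), (60, 120, 151)]
mapped = [map_position(p, blocks) for p in (10, 55, 60)]
```

Returning `None` for unaligned positions keeps gaps explicit, so a feature that falls in an unobserved region is not silently shifted onto a neighbouring residue.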


Author(s):  
Keisuke Arikawa

On the basis of robot kinematics, we have thus far developed a method for predicting the motion of proteins from their 3D structural data given in the Protein Data Bank (PDB data). In this method, proteins are modeled as serial manipulators constrained by springs and the structural compliance properties of the models are evaluated. We focus on localized instead of whole structures of proteins. Employing the same model used in our method of motion prediction, the motion properties of the localized structures and the relation between the motion properties of localized and whole structures are analyzed. First, we present a method for graphically expressing the deformation of objects with a complex shape, such as proteins, by approximating the shape as a rectangular prism with a mesh on its surface. We then formulate a method for comparing the motion properties of localized structures cleaved from the whole structure and those remaining in it by expressing the motion of the latter using the decomposed motion modes of the former according to the structural compliance. Finally, we show a method for evaluating the effect of a localized structure on the motion properties of proteins by applying forces to localized structures. In the formulations, we demonstrate applications as illustrative examples using the PDB data of a real protein.


2015 ◽  
Vol 43 (W1) ◽  
pp. W383-W388 ◽  
Author(s):  
David Sehnal ◽  
Lukáš Pravda ◽  
Radka Svobodová Vařeková ◽  
Crina-Maria Ionescu ◽  
Jaroslav Koča

2011 ◽  
Vol 40 (D1) ◽  
pp. D453-D460 ◽  
Author(s):  
A. R. Kinjo ◽  
H. Suzuki ◽  
R. Yamashita ◽  
Y. Ikegawa ◽  
T. Kudou ◽  
...  

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1236
Author(s):  
Magdalena Ługowska ◽  
Marcin Pacholczyk

Background: Difficulties in translating the in vitro potency determined by cellular assays into in vivo efficacy in living organisms complicate the design and development of drugs. However, the residence time of a drug in its molecular target is becoming a key parameter in the design and optimization of new drugs, as recent studies show that residence time can reliably predict drug efficacy in vivo. Experimental approaches to binding kinetics and target ligand complex solutions are currently available, but known bioinformatics databases do not usually report information about the ligand residence time in its molecular target. Methods: To extend existing databases, we developed the Protein Data Bank (PDB) residence time database (PDBrt), which reports drug residence time. The database is implemented as an open-access web-based tool. The front end uses Bootstrap with Hypertext Markup Language (HTML), jQuery for the interface and 3Dmol.js to visualize the complexes. The server-side code uses the Python web application framework Django Rest Framework and a PostgreSQL backend database. Results: The PDBrt database is a free, non-commercial repository for 3D protein-ligand complex data, including the measured ligand residence time inside the binding pocket of the specific biological macromolecules as deposited in the Protein Data Bank. The PDBrt database contains information about the protein and the ligand separately, as well as about the protein-ligand complex, the binding kinetics and the residence time of the ligand inside the protein binding site. Availability: https://pdbrt.polsl.pl
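
For readers unfamiliar with the quantity, residence time is commonly defined as the reciprocal of the dissociation rate constant, τ = 1/k_off. The short sketch below applies that standard relation; it is not code from PDBrt, and the function names and the example rate constant are assumptions for illustration.

```python
import math


def residence_time(k_off):
    """Residence time tau = 1 / k_off, with k_off in 1/s and tau in s."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return 1.0 / k_off


def dissociation_half_life(k_off):
    """Half-life of the complex for first-order dissociation: ln(2) / k_off."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


# Made-up example: a ligand dissociating with k_off = 0.01 per second.
tau = residence_time(0.01)
half_life = dissociation_half_life(0.01)
```

The half-life is always about 0.69 times the residence time, so the two values rank ligands identically; which one a data set reports is a matter of convention.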


2019 ◽  
Vol 35 (20) ◽  
pp. 4165-4167 ◽  
Author(s):  
Jonathan Fine ◽  
Gaurav Chopra

Motivation: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community in developing software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. Results: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in <10 min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing users to write custom functions to suit their objectives. We think Lemon will become a one-stop shop to quickly mine the entire PDB to generate desired structural biology features. Availability and implementation: The Lemon software is available as a C++ header library along with a PyPI package and example functions at https://github.com/chopralab/lemon. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Glen van Ginkel ◽  
Lukáš Pravda ◽  
José M. Dana ◽  
Mihaly Varadi ◽  
Peter Keller ◽  
...  

Background: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widespread. Perhaps one of the reasons is the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations, such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries as well as adopted for broad scientific analyses.


2012 ◽  
Vol 68 (4) ◽  
pp. 478-483 ◽  
Author(s):  
Swanand Gore ◽  
Sameer Velankar ◽  
Gerard J. Kleywegt

There is an increasing realisation that the quality of the biomacromolecular structures deposited in the Protein Data Bank (PDB) archive needs to be assessed critically using established and powerful validation methods. The Worldwide Protein Data Bank (wwPDB) organization has convened several Validation Task Forces (VTFs) to advise on the methods and standards that should be used to validate all of the entries already in the PDB as well as all structures that will be deposited in the future. The recommendations of the X-ray VTF are currently being implemented in a software pipeline. Here, ongoing work on this pipeline is briefly described as well as ways in which validation-related information could be presented to users of structural data.

