GeoMine: interactive pattern mining of protein–ligand interfaces in the Protein Data Bank

Bioinformatics, DOI: 10.1093/bioinformatics/btaa693, 2020

Author(s): Konrad Diedrich, Joel Graef, Katrin Schöning-Stierand, Matthias Rarey

Keyword(s): Protein Data Bank, Web Application, Pattern Mining, Structural Data, Data Bank, Supplementary Information, User Friendliness, Iterative Search, Potential Applications, Query Generation

Summary: Searching for user-defined 3D queries in molecular interfaces is a computationally challenging problem that has not been satisfactorily solved so far. Most of the few existing tools for this purpose are desktop-based and not openly available. They also lack query versatility, search efficiency and user-friendliness. We address this issue with GeoMine, a publicly available web application that provides textual, numerical and geometrical search functionality for protein–ligand binding sites derived from structural data contained in the Protein Data Bank (PDB). Query generation is supported by a 3D representation of a start structure that provides interactively selectable elements such as atoms, bonds and interactions. GeoMine gives full control over geometric variability in the query while performing a deterministic, precise search. Reasonably selective queries are processed on the entire set of protein–ligand complexes in the PDB within a few minutes. GeoMine offers an interactive and iterative search process of successive result analyses and query adaptations. From the numerous potential applications, we picked two from the field of side-effect analysis to showcase the usefulness of GeoMine. Availability and implementation: GeoMine is part of the ProteinsPlus web application suite and freely available at https://proteins.plus. Supplementary information: Supplementary data are available at Bioinformatics online.
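To make the idea of a geometric query with controlled variability more concrete, here is a minimal sketch of a brute-force search for typed points whose pairwise distances fall within user-chosen ranges. It is not GeoMine's algorithm or API: the function name `find_matches`, the point types, the coordinates and the distance ranges are all hypothetical, and a real tool would need indexing to handle the whole PDB.

```python
from itertools import permutations
from math import dist


def find_matches(points, query_types, constraints):
    """Return index tuples of points that satisfy a typed distance query.

    points:       list of (type_label, (x, y, z)) tuples
    query_types:  type label required for each query element
    constraints:  {(i, j): (d_min, d_max)} distance ranges between
                  query elements i and j (hypothetical format)
    """
    matches = []
    # Try every ordered assignment of points to query elements (brute force).
    for combo in permutations(range(len(points)), len(query_types)):
        # Each assigned point must have the type the query asks for.
        if any(points[p][0] != t for p, t in zip(combo, query_types)):
            continue
        # Every constrained pair must lie inside its distance range.
        if all(
            lo <= dist(points[combo[i]][1], points[combo[j]][1]) <= hi
            for (i, j), (lo, hi) in constraints.items()
        ):
            matches.append(combo)
    return matches


# Illustrative data only: four typed points and a three-element query.
points = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
    ("acceptor", (10.0, 0.0, 0.0)),
]
query_types = ["donor", "acceptor", "aromatic"]
constraints = {(0, 1): (2.5, 3.5), (0, 2): (3.5, 4.5), (1, 2): (4.5, 5.5)}

result = find_matches(points, query_types, constraints)
```

Widening or narrowing the ranges in `constraints` is the sketch's stand-in for the geometric variability mentioned in the abstract; the search itself stays deterministic because every candidate assignment is checked.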

A library of coiled-coil domains: from regular bundles to peculiar twists

Bioinformatics, DOI: 10.1093/bioinformatics/btaa1041, 2020

Author(s): Krzysztof Szczepaniak, Adriana Bukala, Antonio Marinho da Silva Neto, Jan Ludwiczak, Stanislaw Dunin-Horkawicz

Keyword(s): Protein Data Bank, Conformational Changes, Coiled Coil, Data Bank, Structural Features, Coiled Coils, Supplementary Information, Numerical Representation, Data Set, Potential Applications

Motivation: Coiled coils are widespread protein domains involved in diverse processes ranging from providing structural rigidity to the transduction of conformational changes. They comprise two or more α-helices that are wound around each other to form a regular supercoiled bundle. Owing to this regularity, coiled-coil structures can be described with parametric equations, thus enabling the numerical representation of their properties, such as the degree and handedness of supercoiling, the rotational state of the helices, and the offset between them. These descriptors are invaluable in understanding the function of coiled coils and designing new structures of this type. The existing tools for such calculations require manual preparation of input and are therefore not suitable for high-throughput analyses. Results: To address this problem, we developed SamCC-Turbo, software for fully automated, per-residue measurement of coiled coils. By surveying the Protein Data Bank with SamCC-Turbo, we generated a comprehensive atlas of ∼50,000 coiled-coil regions. This machine learning-ready data set features precise measurements and decomposes coiled-coil structures into fragments characterized by various degrees of supercoiling. The potential applications of SamCC-Turbo are exemplified by analyses in which we reveal general structural features of coiled coils involved in functions requiring conformational plasticity. Finally, we discuss further directions in the prediction and modeling of coiled coils. Availability: SamCC-Turbo is available as a web server (https://lbs.cent.uw.edu.pl/samcc_turbo) and as a Python library (https://github.com/labstructbioinf/samcc_turbo), whereas the results of the Protein Data Bank scan can be browsed and downloaded at https://lbs.cent.uw.edu.pl/ccdb. Supplementary information: Supplementary data are available at Bioinformatics online.
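As a rough illustration of what describing a coiled coil "with parametric equations" can mean, the sketch below traces only the axis of each helix as it winds around the bundle axis. It is a deliberately simplified model, not the full parametrization and not SamCC-Turbo's code: the function name `helix_axis_path`, its parameters and the numeric values are assumptions chosen for illustration.

```python
from math import cos, sin, pi


def helix_axis_path(k, n_helices, radius, pitch, handedness, z_values):
    """Axis of helix k in an idealized bundle of n_helices.

    radius:      distance of the helix axis from the bundle axis
    pitch:       rise along the bundle axis per full supercoil turn
    handedness:  +1 for right-handed, -1 for left-handed supercoiling
    z_values:    positions along the bundle axis to sample
    """
    # Helices are spaced evenly around the bundle axis.
    phase = 2 * pi * k / n_helices
    path = []
    for z in z_values:
        # The sign of the angular term sets the supercoil handedness.
        angle = handedness * 2 * pi * z / pitch + phase
        path.append((radius * cos(angle), radius * sin(angle), z))
    return path


# Illustrative values only: a left-handed, two-helix bundle.
z_values = [1.5 * i for i in range(10)]
helix_a = helix_axis_path(0, 2, 4.9, 140.0, -1, z_values)
helix_b = helix_axis_path(1, 2, 4.9, 140.0, -1, z_values)
```

In this toy model the descriptors named in the abstract map onto parameters: `pitch` and `handedness` capture the degree and direction of supercoiling, and the per-helix `phase` is a stand-in for the relative placement of the helices.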

RCSB Protein Data Bank 1D tools and services

Bioinformatics, DOI: 10.1093/bioinformatics/btaa1012, 2020

Author(s): Joan Segura, Yana Rose, John Westbrook, Stephen K Burley, Jose M Duarte

Keyword(s): Protein Data Bank, Structural Data, Data Bank, Complete Picture, Supplementary Information, Supplementary Data, Interactive Tool, The Web

Motivation: Interoperability between polymer sequences and structural data is essential for providing a complete picture of protein and gene features and helping to understand biomolecular function. Results: Herein, we present two resources designed to improve interoperability between the RCSB Protein Data Bank, the NCBI and the UniProtKB data resources and to visualize the integrated data. The underlying tools provide a flexible means of mapping between the different coordinate spaces, and an interactive tool allows convenient visualization of the 1-dimensional data over the web. Availability and implementation: https://1d-coordinates.rcsb.org and https://rcsb.github.io/rcsb-saguaro. Supplementary information: Supplementary data are available at Bioinformatics online.
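Mapping between coordinate spaces, as mentioned in the abstract above, can be pictured as translating a residue position through a set of aligned segments. The following is a minimal sketch under that assumption; it does not call or reproduce the RCSB services, and the function name `map_position`, the block format and the numbers are hypothetical.

```python
def map_position(pos, blocks):
    """Map a 1-based position from a source to a target coordinate space.

    blocks: list of (src_start, src_end, dst_start) aligned segments,
            with inclusive source ranges (hypothetical format).
    Returns the target position, or None if pos is not covered.
    """
    for src_start, src_end, dst_start in blocks:
        if src_start <= pos <= src_end:
            # Positions keep their offset within an aligned segment.
            return dst_start + (pos - src_start)
    return None


# Illustrative alignment: two segments with a gap in the target space.
blocks = [(1, 50, 101), (51, 120, 160)]

mapped = [map_position(p, blocks) for p in (10, 60, 130)]
```

Returning `None` for uncovered positions keeps unmapped regions explicit, which matters when a structure covers only part of a sequence.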

Analyzing Motion Properties of Proteins Affected by Localized Structures From a Robot Kinematics Perspective

Volume 5A: 39th Mechanisms and Robotics Conference, DOI: 10.1115/detc2015-47010, 2015

Author(s): Keisuke Arikawa

Keyword(s): Protein Data Bank, Complex Shape, Structural Data, Data Bank, Robot Kinematics, Motion Prediction, Serial Manipulators, Localized Structures, Motion Modes, Structural Compliance

On the basis of robot kinematics, we have thus far developed a method for predicting the motion of proteins from their 3D structural data given in the Protein Data Bank (PDB data). In this method, proteins are modeled as serial manipulators constrained by springs and the structural compliance properties of the models are evaluated. We focus on localized instead of whole structures of proteins. Employing the same model used in our method of motion prediction, the motion properties of the localized structures and the relation between the motion properties of localized and whole structures are analyzed. First, we present a method for graphically expressing the deformation of objects with a complex shape, such as proteins, by approximating the shape as a rectangular prism with a mesh on its surface. We then formulate a method for comparing the motion properties of localized structures cleaved from the whole structure and those remaining in it by expressing the motion of the latter using the decomposed motion modes of the former according to the structural compliance. Finally, we show a method for evaluating the effect of a localized structure on the motion properties of proteins by applying forces to localized structures. In the formulations, we demonstrate applications as illustrative examples using the PDB data of a real protein.

PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank

Nucleic Acids Research, DOI: 10.1093/nar/gkv561, 2015, Vol 43 (W1), pp. W383-W388, Cited By ~ 16

Author(s): David Sehnal, Lukáš Pravda, Radka Svobodová Vařeková, Crina-Maria Ionescu, Jaroslav Koča

Keyword(s): Protein Data Bank, Web Application, Data Bank, Structural Patterns, Fast Detection

Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format

Nucleic Acids Research, DOI: 10.1093/nar/gkr811, 2011, Vol 40 (D1), pp. D453-D460, Cited By ~ 88

Author(s): A. R. Kinjo, H. Suzuki, R. Yamashita, Y. Ikegawa, T. Kudou, ...

Keyword(s): Protein Data Bank, Resource Description Framework, Structural Data, Data Bank, Data Archive, Description Framework, Resource Description

PDBrt: A free database of complexes with measured drug-target residence time

F1000Research, DOI: 10.12688/f1000research.73420.1, 2021, Vol 10, pp. 1236

Author(s): Magdalena Ługowska, Marcin Pacholczyk

Keyword(s): Protein Data Bank, Residence Time, Web Application, Data Bank, Molecular Target, Binding Kinetics, New Drugs, Complex Data, Ligand Complex

Background: Difficulties in translating the in vitro potency determined by cellular assays into in vivo efficacy in living organisms complicate the design and development of drugs. However, the residence time of a drug in its molecular target is becoming a key parameter in the design and optimization of new drugs, as recent studies show that residence time can reliably predict drug efficacy in vivo. Experimental approaches to binding kinetics and target ligand complex solutions are currently available, but known bioinformatics databases do not usually report information about the ligand residence time in its molecular target. Methods: To extend existing databases, we developed the Protein Data Bank (PDB) residence time database (PDBrt), which reports drug residence time. The database is implemented as an open-access web-based tool. The front end uses Bootstrap with Hypertext Markup Language (HTML), jQuery for the interface and 3Dmol.js to visualize the complexes. The server-side code uses a Python web application framework, Django REST Framework, and a PostgreSQL backend database. Results: The PDBrt database is a free, non-commercial repository for 3D protein-ligand complex data, including the measured ligand residence time inside the binding pocket of the specific biological macromolecules as deposited in the Protein Data Bank. The PDBrt database contains information about both the protein and the ligand separately, as well as the protein-ligand complex, binding kinetics, and time of the ligand residence inside the protein binding site. Availability: https://pdbrt.polsl.pl

Lemon: a framework for rapidly mining structural information from the Protein Data Bank

Bioinformatics, DOI: 10.1093/bioinformatics/btz178, 2019, Vol 35 (20), pp. 4165-4167, Cited By ~ 1

Author(s): Jonathan Fine, Gaurav Chopra

Keyword(s): Protein Data Bank, Structural Information, Computational Cost, Data Bank, Structural Features, Supplementary Information, Develop Software, Reading Text, Essential Resource, 3D Descriptors

Motivation: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community to develop software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. Results: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in <10 min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing users to write custom functions to suit their objectives. We think Lemon will become a one-stop shop to quickly mine the entire PDB to generate desired structural biology features. Availability and implementation: The Lemon software is available as a C++ header library along with a PyPI package and example functions at https://github.com/chopralab/lemon. Supplementary information: Supplementary data are available at Bioinformatics online.

PDBMine: A Reformulation of the Protein Data Bank to Facilitate Structural Data Mining

2019 International Conference on Computational Science and Computational Intelligence (CSCI), DOI: 10.1109/csci49370.2019.00272, 2019, Cited By ~ 1

Author(s): Casey Cole, Christopher Ott, Diego Valdes, Homayoun Valafar

Keyword(s): Data Mining, Protein Data Bank, Structural Data, Data Bank

PDBeCIF: an open-source mmCIF/CIF parsing and processing package

BMC Bioinformatics, DOI: 10.1186/s12859-021-04271-9, 2021, Vol 22 (1)

Author(s): Glen van Ginkel, Lukáš Pravda, José M. Dana, Mihaly Varadi, Peter Keller, ...

Keyword(s): Open Source, Protein Data Bank, Hybrid Methods, Structural Data, Data Bank, File Format, Macromolecular Complexes, Link Type, Information File, Multiscale Structures

Background: Biomacromolecular structural data have outgrown the legacy Protein Data Bank (PDB) format, which the scientific community relied on for decades, yet the use of its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widespread. Perhaps one of the reasons is the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations, such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries as well as adopted for broad scientific analyses.

Implementing an X-ray validation pipeline for the Protein Data Bank

Acta Crystallographica Section D Biological Crystallography, DOI: 10.1107/s0907444911050359, 2012, Vol 68 (4), pp. 478-483, Cited By ~ 60

Author(s): Swanand Gore, Sameer Velankar, Gerard J. Kleywegt

Keyword(s): Protein Data Bank, Structural Data, Data Bank, Software Pipeline, X Ray, Validation Methods, Task Forces, Related Information, Ongoing Work

There is an increasing realisation that the quality of the biomacromolecular structures deposited in the Protein Data Bank (PDB) archive needs to be assessed critically using established and powerful validation methods. The Worldwide Protein Data Bank (wwPDB) organization has convened several Validation Task Forces (VTFs) to advise on the methods and standards that should be used to validate all of the entries already in the PDB as well as all structures that will be deposited in the future. The recommendations of the X-ray VTF are currently being implemented in a software pipeline. Here, ongoing work on this pipeline is briefly described as well as ways in which validation-related information could be presented to users of structural data.
