RCSB Protein Data Bank 1D tools and services

Bioinformatics ◽

10.1093/bioinformatics/btaa1012 ◽

2020 ◽

Author(s):

Joan Segura ◽

Yana Rose ◽

John Westbrook ◽

Stephen K Burley ◽

Jose M Duarte

Keyword(s):

Protein Data Bank ◽

Structural Data ◽

Data Bank ◽

Complete Picture ◽

Supplementary Information ◽

Supplementary Data ◽

Interactive Tool ◽

The Web

Abstract. Motivation: Interoperability between polymer sequences and structural data is essential for providing a complete picture of protein and gene features and helping to understand biomolecular function. Results: Herein, we present two resources designed to improve interoperability between the RCSB Protein Data Bank, the NCBI and the UniProtKB data resources and visualize integrated data therefrom. The underlying tools provide a flexible means of mapping between the different coordinate spaces and an interactive tool allows convenient visualization of the 1-dimensional data over the web. Availability and implementation: https://1d-coordinates.rcsb.org and https://rcsb.github.io/rcsb-saguaro. Supplementary information: Supplementary data are available at Bioinformatics online.
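
The idea of mapping between coordinate spaces can be illustrated with a minimal sketch. It assumes an alignment is given as gap-free blocks between two 1-based sequences. The names `Segment` and `map_position` and the example numbers are hypothetical, and this is not the API of the 1d-coordinates service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One gap-free aligned block between two 1-based coordinate spaces."""
    query_start: int   # first position in the source space (e.g. a UniProtKB sequence)
    target_start: int  # matching first position in the target space (e.g. a PDB entity)
    length: int        # number of aligned residues in the block


def map_position(pos: int, segments: List[Segment]) -> Optional[int]:
    """Map a source position to the target space, or None if it is unaligned."""
    for seg in segments:
        offset = pos - seg.query_start
        if 0 <= offset < seg.length:
            return seg.target_start + offset
    return None


# Hypothetical alignment: source 10-19 -> target 1-10, source 25-29 -> target 11-15.
alignment = [Segment(10, 1, 10), Segment(25, 11, 5)]
mapped = [map_position(p, alignment) for p in (10, 19, 22, 27)]
print(mapped)
```

Positions that fall in an alignment gap map to `None`, which keeps unaligned residues explicit instead of silently shifting them.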

GeoMine: interactive pattern mining of protein–ligand interfaces in the Protein Data Bank

Bioinformatics ◽

10.1093/bioinformatics/btaa693 ◽

2020 ◽

Author(s):

Konrad Diedrich ◽

Joel Graef ◽

Katrin Schöning-Stierand ◽

Matthias Rarey

Keyword(s):

Protein Data Bank ◽

Web Application ◽

Pattern Mining ◽

Structural Data ◽

Data Bank ◽

Supplementary Information ◽

User Friendliness ◽

Iterative Search ◽

Potential Applications ◽

Query Generation

Abstract. Summary: The searching of user-defined 3D queries in molecular interfaces is a computationally challenging problem that has not been satisfactorily solved so far. Most of the few existing tools focused on that purpose are desktop based and not openly available. Besides that, they show a lack of query versatility, search efficiency and user-friendliness. We address this issue with GeoMine, a publicly available web application that provides textual, numerical and geometrical search functionality for protein–ligand binding sites derived from structural data contained in the Protein Data Bank (PDB). The query generation is supported by a 3D representation of a start structure that provides interactively selectable elements like atoms, bonds and interactions. GeoMine gives full control over geometric variability in the query while performing a deterministic, precise search. Reasonably selective queries are processed on the entire set of protein–ligand complexes in the PDB within a few minutes. GeoMine offers an interactive and iterative search process of successive result analyses and query adaptations. From the numerous potential applications, we picked two from the field of side-effect analysis, showcasing the usefulness of GeoMine. Availability and implementation: GeoMine is part of the ProteinsPlus web application suite and freely available at https://proteins.plus. Supplementary information: Supplementary data are available at Bioinformatics online.
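
A geometric query with user-controlled variability can be pictured as labelled points plus distance windows. The sketch below is a brute-force illustration under that assumption. The function `find_matches`, the tuple layout and the coordinates are hypothetical, and it does not reflect how GeoMine indexes or searches the PDB.

```python
from itertools import permutations
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]
# A distance constraint: (query point i, query point j, min distance, max distance).
Constraint = Tuple[int, int, float, float]


def find_matches(atoms: List[Tuple[str, Point]],
                 labels: List[str],
                 constraints: List[Constraint]) -> List[Tuple[int, ...]]:
    """Return every ordered atom-index tuple that satisfies the query.

    Brute force over all candidate tuples, so it is deterministic but only
    practical for small inputs.
    """
    matches = []
    for combo in permutations(range(len(atoms)), len(labels)):
        # Each query point must land on an atom with the requested label.
        if any(atoms[a][0] != lab for a, lab in zip(combo, labels)):
            continue
        # Every constrained distance must fall inside its tolerance window.
        if all(lo <= dist(atoms[combo[i]][1], atoms[combo[j]][1]) <= hi
               for i, j, lo, hi in constraints):
            matches.append(combo)
    return matches


# Hypothetical binding-site atoms: (element label, xyz in angstroms).
site = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("O", (0.0, 8.0, 0.0))]
# Query: an N and an O separated by 2.5 to 3.5 angstroms.
hits = find_matches(site, ["N", "O"], [(0, 1, 2.5, 3.5)])
print(hits)
```

Widening or narrowing the `(min, max)` window is the knob that corresponds to geometric variability in this toy version.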

atomium—a Python structure parser

Bioinformatics ◽

10.1093/bioinformatics/btaa072 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2750-2754 ◽

Cited By ~ 1

Author(s):

Sam M Ireland ◽

Andrew C R Martin

Keyword(s):

Web Service ◽

Protein Data Bank ◽

Structural Biology ◽

Data Bank ◽

Supplementary Information ◽

Supplementary Data ◽

File Formats

Abstract. Summary: Structural biology relies on specific file formats to convey information about macromolecular structures. Traditionally this has been the PDB format, but increasingly newer formats, such as PDBML, mmCIF and MMTF, are being used. Here we present atomium, a modern, lightweight Python library for parsing, manipulating and saving PDB, mmCIF and MMTF file formats. In addition, we provide a web service, pdb2json, which uses atomium to give a consistent JSON representation to the entire Protein Data Bank. Availability and implementation: atomium is implemented in Python and its performance is equivalent to the existing library BioPython. However, it has significant advantages in features and API design. atomium is available from atomium.bioinf.org.uk and pdb2json can be accessed at pdb2json.bioinf.org.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
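
To make the notion of a JSON representation of a PDB file concrete, here is a minimal standard-library sketch that reads the fixed-column ATOM/HETATM records of the legacy PDB format. The function names and the JSON keys are assumptions chosen for illustration. This is neither atomium's API nor the pdb2json schema.

```python
import json
from typing import Dict, Union

AtomRecord = Dict[str, Union[str, int, float]]


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM record using the legacy PDB fixed columns."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "residue": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "residue_number": int(line[22:26]), # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "element": line[76:78].strip(),     # columns 77-78
    }


def pdb_to_json(text: str) -> str:
    """Collect all atom records of a PDB-format string into a JSON document."""
    atoms = [parse_atom_line(line) for line in text.splitlines()
             if line.startswith(("ATOM  ", "HETATM"))]
    return json.dumps({"atoms": atoms}, indent=2)


# A single hypothetical atom record followed by a non-atom line.
SAMPLE = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "END",
])
document = json.loads(pdb_to_json(SAMPLE))
print(document["atoms"][0])
```

A real parser also has to handle alternate locations, insertion codes, multiple models and malformed columns, which is where a dedicated library earns its keep.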

Finding enzyme cofactors in Protein Data Bank

Bioinformatics ◽

10.1093/bioinformatics/btz115 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3510-3511 ◽

Cited By ~ 3

Author(s):

Abhik Mukhopadhyay ◽

Neera Borkakoti ◽

Lukáš Pravda ◽

Jonathan D Tyzack ◽

Janet M Thornton ◽

...

Keyword(s):

Small Molecules ◽

Protein Data Bank ◽

Data Bank ◽

Supplementary Information ◽

Supplementary Data ◽

Release Process ◽

Enzyme Reactions ◽

Enzyme Cofactors ◽

Rest Api ◽

The Individual

Abstract. Motivation: Cofactors are essential for many enzyme reactions. The Protein Data Bank (PDB) contains >67 000 entries containing enzyme structures, many with bound cofactor or cofactor-like molecules. This work aims to identify and categorize these small molecules in the PDB and make it easier to find them. Results: The Protein Data Bank in Europe (PDBe; pdbe.org) has implemented a pipeline to identify enzyme cofactor and cofactor-like molecules, which are now part of the PDBe weekly release process. Availability and implementation: Information is made available on the individual PDBe entry pages at pdbe.org and programmatically through the PDBe REST API (pdbe.org/api). Supplementary information: Supplementary data are available at Bioinformatics online.

Analyzing Motion Properties of Proteins Affected by Localized Structures From a Robot Kinematics Perspective

Volume 5A: 39th Mechanisms and Robotics Conference ◽

10.1115/detc2015-47010 ◽

2015 ◽

Author(s):

Keisuke Arikawa

Keyword(s):

Protein Data Bank ◽

Complex Shape ◽

Structural Data ◽

Data Bank ◽

Robot Kinematics ◽

Motion Prediction ◽

Serial Manipulators ◽

Localized Structures ◽

Motion Modes ◽

Structural Compliance

On the basis of robot kinematics, we have thus far developed a method for predicting the motion of proteins from their 3D structural data given in the Protein Data Bank (PDB data). In this method, proteins are modeled as serial manipulators constrained by springs and the structural compliance properties of the models are evaluated. We focus on localized instead of whole structures of proteins. Employing the same model used in our method of motion prediction, the motion properties of the localized structures and the relation between the motion properties of localized and whole structures are analyzed. First, we present a method for graphically expressing the deformation of objects with a complex shape, such as proteins, by approximating the shape as a rectangular prism with a mesh on its surface. We then formulate a method for comparing the motion properties of localized structures cleaved from the whole structure and those remaining in it by expressing the motion of the latter using the decomposed motion modes of the former according to the structural compliance. Finally, we show a method for evaluating the effect of a localized structure on the motion properties of proteins by applying forces to localized structures. In the formulations, we demonstrate applications as illustrative examples using the PDB data of a real protein.

Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format

Nucleic Acids Research ◽

10.1093/nar/gkr811 ◽

2011 ◽

Vol 40 (D1) ◽

pp. D453-D460 ◽

Cited By ~ 88

Author(s):

A. R. Kinjo ◽

H. Suzuki ◽

R. Yamashita ◽

Y. Ikegawa ◽

T. Kudou ◽

...

Keyword(s):

Protein Data Bank ◽

Resource Description Framework ◽

Structural Data ◽

Data Bank ◽

Data Archive ◽

Description Framework ◽

Resource Description

QPARSE: searching for long-looped or multimeric G-quadruplexes potentially distinctive and druggable

Bioinformatics ◽

10.1093/bioinformatics/btz569 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michele Berselli ◽

Enrico Lavezzo ◽

Stefano Toppo

Keyword(s):

Human Gene ◽

State Of The Art ◽

Comprehensive Analysis ◽

Supplementary Information ◽

Gene Promoters ◽

Supplementary Data ◽

Stem Loop ◽

Hiv 1 ◽

Rna And Dna ◽

The Web

Abstract. Motivation: G-quadruplexes (G4s) are non-canonical nucleic acid conformations that are widespread in all kingdoms of life and are emerging as important regulators both in RNA and DNA. Recently, two new higher-order architectures have been reported: adjacent interacting G4s, and G4s with stable long loops forming stem-loop structures. As there are no specialized tools to identify these conformations, we developed QPARSE. Results: QPARSE can exhaustively search for degenerate potential quadruplex-forming sequences (PQSs) containing bulges and/or mismatches at genomic level, as well as either multimeric or long-looped PQS (MPQS and LLPQS, respectively). While its assessment vs. known reference datasets is comparable with the state-of-the-art, what is more interesting is its performance in the identification of MPQS and LLPQS that present algorithms are not designed to search for. We report a comprehensive analysis of MPQS in human gene promoters and the analysis of LLPQS on three experimentally validated case studies from HIV-1, BCL2, and hTERT. Availability: QPARSE is freely accessible on the web at http://www.medcomp.medicina.unipd.it/qparse/index or downloadable from GitHub as a Python 2.7 program: https://github.com/B3rse/qparse. Supplementary information: Supplementary data are available at Bioinformatics online.
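
For orientation, the widely used canonical PQS motif (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) can be searched with a regular expression. The sketch below shows only that baseline. It is not QPARSE's algorithm, it does not handle bulges, mismatches, long loops or multimers, and the function name `find_pqs` is hypothetical.

```python
import re
from typing import List, Tuple

# Canonical motif: G-run, loop, G-run, loop, G-run, loop, G-run.
PQS_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_pqs(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping canonical PQS hits.

    Coordinates are 0-based and end-exclusive. Only the given strand is
    scanned, so C-rich complements are not reported.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]


# Human telomeric repeat flanked by two arbitrary bases on each side.
telomeric = "TTGGGTTAGGGTTAGGGTTAGGGAA"
hits = find_pqs(telomeric)
print(hits)
```

Because `finditer` reports non-overlapping matches, nested or overlapping candidates are missed, which is one reason exhaustive tools use a dedicated search instead of a single pattern.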

Lemon: a framework for rapidly mining structural information from the Protein Data Bank

Bioinformatics ◽

10.1093/bioinformatics/btz178 ◽

2019 ◽

Vol 35 (20) ◽

pp. 4165-4167 ◽

Cited By ~ 1

Author(s):

Jonathan Fine ◽

Gaurav Chopra

Keyword(s):

Protein Data Bank ◽

Structural Information ◽

Computational Cost ◽

Data Bank ◽

Structural Features ◽

Supplementary Information ◽

Develop Software ◽

Reading Text ◽

Essential Resource ◽

3D Descriptors

Abstract. Motivation: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community to develop software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. Results: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in <10 min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing users to write custom functions to suit their objective. We think Lemon will become a one-stop shop to quickly mine the entire PDB to generate desired structural biology features. Availability and implementation: The Lemon software is available as a C++ header library along with a PyPI package and example functions at https://github.com/chopralab/lemon. Supplementary information: Supplementary data are available at Bioinformatics online.

PDBMine: A Reformulation of the Protein Data Bank to Facilitate Structural Data Mining

2019 International Conference on Computational Science and Computational Intelligence (CSCI) ◽

10.1109/csci49370.2019.00272 ◽

2019 ◽

Cited By ~ 1

Author(s):

Casey Cole ◽

Christopher Ott ◽

Diego Valdes ◽

Homayoun Valafar

Keyword(s):

Data Mining ◽

Protein Data Bank ◽

Structural Data ◽

Data Bank

PDBeCIF: an open-source mmCIF/CIF parsing and processing package

BMC Bioinformatics ◽

10.1186/s12859-021-04271-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Glen van Ginkel ◽

Lukáš Pravda ◽

José M. Dana ◽

Mihaly Varadi ◽

Peter Keller ◽

...

Keyword(s):

Open Source ◽

Protein Data Bank ◽

Hybrid Methods ◽

Structural Data ◽

Data Bank ◽

File Format ◽

Macromolecular Complexes ◽

Link Type ◽

Information File ◽

Multiscale Structures

Abstract. Background: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with 3rd party libraries as well as adopted for broad scientific analyses.
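
The category/item structure of mmCIF can be illustrated with a deliberately simplified standard-library reader. The function `parse_simple_cif` and the sample data are hypothetical, and this is not PDBeCIF's API. It ignores the edge cases the abstract alludes to, such as semicolon-delimited multi-line values, loop rows spanning several lines and apostrophes inside quoted values, so real files should go through a dedicated parser.

```python
import shlex
from typing import Dict, List

CifData = Dict[str, Dict[str, List[str]]]


def parse_simple_cif(text: str) -> CifData:
    """Parse key-value items and loop_ tables into {category: {item: [values]}}."""
    data: CifData = {}
    loop_tags: List[str] = []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "data_")):
            in_loop, loop_tags = False, []      # separators end any open loop
        elif line == "loop_":
            in_loop, loop_tags = True, []
        elif line.startswith("_"):
            tokens = shlex.split(line)          # honours simple quoted values
            category, item = tokens[0][1:].split(".", 1)
            column = data.setdefault(category, {}).setdefault(item, [])
            if in_loop:
                loop_tags.append(tokens[0])     # header line, value rows follow
            else:
                column.append(tokens[1])        # plain key-value pair
        elif in_loop:
            # One value row: assign each token to its column in header order.
            for tag, value in zip(loop_tags, shlex.split(line)):
                category, item = tag[1:].split(".", 1)
                data[category][item].append(value)
    return data


SAMPLE_CIF = """data_DEMO
#
_entry.id DEMO
_struct.title 'A hypothetical two-atom example'
#
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
1 N 27.340
2 CA 26.266
#
"""
parsed = parse_simple_cif(SAMPLE_CIF)
print(parsed["atom_site"]["label_atom_id"])
```

Storing every item as a list keeps single values and loop columns uniform, at the cost of an extra `[0]` when reading key-value items.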

Implementing an X-ray validation pipeline for the Protein Data Bank

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444911050359 ◽

2012 ◽

Vol 68 (4) ◽

pp. 478-483 ◽

Cited By ~ 60

Author(s):

Swanand Gore ◽

Sameer Velankar ◽

Gerard J. Kleywegt

Keyword(s):

Protein Data Bank ◽

Structural Data ◽

Data Bank ◽

Software Pipeline ◽

X Ray ◽

Validation Methods ◽

Task Forces ◽

Related Information ◽

Ongoing Work

There is an increasing realisation that the quality of the biomacromolecular structures deposited in the Protein Data Bank (PDB) archive needs to be assessed critically using established and powerful validation methods. The Worldwide Protein Data Bank (wwPDB) organization has convened several Validation Task Forces (VTFs) to advise on the methods and standards that should be used to validate all of the entries already in the PDB as well as all structures that will be deposited in the future. The recommendations of the X-ray VTF are currently being implemented in a software pipeline. Here, ongoing work on this pipeline is briefly described as well as ways in which validation-related information could be presented to users of structural data.
