The young person's guide to the PDB

Wladek Minor; Zbigniew Dauter; Mariusz Jaskolski

doi:10.18388/pb.2016_1

The young person's guide to the PDB

Postępy Biochemii ◽

10.18388/pb.2016_1 ◽

2016 ◽

Vol 62 (3) ◽

pp. 242-249

Author(s):

Wladek Minor ◽

Zbigniew Dauter ◽

Mariusz Jaskolski

Keyword(s):

Structural Information ◽

Three Dimensional ◽

Data Bank ◽

Research Process ◽

Protein Crystal ◽

X Ray Crystallography ◽

Cryo Electron Microscopy ◽

Three Dimensional Models ◽

Training Skill

The Protein Data Bank (PDB), created in 1971 when merely seven protein crystal structures were known, today holds over 120,000 experimentally-determined three-dimensional models of macromolecules, including gigantic structures comprised of hundreds of thousands of atoms, such as ribosomes and viruses. Most of the deposits come from X-ray crystallography experiments, with important contributions also made by NMR spectroscopy and, recently, by the fast growing Cryo-Electron Microscopy. Although the determination of a macromolecular crystal structure is now facilitated by advanced experimental tools and by sophisticated software, it is still a highly complicated research process requiring specialized training, skill, experience and a bit of luck. Understanding the plethora of structural information provided by the PDB requires that its users (consumers) have at least a rudimentary initiation. This is the purpose of this educational overview.

Accurate geometrical restraints for Watson–Crick base pairs

Acta Crystallographica Section B Structural Science Crystal Engineering and Materials ◽

10.1107/s2052520619002002 ◽

2019 ◽

Vol 75 (2) ◽

pp. 235-245 ◽

Cited By ~ 3

Author(s):

Miroslaw Gilski ◽

Jianbo Zhao ◽

Marcin Kowiel ◽

Dariusz Brzezinski ◽

Douglas H. Turner ◽

...

Keyword(s):

Structural Information ◽

Data Bank ◽

Base Pairs ◽

Acta Cryst ◽

Ultrahigh Resolution ◽

Cryo Electron Microscopy ◽

Structural Database ◽

Different Sources ◽

Best Parameters

Geometrical restraints provide key structural information for the determination of biomolecular structures at lower resolution by experimental methods such as crystallography or cryo-electron microscopy. In this work, restraint targets for nucleic acids bases are derived from three different sources and compared: small-molecule crystal structures in the Cambridge Structural Database (CSD), ultrahigh-resolution structures in the Protein Data Bank (PDB) and quantum-mechanical (QM) calculations. The best parameters are those based on CSD structures. After over two decades, the standard library of Parkinson et al. [(1996), Acta Cryst. D52, 57–64] is still valid, but improvements are possible with the use of the current CSD database. The CSD-derived geometry is fully compatible with Watson–Crick base pairs, as comparisons with QM results for isolated and paired bases clearly show that the CSD targets closely correspond to proper base pairing. While the QM results are capable of distinguishing between single and paired bases, their level of accuracy is, on average, nearly two times lower than for the CSD-derived targets when gauged by root-mean-square deviations from ultrahigh-resolution structures in the PDB. Nevertheless, the accuracy of QM results appears sufficient to provide stereochemical targets for synthetic base pairs where no reliable experimental structural information is available. To enable future tests for this approach, QM calculations are provided for isocytosine, isoguanine and the iCiG base pair.
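
The abstract above gauges each set of restraint targets by its root-mean-square deviation (RMSD) from reference values. A minimal Python sketch of that metric follows; the function name and the bond lengths are illustrative assumptions, not data or code from the paper.

```python
import math
from typing import Sequence


def rmsd(targets: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square deviation between paired target and reference values."""
    if len(targets) != len(reference) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - r) ** 2 for t, r in zip(targets, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical bond lengths in angstroms, for illustration only.
target_lengths = [1.34, 1.39, 1.47]
reference_lengths = [1.35, 1.38, 1.47]
print(f"RMSD = {rmsd(target_lengths, reference_lengths):.4f} A")
```

A lower RMSD means the targets sit closer to the reference geometry, which is the sense in which one set of targets can be called more accurate than another.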

MRPC (Missing Regions in Polypeptide Chains): a knowledgebase

Journal of Applied Crystallography ◽

10.1107/s1600576719012330 ◽

2019 ◽

Vol 52 (6) ◽

pp. 1422-1426

Author(s):

Rajendran Santhosh ◽

Namrata Bankoti ◽

Adgonda Malgonnavar Padmashri ◽

Daliah Michael ◽

Jeyaraman Jeyakanthan ◽

...

Keyword(s):

Protein Structures ◽

Three Dimensional ◽

Protein Molecule ◽

Data Bank ◽

Protein Crystal ◽

Dimensional Structure ◽

Protein Structure Analysis ◽

Three Dimensional Structure ◽

X Ray Crystallography ◽

Polypeptide Chains

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.
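
One simple way to picture a missing region is as a gap in the residue numbers of the modelled chain. The Python sketch below works under that assumption; it is not the MRPC implementation, the function name is hypothetical, and it treats residue numbers as plain integers without insertion codes.

```python
from typing import Iterable, List, Tuple


def find_missing_regions(observed: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of residue numbers absent from `observed`.

    Only internal gaps are reported; residues missing at the chain termini
    cannot be inferred from the modelled numbering alone.
    """
    numbers = sorted(set(observed))
    gaps = []
    for prev, curr in zip(numbers, numbers[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


# Hypothetical chain in which residues 5-7 and 12 were not modelled.
modelled = [1, 2, 3, 4, 8, 9, 10, 11, 13, 14]
print(find_missing_regions(modelled))
```

Detecting missing termini would additionally require the full deposited sequence, which this sketch does not use.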

Structural Analysis of Proteins on Lipid Substrates

Microscopy and Microanalysis ◽

10.1017/s143192760003840x ◽

2000 ◽

Vol 6 (S2) ◽

pp. 1182-1183

Author(s):

Elizabeth M. Wilson-Kubalek

Keyword(s):

Structural Information ◽

Rapid Determination ◽

3D Structure ◽

Three Dimensional ◽

Molecular Complexes ◽

Structural Data ◽

X Ray Crystallography ◽

Helical Protein ◽

3D Structure Determination

Electron microscopy (EM) has become an increasingly powerful method for the determination of three-dimensional (3D) structures of proteins and macromolecular complexes. EM offers advantages over X-ray crystallography and NMR for obtaining structural information about proteins in physiological conditions, as components of large assemblies, that cannot be obtained in large quantity, or that fail to yield 3D crystals. EM has been used to obtain structural data from images of isolated molecules and molecular complexes, two-dimensional (2D) protein crystals, and helical protein arrays. Helically arranged proteins allow the most rapid determination of 3D maps because they contain a complete range of equally spaced molecular views, therefore no tilting of the sample with respect to the electron beam is required. However, so far 3D structure determination of helical assemblies has been limited to proteins that naturally adopt this organization and to proteins that fortuitously crystallize as helices.

Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling?

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0024 ◽

2015 ◽

Vol 11 (1) ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Michal Brylinski

Keyword(s):

Protein Structure ◽

Protein Data Bank ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Structural Information ◽

Three Dimensional ◽

Data Bank ◽

Prediction Problem ◽

Three Dimensional Models ◽

Protein Structure Prediction Problem

The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.

Image Processing and Lattice Determination for Three-Dimensional Nanocrystals

Microscopy and Microanalysis ◽

10.1017/s1431927611012244 ◽

2011 ◽

Vol 17 (6) ◽

pp. 879-885 ◽

Cited By ~ 8

Author(s):

Linhua Jiang ◽

Dilyana Georgieva ◽

Igor Nederlof ◽

Zunfeng Liu ◽

Jan Pieter Abrahams

Keyword(s):

Molecular Structure ◽

Three Dimensional ◽

X Ray ◽

X Ray Crystallography ◽

Hard Materials ◽

Cryo Electron Microscopy ◽

Diffraction Patterns ◽

Radiation Hard ◽

Orientation Parameters

Three-dimensional nanocrystals can be studied by electron diffraction using transmission cryo-electron microscopy. For molecular structure determination of proteins, such nanosized crystalline samples are out of reach for traditional single-crystal X-ray crystallography. For the study of materials that are not sensitive to the electron beam, software has been developed for determining the crystal lattice and orientation parameters. These methods require radiation-hard materials that survive careful orienting of the crystals and measuring diffraction of one and the same crystal from different, but known directions. However, as such methods can only deal with well-oriented crystalline samples, a problem exists for three-dimensional (3D) crystals of proteins and other radiation sensitive materials that do not survive careful rotational alignment in the electron microscope. Here, we discuss our newly released software AMP that can deal with nonoriented diffraction patterns, and we discuss the progress of our new preprocessing program that uses autocorrelation patterns of diffraction images for lattice determination and indexing of 3D nanocrystals.

Examining the structure of the mature amyloid fibril

Biochemical Society Transactions ◽

10.1042/bst0300521 ◽

2002 ◽

Vol 30 (4) ◽

pp. 521-525 ◽

Cited By ~ 43

Author(s):

O. S. Makin ◽

L. C. Serpell

Keyword(s):

Self Assembly ◽

Rational Design ◽

Amyloid Fibrils ◽

Structural Information ◽

X Ray ◽

X Ray Crystallography ◽

Cryo Electron Microscopy ◽

Fibre Diffraction ◽

Complementary Techniques ◽

Mature Fibril

The pathogenesis of the group of diseases known collectively as the amyloidoses is characterized by the deposition of insoluble amyloid fibrils. These are straight, unbranching structures about 70–120 Å (1 Å = 0.1 nm) in diameter and of indeterminate length formed by the self-assembly of a diverse group of normally soluble proteins. Knowledge of the structure of these fibrils is necessary for the understanding of their abnormal assembly and deposition, possibly leading to the rational design of therapeutic agents for their prevention or disaggregation. Structural elucidation is impeded by fibril insolubility and inability to crystallize, thus preventing the use of X-ray crystallography and solution NMR. CD, Fourier-transform infrared spectroscopy and light scattering have been used in the study of the mechanism of fibril formation. This review concentrates on the structural information about the final, mature fibril and in particular the complementary techniques of cryo-electron microscopy, solid-state NMR and X-ray fibre diffraction.

PDBrenum: a webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences

10.1101/2021.02.14.431128 ◽

2021 ◽

Author(s):

Bulat Faezov ◽

Roland L. Dunbrack

Keyword(s):

Protein Data Bank ◽

Data Bank ◽

Post Translational Modifications ◽

X Ray ◽

X Ray Crystallography ◽

Link Type ◽

Binding Partners ◽

Cryo Electron Microscopy ◽

Comparative Structure ◽

In The Beginning

The Protein Data Bank (PDB) was established at Brookhaven National Laboratories in 1971 as an archive for biological macromolecular crystal structures. In the beginning the archive held only seven structures but in early 2021, the database has more than 170,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (e.g., binding partners such as ligands, nucleic acids, or other proteins; mutations and post-translational modifications), thus enabling comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. In addition to the coordinates, there are many fields that contain information regarding specific residues in the sequence of each protein in the entry. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. The server can also take a list of UniProt identifiers (“P04637” or “P53_HUMAN”) and return the desired files. Availability: Source code is freely available at https://github.com/Faezov/PDBrenum. The webserver is located at http://dunbrack3.fccc.edu/.
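
The core idea of replacing author numbering with UniProt-derived numbering can be sketched as a lookup through a residue-level mapping. The Python snippet below is only an illustration under stated assumptions: it is not PDBrenum's code, the function name is hypothetical, and the mapping dictionary is invented example data standing in for a correspondence of the kind SIFTS provides.

```python
from typing import Dict, List, Optional


def renumber_residues(
    author_numbers: List[int],
    author_to_uniprot: Dict[int, int],
) -> List[Optional[int]]:
    """Translate author residue numbers to UniProt positions.

    Residues with no entry in the mapping (for example expression tags)
    are returned as None rather than being given a guessed number.
    """
    return [author_to_uniprot.get(number) for number in author_numbers]


# Hypothetical entry: the authors numbered the chain from 1, while the same
# residues sit at positions 94-96 of the UniProt sequence; residue 0 is a tag.
mapping = {1: 94, 2: 95, 3: 96}
print(renumber_residues([0, 1, 2, 3], mapping))
```

Returning `None` for unmapped residues is a design choice of this sketch; a real tool has to decide how such residues are numbered in the output files.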

Accuracy of the final model

Outline of Crystallography for Biologists ◽

10.1093/oso/9780198510512.003.0018 ◽

2002 ◽

Author(s):

David Blow

Keyword(s):

Data Bank ◽

The Other ◽

Protein Crystal ◽

Major Error ◽

Final Model ◽

Bond Angles ◽

Bond Lengths ◽

R Factor ◽

Insight Into

The result of all the work described in the previous chapters will be a set of coordinates and other data suitable for deposit in the Protein Data Bank. You or I may use these coordinates, and we need to have some insight into their accuracy and reliability. In the previous chapters, indicators have been described, which may suggest aspects of the data or interpretation procedures that might lead to problems. But as the determination of protein crystal structures becomes more routine, many of these indicators are omitted from publications. Fortunately, crystallographic procedures are self-checking to a large extent. It is rare for a major error of interpretation to lead right through to a published refined structure. A high Rfree factor is a warning, especially if coupled with departures from the requirements of correct bond lengths, angles, and acceptable dihedral angles. On the other hand, there will always be a desire to squeeze more results from the data. All interpretations are subject to error; nearly all protein crystals have regions that are less ordered, where accurate interpretation is less feasible; and the structure may be overrefined, using too many variables for the data. If the majority of the molecule is correctly interpreted, a reasonable R factor may be obtained even though some small regions are completely wrong. During refinement it is usual to restrain the bond lengths and bond angles to be near their theoretical values, as described in Chapter 12. The extent to which bond lengths and bond angles depart from these values is often quoted as an indicator of accuracy. These departures are, however, difficult to interpret because they depend on how tightly the restraints have been applied. The same applies to the restraint of certain coordinates to lie in a plane. This difficulty illustrates a general problem. Designers of refinement procedures are understandably anxious to improve their procedures to lead directly to a well-refined structure. 
Every aspect of structure that can be recognized as having a regularity could, in principle, be expressed as a restraint which enforces it during refinement.
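
The R factor mentioned above is conventionally defined as the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. A minimal Python sketch of that standard formula follows; the function name and the amplitudes are illustrative assumptions. Rfree is the same quantity evaluated only on a test set of reflections excluded from refinement.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Crystallographic R factor: sum(||Fobs| - |Fcalc||) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical amplitudes for three reflections, for illustration only.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 25.0]
print(f"R = {r_factor(observed, calculated):.4f}")
```

Because the sum runs over all reflections, a small badly modelled region contributes little to the total, which is consistent with the remark that a reasonable R factor can coexist with local errors.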

Using cryo-electron microscopy maps for X-ray structure determination of homologues

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798319015924 ◽

2020 ◽

Vol 76 (1) ◽

pp. 63-72

Author(s):

Lingxiao Zeng ◽

Wei Ding ◽

Quan Hao

Keyword(s):

Electron Microscopy ◽

Hybrid Method ◽

Model Building ◽

Test Cases ◽

X Ray ◽

X Ray Crystallography ◽

Sequence Identity ◽

Whole Process ◽

Cryo Electron Microscopy

The combination of cryo-electron microscopy (cryo-EM) and X-ray crystallography reflects an important trend in structural biology. In a previously published study, a hybrid method for the determination of X-ray structures using initial phases provided by the corresponding parts of cryo-EM maps was presented. However, if the target structure of X-ray crystallography is not identical but homologous to the corresponding molecular model of the cryo-EM map, then the decrease in the accuracy of the starting phases makes the whole process more difficult. Here, a modified hybrid method is presented to handle such cases. The whole process includes three steps: cryo-EM map replacement, phase extension by NCS averaging and dual-space iterative model building. When the resolution gap between the cryo-EM and X-ray crystallographic data is large and the sequence identity is low, an intermediate stage of model building is necessary. Six test cases have been studied with sequence identity between the corresponding molecules in the cryo-EM and X-ray structures ranging from 34 to 52% and with sequence similarity ranging from 86 to 91%. This hybrid method consistently produced models with reasonable Rwork and Rfree values which agree well with the previously determined X-ray structures for all test cases, thus indicating the general applicability of the method for X-ray structure determination of homologues using cryo-EM maps as a starting point.

Role of Computational Methods in Going beyond X-ray Crystallography to Explore Protein Structure and Dynamics

International Journal of Molecular Sciences ◽

10.3390/ijms19113401 ◽

2018 ◽

Vol 19 (11) ◽

pp. 3401 ◽

Cited By ~ 16

Author(s):

Ashutosh Srivastava ◽

Tetsuro Nagai ◽

Arpita Srivastava ◽

Osamu Miyashita ◽

Florence Tama

Keyword(s):

Protein Dynamics ◽

Computational Methods ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

X Ray ◽

X Ray Crystallography ◽

Insight Into

Protein structural biology has come a long way since the determination of the first three-dimensional structure of myoglobin about six decades ago. Across this period, X-ray crystallography was the most important experimental method for gaining atomic-resolution insight into protein structures. However, as the role of dynamics gained importance in the function of proteins, the limitations of X-ray crystallography in not being able to capture dynamics came to the forefront. Computational methods proved to be immensely successful in understanding protein dynamics in solution, and they continue to improve in terms of both the scale and the types of systems that can be studied. In this review, we briefly discuss the limitations of X-ray crystallography in studying protein dynamics, and then provide an overview of different computational methods that are instrumental in understanding the dynamics of proteins and biomacromolecular complexes.
