The young person’s guide to the PDB

2016 ◽  
Vol 62 (3) ◽  
pp. 242-249
Author(s):  
Wladek Minor ◽  
Zbigniew Dauter ◽  
Mariusz Jaskolski

The Protein Data Bank (PDB), created in 1971 when merely seven protein crystal structures were known, today holds over 120,000 experimentally determined three-dimensional models of macromolecules, including gigantic structures composed of hundreds of thousands of atoms, such as ribosomes and viruses. Most of the deposits come from X-ray crystallography experiments, with important contributions also made by NMR spectroscopy and, recently, by the fast-growing field of cryo-electron microscopy. Although the determination of a macromolecular crystal structure is now facilitated by advanced experimental tools and by sophisticated software, it is still a highly complicated research process requiring specialized training, skill, experience and a bit of luck. Understanding the plethora of structural information provided by the PDB requires that its users (consumers) have at least a rudimentary initiation. Providing that initiation is the purpose of this educational overview.

Author(s):  
Miroslaw Gilski ◽  
Jianbo Zhao ◽  
Marcin Kowiel ◽  
Dariusz Brzezinski ◽  
Douglas H. Turner ◽  
...  

Geometrical restraints provide key structural information for the determination of biomolecular structures at lower resolution by experimental methods such as crystallography or cryo-electron microscopy. In this work, restraint targets for nucleic acid bases are derived from three different sources and compared: small-molecule crystal structures in the Cambridge Structural Database (CSD), ultrahigh-resolution structures in the Protein Data Bank (PDB) and quantum-mechanical (QM) calculations. The best parameters are those based on CSD structures. After over two decades, the standard library of Parkinson et al. [(1996), Acta Cryst. D52, 57–64] is still valid, but improvements are possible with the use of the current CSD. The CSD-derived geometry is fully compatible with Watson–Crick base pairs, as comparisons with QM results for isolated and paired bases clearly show that the CSD targets closely correspond to proper base pairing. While the QM results are capable of distinguishing between single and paired bases, their level of accuracy is, on average, nearly two times lower than for the CSD-derived targets when gauged by root-mean-square deviations from ultrahigh-resolution structures in the PDB. Nevertheless, the accuracy of QM results appears sufficient to provide stereochemical targets for synthetic base pairs where no reliable experimental structural information is available. To enable future tests for this approach, QM calculations are provided for isocytosine, isoguanine and the iCiG base pair.
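The accuracy measure mentioned in this abstract, a root-mean-square deviation between a set of restraint targets and the corresponding values observed in reference structures, can be sketched as follows. This is only an illustration of the metric, not the authors' procedure: the function name `rmsd` and all numbers are made-up placeholders, not values from the paper.

```python
import math


def rmsd(targets, observed):
    """Root-mean-square deviation between two equally long value lists."""
    if len(targets) != len(observed) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(t - o) ** 2 for t, o in zip(targets, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder bond lengths in angstrom; purely illustrative, not real data.
reference = [1.30, 1.40, 1.50]   # stands in for ultrahigh-resolution values
targets_a = [1.31, 1.40, 1.49]   # hypothetical target set A
targets_b = [1.33, 1.38, 1.52]   # hypothetical target set B

# The set with the smaller RMSD agrees better with the reference values.
better = "A" if rmsd(targets_a, reference) < rmsd(targets_b, reference) else "B"
```

With these placeholder numbers set A deviates less from the reference than set B, which is the kind of comparison the abstract describes between CSD-derived and QM-derived targets.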


2019 ◽  
Vol 52 (6) ◽  
pp. 1422-1426
Author(s):  
Rajendran Santhosh ◽  
Namrata Bankoti ◽  
Adgonda Malgonnavar Padmashri ◽  
Daliah Michael ◽  
Jeyaraman Jeyakanthan ◽  
...  

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.
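To make the notion of a missing region concrete, the sketch below lists the gaps between the residue numbers that were actually modelled in a chain and the range the chain is expected to cover. It is a simplified illustration and not the MRPC method: the function name `find_missing_regions` is hypothetical, and the sketch assumes plain consecutive integer numbering without insertion codes.

```python
def find_missing_regions(observed, first, last):
    """Return (start, end) pairs of residue numbers in first..last
    that do not occur in `observed` (inclusive ranges)."""
    present = set(observed)
    regions = []
    start = None
    for number in range(first, last + 1):
        if number not in present:
            if start is None:
                start = number          # a new gap opens here
        elif start is not None:
            regions.append((start, number - 1))  # the gap closed at number - 1
            start = None
    if start is not None:               # gap running to the end of the chain
        regions.append((start, last))
    return regions


# Hypothetical chain expected to span residues 1..12.
gaps = find_missing_regions([1, 2, 3, 7, 8, 10], first=1, last=12)
```

For the hypothetical chain above, `gaps` contains three regions: residues 4 to 6, residue 9, and residues 11 to 12.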


2000 ◽  
Vol 6 (S2) ◽  
pp. 1182-1183
Author(s):  
Elizabeth M. Wilson-Kubalek

Electron microscopy (EM) has become an increasingly powerful method for the determination of three-dimensional (3D) structures of proteins and macromolecular complexes. EM offers advantages over X-ray crystallography and NMR for obtaining structural information about proteins in physiological conditions, as components of large assemblies, that cannot be obtained in large quantity, or that fail to yield 3D crystals. EM has been used to obtain structural data from images of isolated molecules and molecular complexes, two-dimensional (2D) protein crystals, and helical protein arrays. Helically arranged proteins allow the most rapid determination of 3D maps because they contain a complete range of equally spaced molecular views, so no tilting of the sample with respect to the electron beam is required. However, so far 3D structure determination of helical assemblies has been limited to proteins that naturally adopt this organization and to proteins that fortuitously crystallize as helices.


2015 ◽  
Vol 11 (1) ◽  
pp. 1-7
Author(s):  
Michal Brylinski

The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.


2011 ◽  
Vol 17 (6) ◽  
pp. 879-885
Author(s):  
Linhua Jiang ◽  
Dilyana Georgieva ◽  
Igor Nederlof ◽  
Zunfeng Liu ◽  
Jan Pieter Abrahams

Three-dimensional nanocrystals can be studied by electron diffraction using transmission cryo-electron microscopy. For molecular structure determination of proteins, such nanosized crystalline samples are out of reach for traditional single-crystal X-ray crystallography. For the study of materials that are not sensitive to the electron beam, software has been developed for determining the crystal lattice and orientation parameters. These methods require radiation-hard materials that survive careful orienting of the crystals and measuring diffraction of one and the same crystal from different, but known directions. However, as such methods can only deal with well-oriented crystalline samples, a problem exists for three-dimensional (3D) crystals of proteins and other radiation-sensitive materials that do not survive careful rotational alignment in the electron microscope. Here, we discuss our newly released software AMP that can deal with nonoriented diffraction patterns, and we discuss the progress of our new preprocessing program that uses autocorrelation patterns of diffraction images for lattice determination and indexing of 3D nanocrystals.


2002 ◽  
Vol 30 (4) ◽  
pp. 521-525
Author(s):  
O. S. Makin ◽  
L. C. Serpell

The pathogenesis of the group of diseases known collectively as the amyloidoses is characterized by the deposition of insoluble amyloid fibrils. These are straight, unbranching structures about 70–120 Å (1 Å = 0.1 nm) in diameter and of indeterminate length formed by the self-assembly of a diverse group of normally soluble proteins. Knowledge of the structure of these fibrils is necessary for the understanding of their abnormal assembly and deposition, possibly leading to the rational design of therapeutic agents for their prevention or disaggregation. Structural elucidation is impeded by fibril insolubility and inability to crystallize, thus preventing the use of X-ray crystallography and solution NMR. CD, Fourier-transform infrared spectroscopy and light scattering have been used in the study of the mechanism of fibril formation. This review concentrates on the structural information about the final, mature fibril and in particular the complementary techniques of cryo-electron microscopy, solid-state NMR and X-ray fibre diffraction.


2021 ◽  
Author(s):  
Bulat Faezov ◽  
Roland L. Dunbrack

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. In the beginning the archive held only seven structures, but as of early 2021 the database holds more than 170,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (e.g., binding partners such as ligands, nucleic acids, or other proteins; mutations and post-translational modifications), thus enabling comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. In addition to the coordinates, there are many fields that contain information regarding specific residues in the sequence of each protein in the entry. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe. The server can also take a list of UniProt identifiers (“P04637” or “P53_HUMAN”) and return the desired files. Availability: Source code is freely available at https://github.com/Faezov/PDBrenum. The webserver is located at http://dunbrack3.fccc.edu/.
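The core idea of the renumbering described in this abstract, replacing author-assigned residue numbers with numbers taken from a per-residue correspondence table, can be sketched as below. This is a minimal illustration and not the PDBrenum implementation: the names `Residue` and `renumber` are hypothetical, the mapping is hard-coded rather than read from SIFTS, and leaving unmapped residues unchanged while reporting them is an assumption made for the sketch.

```python
from typing import Dict, List, NamedTuple, Tuple


class Residue(NamedTuple):
    chain: str
    number: int   # author-assigned residue number
    name: str     # three-letter residue name


def renumber(
    residues: List[Residue],
    mapping: Dict[Tuple[str, int], int],
) -> Tuple[List[Residue], List[Residue]]:
    """Apply a (chain, author number) -> reference number mapping.

    Returns the renumbered residues and the residues that had no
    entry in the mapping (these keep their author numbering).
    """
    renumbered = []
    unmapped = []
    for res in residues:
        key = (res.chain, res.number)
        if key in mapping:
            renumbered.append(res._replace(number=mapping[key]))
        else:
            renumbered.append(res)
            unmapped.append(res)
    return renumbered, unmapped


# Hypothetical chain and correspondence table, for illustration only.
chain_a = [Residue("A", 1, "MET"), Residue("A", 2, "GLU"), Residue("A", 3, "HIS")]
table = {("A", 1): 94, ("A", 2): 95}
new_chain, leftovers = renumber(chain_a, table)
```

Returning the unmapped residues separately keeps the decision about them (for example expression tags that have no counterpart in the reference sequence) out of the renumbering step itself.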


Author(s):  
David Blow

The result of all the work described in the previous chapters will be a set of coordinates and other data suitable for deposit in the Protein Data Bank. You or I may use these coordinates, and we need to have some insight into their accuracy and reliability. In the previous chapters, indicators have been described, which may suggest aspects of the data or interpretation procedures that might lead to problems. But as the determination of protein crystal structures becomes more routine, many of these indicators are omitted from publications. Fortunately, crystallographic procedures are self-checking to a large extent. It is rare for a major error of interpretation to lead right through to a published refined structure. A high Rfree factor is a warning, especially if coupled with departures from the requirements of correct bond lengths, angles, and acceptable dihedral angles. On the other hand, there will always be a desire to squeeze more results from the data. All interpretations are subject to error; nearly all protein crystals have regions that are less ordered, where accurate interpretation is less feasible; and the structure may be overrefined, using too many variables for the data. If the majority of the molecule is correctly interpreted, a reasonable R factor may be obtained even though some small regions are completely wrong. During refinement it is usual to restrain the bond lengths and bond angles to be near their theoretical values, as described in Chapter 12. The extent to which bond lengths and bond angles depart from these values is often quoted as an indicator of accuracy. These departures are, however, difficult to interpret because they depend on how tightly the restraints have been applied. The same applies to the restraint of certain coordinates to lie in a plane. This difficulty illustrates a general problem. Designers of refinement procedures are understandably anxious to improve their procedures to lead directly to a well-refined structure. 
Every aspect of structure that can be recognized as having a regularity could, in principle, be expressed as a restraint which enforces it during refinement.


2020 ◽  
Vol 76 (1) ◽  
pp. 63-72
Author(s):  
Lingxiao Zeng ◽  
Wei Ding ◽  
Quan Hao

The combination of cryo-electron microscopy (cryo-EM) and X-ray crystallography reflects an important trend in structural biology. In a previously published study, a hybrid method for the determination of X-ray structures using initial phases provided by the corresponding parts of cryo-EM maps was presented. However, if the target structure of X-ray crystallography is not identical but homologous to the corresponding molecular model of the cryo-EM map, then the decrease in the accuracy of the starting phases makes the whole process more difficult. Here, a modified hybrid method is presented to handle such cases. The whole process includes three steps: cryo-EM map replacement, phase extension by NCS averaging and dual-space iterative model building. When the resolution gap between the cryo-EM and X-ray crystallographic data is large and the sequence identity is low, an intermediate stage of model building is necessary. Six test cases have been studied with sequence identity between the corresponding molecules in the cryo-EM and X-ray structures ranging from 34 to 52% and with sequence similarity ranging from 86 to 91%. This hybrid method consistently produced models with reasonable Rwork and Rfree values which agree well with the previously determined X-ray structures for all test cases, thus indicating the general applicability of the method for X-ray structure determination of homologues using cryo-EM maps as a starting point.


2018 ◽  
Vol 19 (11) ◽  
pp. 3401
Author(s):  
Ashutosh Srivastava ◽  
Tetsuro Nagai ◽  
Arpita Srivastava ◽  
Osamu Miyashita ◽  
Florence Tama

Protein structural biology has come a long way since the determination of the first three-dimensional structure of myoglobin about six decades ago. Across this period, X-ray crystallography was the most important experimental method for gaining atomic-resolution insight into protein structures. However, as the role of dynamics gained importance in the function of proteins, the limitations of X-ray crystallography in not being able to capture dynamics came to the forefront. Computational methods proved to be immensely successful in understanding protein dynamics in solution, and they continue to improve in terms of both the scale and the types of systems that can be studied. In this review, we briefly discuss the limitations of X-ray crystallography in studying protein dynamics, and then provide an overview of different computational methods that are instrumental in understanding the dynamics of proteins and biomacromolecular complexes.

