EC-PSI: Associating Enzyme Commission Numbers with Pfam Domains

2015 ◽  
Author(s):  
Seyed Ziaeddin Alborzi ◽  
Marie-Dominique Devignes ◽  
David W. Ritchie

With the growing number of protein structures in the Protein Data Bank (PDB), there is a need to annotate these structures at the domain level in order to relate protein structure to protein function. Thanks to the SIFTS database, many PDB chains are now cross-referenced with Pfam domains and Enzyme Commission (EC) numbers. However, these annotations do not include any explicit relationship between individual Pfam domains and EC numbers. This article presents a novel statistical training-based method called EC-PSI that can automatically infer high-confidence associations between EC numbers and Pfam domains directly from EC-chain associations in SIFTS and from EC-sequence associations in the SwissProt and TrEMBL databases. By collecting and integrating these existing EC-chain/sequence annotations, our approach is able to infer a total of 8,329 direct EC-Pfam associations with an overall F-measure of 0.819 with respect to the manually curated InterPro database, which we treat here as a “gold standard” reference dataset. Thus, compared to the 1,493 EC-Pfam associations in InterPro, our approach provides a way to find over six times as many high-quality EC-Pfam associations completely automatically.
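The F-measure quoted above compares inferred EC-Pfam pairs with a reference set. A minimal sketch of that comparison follows. It illustrates only the metric, not the EC-PSI inference method, and the abstract does not say how the authors restrict the comparison to pairs covered by InterPro. The function name `f_measure` and the toy pair labels are hypothetical.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) for predicted pairs against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (EC number, Pfam label) pairs, not real annotations.
gold = {("1.1.1.1", "PF_A"), ("2.7.11.1", "PF_B"), ("3.4.21.4", "PF_C")}
predicted = {("1.1.1.1", "PF_A"), ("2.7.11.1", "PF_B"), ("3.1.1.1", "PF_D")}

precision, recall, f1 = f_measure(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Treating associations as a set of pairs keeps the comparison independent of ordering and of duplicate predictions.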

IUCrJ ◽  
2018 ◽  
Vol 5 (5) ◽  
pp. 585-594 ◽  
Author(s):  
Bart van Beusekom ◽  
Krista Joosten ◽  
Maarten L. Hekkelman ◽  
Robbie P. Joosten ◽  
Anastassis Perrakis

Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, recent advances in crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and `graft' these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in the PDB-REDO pipeline has enabled the building of 24 962 missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinformatics projects.
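The first step described above, finding regions absent from one model but present in a homologue, can be sketched as follows. This is an illustration only, not the PDB-REDO implementation. It assumes the two structures have already been aligned so that they share one residue numbering. The function name `missing_regions` and the `min_length` parameter are hypothetical.

```python
def missing_regions(model_residues, homolog_residues, min_length=1):
    """Return (start, end) runs of residue numbers that the homologue has
    but the model lacks, assuming both use the same numbering."""
    absent = sorted(set(homolog_residues) - set(model_residues))
    regions = []
    for number in absent:
        if regions and number == regions[-1][1] + 1:
            regions[-1][1] = number          # extend the current run
        else:
            regions.append([number, number])  # start a new run
    return [(start, end) for start, end in regions
            if end - start + 1 >= min_length]


# Hypothetical example: the model lacks residues 41-47, the homologue is complete.
model = list(range(1, 41)) + list(range(48, 101))
homolog = list(range(1, 101))
gaps = missing_regions(model, homolog)
print(gaps)
```

Each reported run is a candidate region for grafting. The refinement and validation steps that the abstract mentions are outside the scope of this sketch.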


Author(s):  
Wei Li

Protein is the proteios building block of life. Evolutionarily, its sequence is not as conserved as its structure, making it more reasonable for protein structure, instead of protein sequence, to be the descriptor of protein function. Yet, in the National Center for Biotechnology Information (NCBI) database, the number of experimentally identified protein sequences is in great excess of that of experimentally determined protein structures inside the almost-half-a-century-old Protein Data Bank (PDB). For instance, GPR151 is a proton-sensing G-protein coupled receptor (GPCR) originally identified as homologous to galanin receptors. As of March 19, 2020, GPR151’s structure has not yet been experimentally determined and deposited in the PDB. Thus, an ab initio modelling approach was employed here to build a three-dimensional structure of GPR151. Overall, the ab initio GPR151 model presented herein constitutes the first structural hypothesis of GPR151, to be tested experimentally in the future alongside previously published, currently ongoing and future GPR151 studies.


2019 ◽  
Author(s):  
Vladimir Gligorijevic ◽  
P. Douglas Renfrew ◽  
Tomasz Kosciolek ◽  
Julia Koehler Leman ◽  
Daniel Berenberg ◽  
...  

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant de-noising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on > 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. Using class activation mapping, we automatically identify structural regions at the residue-level that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.


2018 ◽  
Author(s):  
Bart van Beusekom ◽  
Krista Joosten ◽  
Maarten L. Hekkelman ◽  
Robbie P. Joosten ◽  
Anastassis Perrakis

Inherent protein flexibility, poor or low-resolution diffraction data, or poor electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, advances in crystallographic refinement and model building now often allow previously missing parts to be completed. Here, we present algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and “graft” these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in our PDB-REDO pipeline allowed 24,962 missing loops in the PDB to be built. The models and the automated procedures are publicly available through the PDB-REDO databank and web server (https://pdb-redo.eu). More complete protein structure models enable a higher quality public archive, but also a better understanding of protein function, better comparison between homologous structures, and more complete data mining in structural bioinformatics projects. Synopsis: Thousands of missing regions in existing protein structure models are completed using new methods based on homology.


2020 ◽  
Vol 36 (10) ◽  
pp. 3064-3071
Author(s):  
Rostislav K Skitchenko ◽  
Dmitrii Usoltsev ◽  
Mayya Uspenskaya ◽  
Andrey V Kajava ◽  
Albert Guskov

Motivation: Halides are negatively charged ions of halogens, forming fluorides (F−), chlorides (Cl−), bromides (Br−) and iodides (I−). These anions are quite reactive and interact both specifically and non-specifically with proteins. Despite their ubiquitous presence and important roles in protein function, little is known about the binding preferences of halides in proteins. To address this problem, we analysed halide–protein interactions based on the entries in the Protein Data Bank. Results: We have compiled a pipeline for the quick analysis of halide-binding sites in proteins using the available software. Our analysis revealed that all halides are strongly attracted by the guanidinium moiety of arginine side chains; however, there are also certain preferences among halides for other partners. Furthermore, there is a certain preference for coordination numbers in the binding sites, with a correlation between coordination numbers and amino acid composition. This pipeline can be used as a tool for the analysis of specific halide–protein interactions and can assist phasing experiments relying on halides as anomalous scatterers. Availability and implementation: All data described in this article can be reproduced via the compiled pipeline published at https://github.com/rostkick/Halide_sites/blob/master/README.md. Supplementary information: Supplementary data are available at Bioinformatics online.
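The coordination number mentioned above counts the protein atoms close to a bound ion. A minimal sketch of such a count is given below. It is not the authors' pipeline. The 4.0 Å cutoff, the function name `coordination_number` and the coordinates are assumptions made for illustration only.

```python
import math


def coordination_number(ion, atoms, cutoff=4.0):
    """Count atoms whose distance to the ion is within the cutoff (in angstroms).

    The cutoff value is an assumed placeholder, not a value from the article.
    """
    return sum(1 for atom in atoms if math.dist(ion, atom) <= cutoff)


# Hypothetical coordinates (x, y, z) in angstroms.
chloride = (0.0, 0.0, 0.0)
neighbours = [
    (3.0, 0.0, 0.0),   # 3.0 A away: counted
    (0.0, 3.5, 0.0),   # 3.5 A away: counted
    (0.0, 0.0, 5.0),   # 5.0 A away: outside the cutoff
    (2.0, 2.0, 2.0),   # about 3.46 A away: counted
]
cn = coordination_number(chloride, neighbours)
print(cn)
```

In practice the cutoff would need to depend on the halide, since the ionic radius grows from fluoride to iodide.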


2015 ◽  
Vol 27 (3) ◽  
pp. 471 ◽  
Author(s):  
Nahid Khosronezhad ◽  
Abasalt Hosseinzadeh Colagar ◽  
Syed Golam Ali Jorsarayi

The NOP2/Sun domain family, member 7 (Nsun7) gene, which encodes putative methyltransferase Nsun7, has a role in sperm motility in mice. In humans, this gene is located on chromosome 4 with 12 exons. The aim of the present study was to investigate mutations of exon 7 in normospermic and asthenospermic men. Semen samples were collected from the Fatemezahra IVF centre (Babol, Iran) and analysed on the basis of World Health Organization (WHO) guidelines using general phenol–chloroform DNA extraction methods. Exon 7 was amplified using Sun7-F and Sun7-R primers. Bands on samples from asthenospermic men that exhibited different patterns of movement on single-strand conformation polymorphism gels compared with normal samples were identified and subjected to sequencing for further identification of possible mutations. Direct sequencing of polymerase chain reaction (PCR) products, along with their analysis, confirmed C26232T-transition and T26248G-transversion mutations in asthenospermic men. Comparison of normal and mutant protein structures of Nsun7 indicated that the amino acid serine was converted to alanine, the structure of the helix, coil and strand was changed, and the protein folding and ligand binding sites were changed in samples from asthenospermic men with a transversion mutation in exon 7, indicating impairment of protein function. Because Nsun7 gene products have a role in sperm motility, an impairment in exon 7 of this gene may lead to infertility. The transversion mutation in exon 7 of the Nsun7 gene can be used as an infertility marker in asthenospermic men.


2004 ◽  
Vol 124 (6) ◽  
pp. 679-690 ◽  
Author(s):  
Toby W. Allen ◽  
O.S. Andersen ◽  
Benoit Roux

Proteins, including ion channels, often are described in terms of some average structure and pictured as rigid entities immersed in a featureless solvent continuum. This simplified view, which provides for a convenient representation of the protein's overall structure, incurs the risk of deemphasizing important features underlying protein function, such as thermal fluctuations in the atom positions and the discreteness of the solvent molecules. These factors become particularly important in the case of ion movement through narrow pores, where the magnitude of the thermal fluctuations may be comparable to the ion–pore atom separations, such that the strength of the ion–channel interactions may vary dramatically as a function of the instantaneous configuration of the ion and the surrounding protein and pore water. Descriptions of ion permeation through narrow pores, which employ static protein structures and a macroscopic continuum dielectric solvent, thus face fundamental difficulties. We illustrate this using simple model calculations based on the gramicidin A and KcsA potassium channels, which show that thermal atomic fluctuations lead to energy profiles that vary by tens of kcal/mol. Consequently, within the framework of a rigid pore model, ion-channel energetics is extremely sensitive to the choice of experimental structure and how the space-dependent dielectric constant is assigned. Given these observations, the significance of any description based on a rigid structure appears limited. Creating a conducting channel model from one single structure requires substantial and arbitrary engineering of the model parameters, making it difficult for such approaches to contribute to our understanding of ion permeation at a microscopic level.


1998 ◽  
Vol 54 (6) ◽  
pp. 1085-1094 ◽  
Author(s):  
Helge Weissig ◽  
Ilya N. Shindyalov ◽  
Philip E. Bourne

Databases containing macromolecular structure data provide a crystallographer with important tools for use in solving, refining and understanding the functional significance of their protein structures. Given this importance, this paper briefly summarizes past progress by outlining the features of the significant number of relevant databases developed to date. One recent database, PDB+, containing all current and obsolete structures deposited with the Protein Data Bank (PDB) is discussed in more detail. PDB+ has been used to analyze the self-consistency of the current (1 January 1998) corpus of over 7000 structures. A summary of those findings is presented (a full discussion will appear elsewhere) in the form of global and temporal trends within the data. These trends indicate that challenges exist if crystallographers are to provide the community with complete and consistent structural results in the future. It is argued that better information management practices are required to meet these challenges.


2018 ◽  
Vol 19 (11) ◽  
pp. 3405 ◽  
Author(s):  
Emanuel Peter ◽  
Jiří Černý

In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson–Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements to allow for automated continuous structure comparison without human intervention.
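One way to see why a score of the lDDT kind tolerates domain movements is that it compares distances within each structure and never superposes the two. The sketch below is a simplified, one-point-per-residue variant written for illustration, not the reference lDDT implementation. The 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances are assumed defaults, and the function name `simple_lddt` is hypothetical.

```python
import math
from itertools import combinations


def simple_lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference distances (shorter than `radius`) that the model
    reproduces, averaged over the tolerance thresholds.

    `reference` and `model` are equal-length lists of (x, y, z) points,
    one per residue, in the same residue order.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same length")
    preserved = 0
    considered = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= radius:
            continue  # only local distances in the reference are scored
        d_mod = math.dist(model[i], model[j])
        considered += 1
        preserved += sum(abs(d_ref - d_mod) < t for t in thresholds)
    if considered == 0:
        return 0.0
    return preserved / (considered * len(thresholds))


# Hypothetical three-residue example.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y + 10.0, z) for x, y, z in ref]    # rigid translation
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]

score_shifted = simple_lddt(ref, shifted)
score_distorted = simple_lddt(ref, distorted)
print(score_shifted, score_distorted)
```

Because only internal distances enter the score, a rigid translation or rotation of the whole model leaves it unchanged, whereas a local distortion lowers it.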

