The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone

2015 ◽  
Vol 24 (5) ◽  
pp. 643-650 ◽  
Author(s):  
Daniel Barry Roche ◽  
Thomas Brüls

IEEE Access ◽
2021 ◽  
pp. 1-1
Author(s):  
Seonwoo Min ◽  
Seunghyun Park ◽  
Siwon Kim ◽  
Hyun-Soo Choi ◽  
Byunghan Lee ◽  
...  

2019 ◽  
Vol 35 (22) ◽  
pp. 4854-4856 ◽  
Author(s):  
James D Stephenson ◽  
Roman A Laskowski ◽  
Andrew Nightingale ◽  
Matthew E Hurles ◽  
Janet M Thornton

Abstract. Motivation: Understanding the protein structural context of genomic variants, and their patterning on proteins, can help to separate benign from pathogenic variants and reveal their molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, being complicated by alternative splicing and by transcript evidence. Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. Supplementary information: Supplementary data are available at Bioinformatics online.
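The core step VarMap describes, placing a chromosome coordinate onto a residue of a protein sequence, can be sketched as below. This is a minimal illustration under stated assumptions, not VarMap's implementation: the function name `genomic_to_residue` and the input format (coding-exon intervals as 1-based inclusive `(start, end)` pairs listed in transcript order) are hypothetical, and the sketch skips the transcript selection and validation checks the abstract mentions.

```python
from typing import List, Optional, Tuple

def genomic_to_residue(pos: int, cds_exons: List[Tuple[int, int]],
                       strand: str = "+") -> Optional[Tuple[int, int]]:
    """Map a 1-based genomic position to (residue number, codon position).

    Hypothetical helper. cds_exons holds the coding part of each exon as
    1-based inclusive (start, end) genomic intervals, listed in transcript
    order (5' to 3' of the mRNA). Returns None if pos is not coding.
    """
    cds_offset = 0  # coding bases that precede the current exon
    for start, end in cds_exons:
        if start <= pos <= end:
            # On the minus strand the transcript runs from high to low coordinates.
            within = pos - start if strand == "+" else end - pos
            offset = cds_offset + within  # 0-based position within the CDS
            return offset // 3 + 1, offset % 3 + 1
        cds_offset += end - start + 1
    return None
```

For example, with coding exons `[(100, 105), (200, 211)]` on the plus strand, position 200 is the seventh coding base and therefore maps to residue 3, codon position 1; position 150 is intronic and maps to `None`.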


Pteridines ◽  
2007 ◽  
Vol 18 (1) ◽  
pp. 79-94
Author(s):  
Marco Wiltgen ◽  
Gernot P. Tilz

Abstract. The functional specificity of a protein is linked to its structure. A growing area of bioinformatics deals with the prediction and visualization of protein 3D structures. In homology modelling, a protein sequence of unknown structure is aligned with the sequences of proteins whose structures are known; by exploiting structural information from these known configurations, the new structure can be predicted. In this introductory paper, we present the principles of homology modelling and demonstrate the method by determining the structure of the enzyme glutamic acid decarboxylase (GAD 65). This protein is an autoantigen involved in several human autoimmune diseases. We illustrate the different steps in structure prediction of GAD 65 using two experimentally determined structures of pig kidney DOPA decarboxylase (one in complex with the inhibitor carbidopa) as templates. The resulting model of GAD 65 provides detailed information about the active site of the protein and selected epitopes. By analysing the interactions between DOPA decarboxylase and the inhibitor carbidopa, the residues of the GAD 65 active site can be identified via the sequence alignment between DOPA decarboxylase and GAD 65. The locations of known epitopes in the molecule are visualized in special representations that give insights into mechanisms of antigenicity. Hydrophobicity analysis gives first hints about the ability of GAD 65 to adhere to the cell membrane. Homology modelling is at present one of the most efficient techniques for providing accurate structural models of proteins. It is expected that within a few years, for every newly determined protein sequence, at least one member of the same protein family with a known structure will be available, which will steadily increase the importance and applicability of homology modelling.
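Transferring active-site residue numbers from a template to the target through a pairwise alignment, the step used here to locate the GAD 65 active site, can be illustrated with a short sketch. The helper `map_residues` and the toy gapped sequences are hypothetical (they are not the DOPA decarboxylase or GAD 65 sequences), and gaps are assumed to be written as `-`.

```python
from typing import Dict, Iterable, Optional

def map_residues(template_aln: str, target_aln: str,
                 template_positions: Iterable[int]) -> Dict[int, Optional[int]]:
    """Map 1-based template residue numbers to target residue numbers.

    Hypothetical helper. Both inputs are rows of the same pairwise alignment,
    with '-' marking gaps. A template residue aligned to a gap maps to None.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned: Dict[int, int] = {}
    t_num = q_num = 0  # residue counters; gap columns do not advance them
    for t_char, q_char in zip(template_aln, target_aln):
        if t_char != "-":
            t_num += 1
        if q_char != "-":
            q_num += 1
        if t_char != "-" and q_char != "-":
            aligned[t_num] = q_num
    return {p: aligned.get(p) for p in template_positions}
```

For example, `map_residues("ACD-EFG", "A-DKEFG", [2, 3, 5])` returns `{2: None, 3: 2, 5: 5}`: template residue 2 sits opposite a gap, so it has no counterpart in the target.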


Author(s):  
Pamela A Cote-Hammarlof ◽  
Inês Fragata ◽  
Julia Flynn ◽  
David Mavor ◽  
Konstantin B Zeldovich ◽  
...  

Abstract. The distribution of fitness effects (DFE) of new mutations across different environments quantifies the potential for adaptation in a given environment and its cost in others. So far, results regarding the cost of adaptation across environments have been mixed, and most studies have sampled random mutations across different genes. Here, we systematically quantify how costs of adaptation vary along a large stretch of protein sequence by studying the distribution of fitness effects of the same ≈2,300 amino-acid-changing mutations, obtained from deep mutational scanning of 119 amino acids in the middle domain of the heat shock protein Hsp90, in five environments. This region is known to be important for client binding, stabilization of the Hsp90 dimer, stabilization of the N-terminal-Middle and Middle-C-terminal interdomains, and regulation of ATPase–chaperone activity. Interestingly, we find that fitness correlates well across diverse stressful environments, with the exception of one environment, diamide. Consistent with this result, we find little cost of adaptation: on average, only one in seven beneficial mutations is deleterious in another environment. We identify a hotspot of beneficial mutations in a region of the protein that is located within an allosteric center. The identified protein regions that are enriched in beneficial, deleterious, and costly mutations coincide with residues that are involved in the stabilization of Hsp90 interdomains and of client-binding interfaces, or with residues that are involved in the ATPase–chaperone activity of Hsp90. Thus, our study yields information regarding the role and adaptive potential of a protein sequence that complements and extends known structural information.
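The cost-of-adaptation statistic quoted above (the share of beneficial mutations that are deleterious in another environment) can be computed along the lines sketched below. The data layout, the symmetric threshold of 0.1 around wild-type fitness, and the choice to count every (mutation, environment) pair in which a mutation is beneficial are assumptions made for illustration; the study's own classification of beneficial and deleterious mutations may differ.

```python
from typing import Dict

# Hypothetical layout: mutation -> environment -> fitness score relative to wild type.
Fitness = Dict[str, Dict[str, float]]

def fraction_costly(fitness: Fitness, threshold: float = 0.1) -> float:
    """Fraction of beneficial (mutation, environment) cases that are
    deleterious in at least one other environment.

    Scores above +threshold count as beneficial, below -threshold as
    deleterious (an assumed classification rule).
    """
    beneficial = 0
    costly = 0
    for by_env in fitness.values():
        for env, score in by_env.items():
            if score > threshold:
                beneficial += 1
                if any(other < -threshold
                       for other_env, other in by_env.items() if other_env != env):
                    costly += 1
    return costly / beneficial if beneficial else 0.0
```

With the made-up scores `{"A1V": {"env1": 0.3, "env2": -0.4}, "G2S": {"env1": 0.2, "env2": 0.25}, "K3R": {"env1": -0.05, "env2": 0.15}}`, four beneficial cases are found and one of them (A1V in env1) is costly, giving 0.25.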


2021 ◽  
Author(s):  
Sanaa Mansoor ◽  
Minkyung Baek ◽  
Umesh Madan ◽  
Eric Horvitz

Protein embeddings learned from aligned sequences have been leveraged in a wide array of tasks in protein understanding and engineering. These sequence embeddings are generated through semi-supervised training on millions of sequences with deep neural models that have hundreds of millions of parameters, and their performance on target tasks continues to improve as model complexity increases. We report a more data-efficient approach that encodes protein information through joint semi-supervised training on protein sequence and structure. We show that the method encodes both types of information, forming a rich embedding space that can be used for downstream prediction tasks. We further show that incorporating rich structural information into the context under consideration improves the model's performance in predicting the effects of single mutations. We attribute the gains in accuracy to leveraging proximity within the enriched representation to identify sequentially and spatially close residues that would be affected by the mutation, using experimentally validated or predicted structures.
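Selecting the residues that are sequentially or spatially close to a mutated position, the kind of neighborhood the abstract credits for the accuracy gains, can be sketched from residue coordinates as follows. The function name, the use of one representative coordinate per residue (for example the Cα atom), the ±2 sequence window, and the 8 Å cutoff are illustrative assumptions, not parameters reported by the authors.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def neighbor_residues(coords: Sequence[Coord], mutated: int,
                      seq_window: int = 2, cutoff: float = 8.0) -> List[int]:
    """Return 0-based indices of residues near the mutated position.

    Hypothetical helper. A residue counts as a neighbour if it lies within
    seq_window positions along the chain, or within cutoff (same units as
    coords, e.g. angstroms) of the mutated residue in 3D space.
    """
    center = coords[mutated]
    neighbours = []
    for i, xyz in enumerate(coords):
        if i == mutated:
            continue
        sequential = abs(i - mutated) <= seq_window
        spatial = math.dist(center, xyz) <= cutoff
        if sequential or spatial:
            neighbours.append(i)
    return neighbours
```

Combining the two criteria with a logical OR mirrors the abstract's "sequentially and spatially close"; a residue far along the chain can still be picked up if the fold brings it near the mutation.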


2019 ◽  
Vol 20 (22) ◽  
pp. 5640 ◽  
Author(s):  
Fontaine ◽  
Cadet ◽  
Vetrivel

Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of using descriptors from the amino acid index database to model and predict the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying the fast Fourier transform (FFT), and (iii) predicting a fitness value for polypeptide variants (training and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability, and enantioselectivity). We show that using multiple physicochemical descriptors together with the FFT, which takes into account the interactions between amino acid residues within the protein sequence, can lead to a very significant improvement in the quality of models and predictions. The best choice of descriptor, combination of descriptors, and/or use of the FFT depends on the protein/fitness pair. This approach can add value to existing mutant libraries for which screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
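The three-step descriptor construction (encode the sequence, take an FFT spectrum, concatenate) can be sketched in plain Python. This is an assumption-laden illustration rather than the authors' pipeline: it uses a single amino acid index (the Kyte-Doolittle hydropathy scale), zero-pads to the next power of two for a radix-2 Cooley-Tukey FFT, keeps the magnitudes of the non-negative frequencies, and omits the regression model that predicts fitness from the extended sequence. The names `encode`, `spectrum`, and `extended_sequence` are hypothetical.

```python
import cmath
from typing import Dict, List

# Kyte-Doolittle hydropathy scale, one of the indices collected in AAindex.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(seq: str, scale: Dict[str, float]) -> List[float]:
    """Elementary numerical sequence: one index value per residue."""
    return [scale[aa] for aa in seq.upper()]

def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrum(values: List[float]) -> List[float]:
    """Magnitude spectrum (non-negative frequencies) after zero-padding."""
    n = 1 << (len(values) - 1).bit_length()  # next power of two >= len(values)
    padded = [complex(v) for v in values] + [0j] * (n - len(values))
    return [abs(c) for c in fft(padded)[: n // 2 + 1]]

def extended_sequence(seq: str, scale: Dict[str, float]) -> List[float]:
    """Ext_SEQ sketch (Q = 2): the encoding followed by its FFT spectrum."""
    elementary = encode(seq, scale)
    return elementary + spectrum(elementary)
```

In practice one would compute `extended_sequence` for every variant in a mutant library (all of equal length, so the vectors line up) and feed the resulting vectors to a regression model; that last step is left out here.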


2019 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Zifeng Liu ◽  
Xun Liu ◽  
Yutian Chong ◽  
...  

Abstract. Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image-captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using 2D distance maps is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
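Representing a structure as a 2D map of pairwise residue distances, counting each residue's 3D neighbors, and scoring a predicted sequence by its recovery rate can all be sketched with the standard library. The function names, the single coordinate per residue, and the 8 Å neighbor cutoff are assumptions; SPROF's actual input features and network are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Symmetric L x L matrix of Euclidean distances between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def neighbor_count(dmap: List[List[float]], i: int, cutoff: float = 8.0) -> int:
    """Number of residues other than i lying within cutoff of residue i."""
    return sum(1 for j, d in enumerate(dmap[i]) if j != i and d <= cutoff)

def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)
```

For instance, `sequence_recovery("ACDE", "ACDF")` is 0.75, and grouping residues by `neighbor_count` is one way to examine how recovery varies with the number of spatial neighbors.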

