The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone

Daniel Barry Roche; Thomas Brüls

doi:10.1002/pro.2635

Pre-training of Deep Bidirectional Protein Sequence Representations with Structural Information

IEEE Access ◽

10.1109/access.2021.3110269 ◽

2021 ◽

pp. 1-1

Author(s):

Seonwoo Min ◽

Seunghyun Park ◽

Siwon Kim ◽

Hyun-Soo Choi ◽

Byunghan Lee ◽

...

Keyword(s):

Protein Sequence ◽

Structural Information


VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations

Bioinformatics ◽

10.1093/bioinformatics/btz482 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4854-4856 ◽

Cited By ~ 8

Author(s):

James D Stephenson ◽

Roman A Laskowski ◽

Andrew Nightingale ◽

Matthew E Hurles ◽

Janet M Thornton

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Protein Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Web Tool ◽

Genomic Variants ◽

Structural Context ◽

Pathogenic Variants ◽

Transcript Evidence

Abstract Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. Supplementary information: Supplementary data are available at Bioinformatics online.
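The core coordinate arithmetic behind such a mapping can be sketched as below. This is only an illustration under strong assumptions and is not VarMap's implementation: the function name `genomic_to_residue`, the exon intervals, and the forward-strand, single-transcript setting are all hypothetical, and the sketch ignores alternative splicing and the validation checks the abstract mentions.

```python
def genomic_to_residue(pos, cds_exons):
    """Map a 1-based genomic position to (residue number, codon position).

    cds_exons: list of (start, end) coding intervals, 1-based inclusive,
    in transcription order on the forward strand (an assumption of this sketch).
    Returns None when the position lies outside the coding sequence.
    """
    offset = 0  # coding bases contributed by earlier exons
    for start, end in cds_exons:
        if start <= pos <= end:
            cds_index = offset + (pos - start)  # 0-based index within the CDS
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None


# Hypothetical transcript with two coding exons (4 + 5 = 9 bases, 3 codons).
exons = [(101, 104), (201, 205)]
print(genomic_to_residue(201, exons))  # second base of the second codon
print(genomic_to_residue(150, exons))  # intronic position
```

A codon that spans an exon boundary is handled naturally here, because the residue number is derived from the running offset in the spliced coding sequence rather than from the genomic coordinate itself.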


A Basic Molecular Analysis of the Diabetic Antigen GAD by Homology Modelling. Principles of the Method and Understanding of Antigenicity and Binding Sites

Pteridines ◽

10.1515/pteridines.2007.18.1.79 ◽

2007 ◽

Vol 18 (1) ◽

pp. 79-94

Author(s):

Marco Wiltgen ◽

Gernot P. Tilz

Keyword(s):

Active Site ◽

Protein Sequence ◽

Structure Prediction ◽

Structural Information ◽

Homology Modelling ◽

Protein Structures ◽

Dopa Decarboxylase ◽

Gad 65 ◽

Unknown Structure ◽

Introductory Paper

Abstract Functional specificity of a protein is linked to its structure. A growing section of bioinformatics deals with the prediction and visualization of protein 3D structures. In homology modelling, a protein sequence with an unknown structure is aligned with sequences of known protein structures. By exploiting structural information from the known configurations, the new structure can be predicted. In this introductory paper, we will present the principles of homology modelling and demonstrate the method by determining the structure of the enzyme glutamic decarboxylase (GAD 65). This protein is an autoantigen involved in several human autoimmune diseases. We will illustrate the different steps in structure prediction of GAD 65 by use of two experimentally determined structures of pig kidney DOPA decarboxylase (one structure in complex with the inhibitor carbidopa) as templates. The resulting model of GAD 65 provides detailed information about the active site of the protein and selected epitopes. By analysis of the interactions between DOPA decarboxylase and the inhibitor carbidopa, the residues of the GAD 65 active site can be identified via the sequence alignment between DOPA decarboxylase and GAD 65. The locations of known epitopes in the molecule are visualized in special representations giving insights into mechanisms of antigenicity. Hydrophobicity analysis gives first hints about the ability of GAD 65 to adhere to the cell membrane. Homology modelling is at present one of the most efficient techniques to provide accurate structural models of proteins. It is expected that in a few years, for every newly determined protein sequence, at least one member with a known structure of the same protein family will be available, which will steadily increase the importance and applicability of homology modelling.
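The idea of identifying target active-site residues through a template alignment can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name `map_sites`, the toy aligned sequences, and the site numbers are hypothetical, and a real analysis would start from an actual alignment of the template and target sequences.

```python
def map_sites(aligned_template, aligned_target, template_sites):
    """Transfer 1-based template residue numbers onto the target sequence.

    Both strings are rows of one pairwise alignment ('-' marks a gap).
    Returns {template_position: target_position}, with None when the
    template residue is aligned to a gap in the target.
    """
    if len(aligned_template) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(template_sites)
    mapping = {}
    t_pos = q_pos = 0  # residues seen so far in template / target
    for t_char, q_char in zip(aligned_template, aligned_target):
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            q_pos += 1
        if t_char != "-" and t_pos in wanted:
            mapping[t_pos] = q_pos if q_char != "-" else None
    return mapping


# Toy alignment (hypothetical sequences, not DOPA decarboxylase or GAD 65).
template = "MK-LHTAGS"
target = "MKALH-AGS"
sites = map_sites(template, target, [3, 5, 6])
print(sites)
```

Residues that fall opposite a gap are reported as None rather than silently shifted, since such positions have no equivalent in the target and would need to be inspected in the 3D model.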


The Adaptive Potential of the Middle Domain of Yeast Hsp90

Molecular Biology and Evolution ◽

10.1093/molbev/msaa211 ◽

2020 ◽

Author(s):

Pamela A Cote-Hammarlof ◽

Inês Fragata ◽

Julia Flynn ◽

David Mavor ◽

Konstantin B Zeldovich ◽

...

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Chaperone Activity ◽

Adaptive Potential ◽

Beneficial Mutations ◽

Fitness Effects ◽

Middle Domain ◽

Large Stretch ◽

The Cost ◽

Binding Interfaces

Abstract The distribution of fitness effects (DFEs) of new mutations across different environments quantifies the potential for adaptation in a given environment and its cost in others. So far, results regarding the cost of adaptation across environments have been mixed, and most studies have sampled random mutations across different genes. Here, we quantify systematically how costs of adaptation vary along a large stretch of protein sequence by studying the distribution of fitness effects of the same ≈2,300 amino-acid changing mutations obtained from deep mutational scanning of 119 amino acids in the middle domain of the heat shock protein Hsp90 in five environments. This region is known to be important for client binding, stabilization of the Hsp90 dimer, stabilization of the N-terminal-Middle and Middle-C-terminal interdomains, and regulation of ATPase–chaperone activity. Interestingly, we find that fitness correlates well across diverse stressful environments, with the exception of one environment, diamide. Consistent with this result, we find little cost of adaptation; on average only one in seven beneficial mutations is deleterious in another environment. We identify a hotspot of beneficial mutations in a region of the protein that is located within an allosteric center. The identified protein regions that are enriched in beneficial, deleterious, and costly mutations coincide with residues that are involved in the stabilization of Hsp90 interdomains and stabilization of client-binding interfaces, or residues that are involved in ATPase–chaperone activity of Hsp90. Thus, our study yields information regarding the role and adaptive potential of a protein sequence that complements and extends known structural information.
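The "cost of adaptation" statistic quoted above (the share of beneficial mutations that are deleterious elsewhere) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the mutation labels, the environment names, the fitness values, and the simple sign-based threshold, whereas the study itself would classify mutations with its own statistical criteria.

```python
def cost_of_adaptation(fitness, focal, other, threshold=0.0):
    """Fraction of mutations beneficial in `focal` that are deleterious in `other`.

    fitness: {mutation: {environment: selection coefficient}}.
    A mutation counts as beneficial if its coefficient exceeds `threshold`
    and as deleterious if it falls below `-threshold` (a simplification).
    Returns None when no mutation is beneficial in the focal environment.
    """
    beneficial = [m for m, s in fitness.items() if s[focal] > threshold]
    if not beneficial:
        return None
    costly = [m for m in beneficial if fitness[m][other] < -threshold]
    return len(costly) / len(beneficial)


# Made-up selection coefficients for four hypothetical mutations.
toy = {
    "mut1": {"env_a": 0.05, "env_b": -0.02},
    "mut2": {"env_a": 0.03, "env_b": 0.01},
    "mut3": {"env_a": -0.10, "env_b": -0.08},
    "mut4": {"env_a": 0.00, "env_b": 0.04},
}
cost = cost_of_adaptation(toy, "env_a", "env_b")
print(cost)
```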


Toward More General Embeddings for Protein Design: Harnessing Joint Representations of Sequence and Structure

10.1101/2021.09.01.458592 ◽

2021 ◽

Author(s):

Sanaa Mansoor ◽

Minkyung Baek ◽

Umesh Madan ◽

Eric Horvitz

Keyword(s):

Protein Design ◽

Protein Sequence ◽

Structural Information ◽

Neural Models ◽

Efficient Approach ◽

Supervised Training ◽

Joint Training ◽

Types Of Information

Protein embeddings learned from aligned sequences have been leveraged in a wide array of tasks in protein understanding and engineering. The sequence embeddings are generated through semi-supervised training on millions of sequences with deep neural models defined with hundreds of millions of parameters, and they continue to increase in performance on target tasks with increasing complexity. We report a more data-efficient approach to encode protein information through joint training on protein sequence and structure in a semi-supervised manner. We show that the method is able to encode both types of information to form a rich embedding space which can be used for downstream prediction tasks. We show that the incorporation of rich structural information into the context under consideration boosts the performance of the model in predicting the effects of single mutations. We attribute increases in accuracy to the value of leveraging proximity within the enriched representation to identify sequentially and spatially close residues that would be affected by the mutation, using experimentally validated or predicted structures.


Evaluations of protein sequence alignments using structural information

International Conference on Information Technology: Coding and Computing (ITCC'05) - Volume II ◽

10.1109/itcc.2005.148 ◽

2005 ◽

Cited By ~ 1

Author(s):

A. Auyeung ◽

U. Melcher

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Sequence Alignments


A Hidden Markov Model Approach to Model Protein Sequence and Structural Information: Identification of Helix-Turn-Helix DNA-Binding Motif

2006 IEEE International Conference on Granular Computing ◽

10.1109/grc.2006.1635821 ◽

2006 ◽

Author(s):

Changhui Yan

Keyword(s):

Dna Binding ◽

Markov Model ◽

Hidden Markov Model ◽

Protein Sequence ◽

Structural Information ◽

Hidden Markov ◽

Binding Motif ◽

Model Protein ◽

Model Approach


PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information

Methods in Molecular Biology - Multiple Sequence Alignment Methods ◽

10.1007/978-1-62703-646-7_17 ◽

2013 ◽

pp. 263-271 ◽

Cited By ~ 132

Author(s):

Jimin Pei ◽

Nick V. Grishin

Keyword(s):

Sequence Alignment ◽

Protein Sequence ◽

Structural Information ◽

Three Dimensional ◽

Protein Sequence Alignment ◽

Multiple Protein ◽

Multiple Protein Sequence Alignment


Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

International Journal of Molecular Sciences ◽

10.3390/ijms20225640 ◽

2019 ◽

Vol 20 (22) ◽

pp. 5640 ◽

Cited By ~ 1

Author(s):

Fontaine ◽

Cadet ◽

Vetrivel

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Value Added ◽

Digital Signal ◽

Numerical Sequence ◽

Fast Fourier Transformation ◽

Amino Acid Residues ◽

Fitness Value ◽

Validation Set ◽

And Function

Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acids index database for modeling and predicting the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying fast Fourier transformation (FFT), and (iii) predicting a value of fitness for polypeptide variants (train and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability and enantioselectivity). We show that the use of multiple physicochemical descriptors coupled with the implementation of the FFT, taking into account the interactions between amino acid residues within the protein sequence, could lead to very significant improvement in the quality of models and predictions. The choice of the descriptor or of the combination of descriptors and/or FFT depends on the protein/fitness pair. This approach can provide potential users with value added to existing mutant libraries where screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
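Steps (i) and (ii) above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the two descriptors (the Kyte-Doolittle hydropathy scale and a coarse formal-charge index) are chosen here only as examples, the power-spectrum normalisation is an assumption, and a naive discrete Fourier transform stands in for the FFT (it yields the same spectrum, only more slowly). Step (iii), the regression model, is omitted.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as one example descriptor.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
# Coarse formal charge at neutral pH (a simplifying assumption).
CHARGE = {aa: 0.0 for aa in KD}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def encode(seq, index):
    """Step (i): turn a sequence into one elementary numerical sequence."""
    return [index[aa] for aa in seq]


def power_spectrum(signal):
    """Naive DFT power spectrum for frequencies 0 .. n // 2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def extended_sequence(seq, indices):
    """Step (ii): concatenate the spectra obtained from Q descriptors."""
    ext = []
    for index in indices:
        ext.extend(power_spectrum(encode(seq, index)))
    return ext


features = extended_sequence("ACDEK", [KD, CHARGE])  # Q = 2 in this toy case
print(len(features))
```

Because the spectrum length depends on sequence length, variants compared in one model are assumed to have equal length, as is typical for point-mutant libraries.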


To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

10.1101/628917 ◽

2019 ◽

Author(s):

Sheng Chen ◽

Zhe Sun ◽

Zifeng Liu ◽

Xun Liu ◽

Yutian Chong ◽

...

Keyword(s):

Protein Sequence ◽

Network Architecture ◽

Structural Information ◽

3D Structure ◽

Previous Method ◽

Image Captioning ◽

Sequence Profile ◽

Distance Map ◽

3D Structures ◽

Protein Sequence Profile

Abstract Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties that are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ 2D distance maps for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using 2D distance maps is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
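The 2D representation described above, a map of pairwise residue distances, can be sketched in a few lines. This is a minimal illustration and not the SPROF pipeline: the function names, the use of one representative atom per residue, the toy coordinates, and the 10 Å neighbor cutoff are all assumptions made for the example.

```python
import math


def distance_map(coords):
    """Return the N x N matrix of Euclidean distances between residues.

    coords: one (x, y, z) tuple per residue, e.g. C-alpha positions
    (the choice of representative atom is an assumption of this sketch).
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def neighbor_counts(dmap, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff`."""
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
        for i, row in enumerate(dmap)
    ]


# Toy coordinates: three residues in a row plus one distant residue.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 20.0, 0.0)]
dmap = distance_map(toy_coords)
counts = neighbor_counts(dmap)
print(counts)
```

The resulting matrix is symmetric with a zero diagonal, which is what allows it to be treated as a single-channel image by a convolutional encoder.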
