Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

Fontaine;  Cadet;  Vetrivel

doi:10.3390/ijms20225640

International Journal of Molecular Sciences ◽

10.3390/ijms20225640 ◽

2019 ◽

Vol 20 (22) ◽

pp. 5640 ◽

Cited By ~ 1

Author(s):

Fontaine ◽

Cadet ◽

Vetrivel

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Value Added ◽

Digital Signal ◽

Numerical Sequence ◽

Fast Fourier Transformation ◽

Amino Acid Residues ◽

Fitness Value ◽

Validation Set ◽

And Function

Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying fast Fourier transformation (FFT), and (iii) predicting a fitness value for polypeptide variants (train and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability, and enantioselectivity). We show that the use of multiple physicochemical descriptors coupled with the implementation of the FFT, which takes into account the interactions between amino acid residues within the protein sequence, could lead to a very significant improvement in the quality of models and predictions. The choice of the descriptor, or of the combination of descriptors and/or FFT, depends on the protein/fitness pair. This approach can add value to existing mutant libraries where screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
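
Steps (i) and (ii) of the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `TOY_INDEX`, `encode`, `spectrum` and `extended_sequence` are hypothetical, the descriptor values are made up rather than taken from the amino acid index database, a naive discrete Fourier transform stands in for an FFT (same values, slower), and the regression of step (iii) is omitted.

```python
import cmath

# Hypothetical per-residue descriptor values (illustrative only,
# NOT taken from the amino acid index database).
TOY_INDEX = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9, "E": -3.5}


def encode(seq, index):
    """Step (i): elementary numerical sequence, one descriptor value per residue."""
    return [index[res] for res in seq]


def spectrum(values):
    """Magnitude spectrum via a naive O(n^2) DFT (an FFT yields the same values)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n)
    ]


def extended_sequence(seq, indices, use_fft=True):
    """Step (ii): concatenate Q elementary sequences into one feature vector.

    When use_fft is True, each elementary sequence is replaced by its spectrum.
    """
    ext = []
    for index in indices:
        ele = encode(seq, index)
        ext.extend(spectrum(ele) if use_fft else ele)
    return ext


# Q = 2 descriptors on a 4-residue toy peptide gives a vector of length 8.
features = extended_sequence("AGLK", [TOY_INDEX, TOY_INDEX])
```

The resulting vector would then be passed, together with measured fitness values, to whatever regression model is used for step (iii).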

Chitosanase from Streptomyces sp. strain N174: a comparative review of its structure and function

Biochemistry and Cell Biology ◽

10.1139/o97-079 ◽

1997 ◽

Vol 75 (6) ◽

pp. 687-696 ◽

Cited By ~ 20

Author(s):

Tamo Fukamizo ◽

Ryszard Brzezinski

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Substrate Binding ◽

Structure And Function ◽

Site Directed Mutagenesis ◽

Directed Mutagenesis ◽

Amino Acid Residues ◽

Streptomyces Sp ◽

Binding Cleft ◽

And Function

Novel information on the structure and function of chitosanase, which hydrolyzes the β-1,4-glycosidic linkage of chitosan, has accumulated in recent years. The cloning of the chitosanase gene from Streptomyces sp. strain N174 and the establishment of an efficient expression system using Streptomyces lividans TK24 have contributed to these advances. Amino acid sequence comparisons of the chitosanases that have been sequenced to date revealed a significant homology in the N-terminal module. From energy minimization based on the X-ray crystal structure of Streptomyces sp. strain N174 chitosanase, the substrate binding cleft of this enzyme was estimated to be composed of six monosaccharide binding subsites. The hydrolytic reaction takes place at the center of the binding cleft with an inverting mechanism. Site-directed mutagenesis of the carboxylic amino acid residues that are conserved revealed that Glu-22 and Asp-40 are the catalytic residues. The tryptophan residues in the chitosanase do not participate directly in the substrate binding but stabilize the protein structure by interacting with hydrophobic and carboxylic side chains of the other amino acid residues. Structural and functional similarities were found between chitosanase, barley chitinase, bacteriophage T4 lysozyme, and goose egg white lysozyme, even though these proteins share no sequence similarities. This information can be helpful for the design of new chitinolytic enzymes that can be applied to carbohydrate engineering, biological control of phytopathogens, and other fields including chitinous polysaccharide degradation. Key words: chitosanase, amino acid sequence, overexpression system, reaction mechanism, site-directed mutagenesis.

Structural Characterization of Complex Bacterial Glycolipids by Fourier Transform Mass Spectrometry

European Journal of Mass Spectrometry ◽

10.1255/ejms.721 ◽

2005 ◽

Vol 11 (5) ◽

pp. 535-546 ◽

Cited By ~ 39

Author(s):

Anna Kondakov ◽

Buko Lindner

Keyword(s):

Mass Spectrometry ◽

Fourier Transform ◽

Structural Information ◽

Adaptive Immune System ◽

Ion Cyclotron Resonance ◽

Bacterial Membranes ◽

Multiphoton Dissociation ◽

The One ◽

And Function

Bacterial glycolipids are complex amphiphilic molecules which are, on the one hand, of utmost importance for the organization and function of bacterial membranes and which, on the other hand, play a major role in the activation of cells of the innate and adaptive immune system of the host. Even small alterations to their chemical structure may influence the biological activity tremendously. Due to their intrinsic biological heterogeneity [number and type of fatty acids, saccharide structures and substitution with, for example, phosphate (P), 2-aminoethyl-(pyro)phosphate groups (P-Etn) or 4-amino-4-deoxyarabinose (Ara4N)], separation of the different components is a prerequisite for unequivocal chemical and nuclear magnetic resonance structural analyses. In this contribution, the structural information which can be obtained from heterogeneous samples of glycolipids by Fourier transform (FT) ion cyclotron resonance mass spectrometric methods is described. By means of recently analysed complex biological samples, the possibilities of high-resolution electrospray ionization FT-MS are demonstrated. Capillary skimmer dissociation, as well as tandem mass spectrometry (MS/MS) analysis utilizing collision-induced dissociation and infrared multiphoton dissociation, are compared and their advantages in providing structural information of diagnostic importance are discussed.

Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

Comparative and Functional Genomics ◽

10.1155/2007/47161 ◽

2007 ◽

Vol 2007 ◽

pp. 1-23 ◽

Cited By ~ 2

Author(s):

G. R. Hemalatha ◽

D. Satyanarayana Rao ◽

L. Guruprasad

Keyword(s):

Amino Acid ◽

Amino Acid Residue ◽

Protein Sequence ◽

Amino Acid Residues ◽

Sequence Motifs ◽

Computational Tools ◽

Conserved Sequence ◽

Conserved Sequence Motifs ◽

Multiple Copies ◽

Domain 3

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region of fewer than 55 amino acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A “domain” corresponds to a conserved region of more than 55 amino acid residues that may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure.
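
The length rule stated in this abstract can be written down as a small sketch. The function name `classify_region` and the `copies` argument are hypothetical, and the abstract does not say how a region of exactly 55 residues is treated, so that case is returned as unclassified here rather than guessed.

```python
def classify_region(length, copies):
    """Apply the repeat/domain length rule from the abstract.

    length: number of amino acid residues in the conserved region
    copies: how many times the region occurs in the protein (assumed input)
    """
    if length > 55:
        # Domains may be present as single or multiple copies.
        return "domain"
    if length < 55 and copies > 1:
        # Repeats are shorter and must occur more than once.
        return "repeat"
    # Exactly 55 residues, or a short region seen only once: not covered by the rule.
    return "unclassified"


# Lengths taken from the list above; the copy numbers are illustrative assumptions.
labels = {
    "PxV": classify_region(57, 1),
    "NxGK": classify_region(36, 3),
    "RIDVK": classify_region(53, 2),
}
```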

Pre-training of Deep Bidirectional Protein Sequence Representations with Structural Information

IEEE Access ◽

10.1109/access.2021.3110269 ◽

2021 ◽

pp. 1-1

Author(s):

Seonwoo Min ◽

Seunghyun Park ◽

Siwon Kim ◽

Hyun-Soo Choi ◽

Byunghan Lee ◽

...

Keyword(s):

Protein Sequence ◽

Structural Information

Neural networks to learn protein sequence-function relationships from deep mutational scanning data

10.1101/2020.10.25.353946 ◽

2020 ◽

Author(s):

Sam Gelman ◽

Philip A. Romero ◽

Anthony Gitter

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Internal Representation ◽

Superior Performance ◽

Network Architectures ◽

Convolutional Network ◽

Learning Framework ◽

And Function ◽

Multiple Neural Network ◽

Function Mapping

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Our software is available from https://github.com/gitter-lab/nn4dms.

Self-Supervised Representation Learning of Protein Tertiary Structures (PtsRep): Protein Engineering as A Case Study

10.1101/2020.12.22.423916 ◽

2020 ◽

Author(s):

Junwen Luo ◽

Yi Cai ◽

Jialin Wu ◽

Hongmin Cai ◽

Xiaofeng Yang ◽

...

Keyword(s):

Deep Learning ◽

Protein Engineering ◽

Structural Information ◽

Representation Learning ◽

Sequence Information ◽

Structural Representation ◽

Tertiary Structures ◽

Structural Space ◽

General Protein ◽

And Function

In recent years, deep learning has been increasingly used to decipher the relationships among protein sequence, structure, and function. Thus far deep learning of proteins has mostly utilized protein primary sequence information, while the vast amount of protein tertiary structural information remains unused. In this study, we devised a self-supervised representation learning framework to extract the fundamental features of unlabeled protein tertiary structures (PtsRep), and the embedded representations were transferred to two commonly recognized protein engineering tasks, protein stability and GFP fluorescence prediction. On both tasks, PtsRep significantly outperformed the two benchmark methods (UniRep and TAPE-BERT), which are based on protein primary sequences. Protein clustering analyses demonstrated that PtsRep can capture the structural signals in proteins. PtsRep reveals an avenue for general protein structural representation learning, and for exploring protein structural space for protein engineering and drug design.

Identification of sulfenylated cysteines in Arabidopsis thaliana proteins using a disulfide-linked peptide reporter

10.1101/2020.03.25.989145 ◽

2020 ◽

Author(s):

Bo Wei ◽

Patrick Willems ◽

Jingjing Huang ◽

Caiping Tian ◽

Jing Yang ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Protein Interactions ◽

Affinity Purification ◽

Mass Spectrometry Analysis ◽

Technological Advancement ◽

Amino Acid Residues ◽

Cysteine Oxidation ◽

Spectrometry Analysis ◽

Mixed Disulfide ◽

And Function

In proteins, hydrogen peroxide (H2O2) reacts with redox-sensitive cysteines to form cysteine sulfenic acid, also known as S-sulfenylation. These cysteine oxidation events can steer diverse cellular processes by altering protein interactions, trafficking, conformation, and function. Previously, we had identified S-sulfenylated proteins by using a tagged proteinaceous probe based on the yeast AP-1–like (Yap1) transcription factor that specifically reacts with sulfenic acids and traps them through a mixed disulfide bond. However, the identity of the S-sulfenylated amino acid residues remained enigmatic. Here, we present a technological advancement to identify in situ sulfenylated cysteines directly by means of the transgenic Yap1 probe. In Arabidopsis thaliana cells, after an initial affinity purification and a tryptic digestion, we further enriched the mixed disulfide-linked peptides with an antibody targeting the YAP1C-derived peptide (C598SEIWDR) that entails the redox-active cysteine. Subsequent mass spectrometry analysis with pLink 2 identified 1,745 YAP1C cross-linked peptides, indicating sulfenylated cysteines in over 1,000 proteins. Approximately 55% of these YAP1C-linked cysteines had previously been reported as redox-sensitive cysteines (S-sulfenylation, S-nitrosylation, and reversibly oxidized cysteines). The presented methodology provides a noninvasive approach to identify sulfenylated cysteines in any species that can be genetically modified.

STING Report: convenient web-based application for graphic and tabular presentations of protein sequence, structure and function descriptors from the STING database

Nucleic Acids Research ◽

10.1093/nar/gki111 ◽

2004 ◽

Vol 33 (Database issue) ◽

pp. D269-D274 ◽

Cited By ~ 10

Author(s):

G. Neshich

Keyword(s):

Protein Sequence ◽

Structure And Function ◽

Sequence Structure ◽

Web Based ◽

And Function

The Relationship Between Protein Sequence, Structure and Function

Supramolecular Structure and Function 8 ◽

10.1007/0-306-48662-8_2 ◽

2005 ◽

pp. 15-29 ◽

Cited By ~ 2

Author(s):

Anna Tramontano ◽

Domenico Cozzetto

Keyword(s):

Protein Sequence ◽

Structure And Function ◽

Sequence Structure ◽

And Function ◽

The Relationship

Protein Sequence and Structure of N-terminal Amino Acids of Subunit Delta of Spinach Photosynthetic ATP-Synthase CF1

Zeitschrift für Naturforschung C ◽

10.1515/znc-1987-11-1215 ◽

1987 ◽

Vol 42 (11-12) ◽

pp. 1231-1238 ◽

Cited By ~ 11

Author(s):

Richard J. Berzborn ◽

Werner Finke ◽

Joachim Otto ◽

Helmut E. Meyer

Keyword(s):

Secondary Structure ◽

Atp Synthase ◽

Protein Sequence ◽

Alpha Helix ◽

Amino Acid Residues ◽

E Coli ◽

Helical Wheel ◽

Terminal Amino ◽

Amphipathic Alpha Helix ◽

Structure Calculations

Chloroplast ATP-synthase (CF1) subunit delta (δ) has been isolated from spinach thylakoids in the presence of SDS. By automated Edman degradation and online analysis of PTH derivatives, the 35 N-terminal amino acid residues were sequenced. The mature protein starts with: NH2-Val-Asp-Ser-Thr-Ala-Ser-Arg-Tyr-Ala-. This protein sequence allows alignment of spinach δ with the sequences of Z. mays 25 kDa polypeptide, the δ subunit of Rps. blastica, Rsp. rubrum and E. coli F1, and of bovine OSCP, but not with mitochondrial δ. Secondary structure calculations and helical wheel plots reveal a conserved secondary structure. The analyzed N-terminal sequences probably form a short amphipathic alpha helix with two adjacent turns. The polar residues thus aligned around Tyr8 of subunit δ are suitable to channel protons.
