Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

2019 ◽  
Vol 20 (22) ◽  
pp. 5640 ◽  
Author(s):  
Fontaine ◽  
Cadet ◽  
Vetrivel

Work aimed at unraveling the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying fast Fourier transformation (FFT), and (iii) predicting a value of fitness for polypeptide variants (train and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability, and enantioselectivity). We show that the use of multiple physicochemical descriptors coupled with the implementation of the FFT, taking into account the interactions between amino acid residues within the protein sequence, could lead to very significant improvement in the quality of models and predictions. The choice of the descriptor, or of the combination of descriptors and/or FFT, depends on the protein/fitness pair. This approach can add value to existing mutant libraries where screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
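Steps (i) and (ii) of the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two residue scales are made-up toy values rather than entries from the amino acid index database, the names `encode`, `spectrum`, and `extended_sequence` are hypothetical, and a direct discrete Fourier transform stands in for an FFT routine (it yields the same coefficients). Keeping only the magnitudes of the first n//2 + 1 frequencies is also an assumption.

```python
import cmath

# Hypothetical per-residue scales. The numbers are made up for illustration
# and are NOT values from the amino acid index database.
TOY_SCALE_A = {"A": 1.0, "C": 2.0, "G": 0.5, "K": -1.5}
TOY_SCALE_B = {"A": 0.3, "C": 0.6, "G": 0.1, "K": 0.9}


def encode(sequence, scale):
    """Step (i): elementary numerical sequence, one descriptor value per residue."""
    return [scale[residue] for residue in sequence]


def spectrum(signal):
    """Magnitudes of the discrete Fourier transform for frequencies 0..n//2.

    A direct O(n^2) DFT keeps the sketch dependency-free; an FFT routine
    computes the same coefficients faster.
    """
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        magnitudes.append(abs(coeff))
    return magnitudes


def extended_sequence(sequence, plan):
    """Step (ii): concatenate elementary sequences.

    `plan` is a list of (scale, as_spectrum) pairs; when as_spectrum is True
    the encoded sequence is replaced by its spectrum before concatenation.
    """
    ext = []
    for scale, as_spectrum in plan:
        ele = encode(sequence, scale)
        ext.extend(spectrum(ele) if as_spectrum else ele)
    return ext


# One spectral block followed by one raw-encoded block for a toy 4-residue variant.
features = extended_sequence("ACGK", [(TOY_SCALE_A, True), (TOY_SCALE_B, False)])
print(len(features))
```

Step (iii) would then fit a regression model on such feature vectors for the characterized variants and predict fitness for the rest; the abstract does not name the model, so it is left out here.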

1997 ◽  
Vol 75 (6) ◽  
pp. 687-696 ◽  
Author(s):  
Tamo Fukamizo ◽  
Ryszard Brzezinski

Novel information on the structure and function of chitosanase, which hydrolyzes the β-1,4-glycosidic linkage of chitosan, has accumulated in recent years. The cloning of the chitosanase gene from Streptomyces sp. strain N174 and the establishment of an efficient expression system using Streptomyces lividans TK24 have contributed to these advances. Amino acid sequence comparisons of the chitosanases sequenced to date revealed significant homology in the N-terminal module. From energy minimization based on the X-ray crystal structure of Streptomyces sp. strain N174 chitosanase, the substrate binding cleft of this enzyme was estimated to be composed of six monosaccharide binding subsites. The hydrolytic reaction takes place at the center of the binding cleft with an inverting mechanism. Site-directed mutagenesis of the conserved carboxylic amino acid residues revealed that Glu-22 and Asp-40 are the catalytic residues. The tryptophan residues in the chitosanase do not participate directly in substrate binding but stabilize the protein structure by interacting with hydrophobic and carboxylic side chains of the other amino acid residues. Structural and functional similarities were found between chitosanase, barley chitinase, bacteriophage T4 lysozyme, and goose egg white lysozyme, even though these proteins share no sequence similarities. This information can be helpful for the design of new chitinolytic enzymes that can be applied to carbohydrate engineering, biological control of phytopathogens, and other fields including chitinous polysaccharide degradation. Key words: chitosanase, amino acid sequence, overexpression system, reaction mechanism, site-directed mutagenesis.


2005 ◽  
Vol 11 (5) ◽  
pp. 535-546 ◽  
Author(s):  
Anna Kondakov ◽  
Buko Lindner

Bacterial glycolipids are complex amphiphilic molecules which are, on the one hand, of utmost importance for the organization and function of bacterial membranes and which, on the other hand, play a major role in the activation of cells of the innate and adaptive immune system of the host. Even small alterations to their chemical structure may influence the biological activity tremendously. Due to their intrinsic biological heterogeneity [number and type of fatty acids, saccharide structures and substitution with, for example, phosphate (P), 2-aminoethyl-(pyro)phosphate groups (P-Etn) or 4-amino-4-deoxyarabinose (Ara4N)], separation of the different components is a prerequisite for unequivocal chemical and nuclear magnetic resonance structural analyses. In this contribution, the structural information which can be obtained from heterogeneous samples of glycolipids by Fourier transform (FT) ion cyclotron resonance mass spectrometric methods is described. By means of recently analysed complex biological samples, the possibilities of high-resolution electrospray ionization FT-MS are demonstrated. Capillary skimmer dissociation, as well as tandem mass spectrometry (MS/MS) analysis utilizing collision-induced dissociation and infrared multiphoton dissociation, are compared and their advantages in providing structural information of diagnostic importance are discussed.


2007 ◽  
Vol 2007 ◽  
pp. 1-23 ◽  
Author(s):  
G. R. Hemalatha ◽  
D. Satyanarayana Rao ◽  
L. Guruprasad

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising fewer than 55 amino acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A “domain” corresponds to a conserved region with more than 55 amino acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure.
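The length-based distinction between a repeat and a domain stated in this abstract can be written as a small rule. The sketch below is only a restatement of those two definitions: the function name `classify_region` and the "unclassified" label are hypothetical, and the abstract does not say how a region of exactly 55 residues, or a shorter region seen only once, would be treated.

```python
# Cutoff taken from the abstract: repeats have fewer than 55 residues,
# domains have more than 55 residues.
LENGTH_CUTOFF = 55


def classify_region(length, copies=1):
    """Label a conserved region using the abstract's length and copy-number rule.

    length: number of amino acid residues in the region.
    copies: how many times the region occurs in the protein sequence.
    """
    if length > LENGTH_CUTOFF:
        # Domains may be present as single or multiple copies.
        return "domain"
    if length < LENGTH_CUTOFF and copies > 1:
        # Repeats must occur more than once.
        return "repeat"
    # Cases the abstract does not define (assumption: leave them unlabeled).
    return "unclassified"


print(classify_region(57))     # length of the PxV region
print(classify_region(36, 2))  # length of the NxGK region, assuming two copies
```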


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Seonwoo Min ◽  
Seunghyun Park ◽  
Siwon Kim ◽  
Hyun-Soo Choi ◽  
Byunghan Lee ◽  
...  

2020 ◽  
Author(s):  
Sam Gelman ◽  
Philip A. Romero ◽  
Anthony Gitter

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Our software is available from https://github.com/gitter-lab/nn4dms.


2020 ◽  
Author(s):  
Junwen Luo ◽  
Yi Cai ◽  
Jialin Wu ◽  
Hongmin Cai ◽  
Xiaofeng Yang ◽  
...  

In recent years, deep learning has been increasingly used to decipher the relationships among protein sequence, structure, and function. Thus far deep learning of proteins has mostly utilized protein primary sequence information, while the vast amount of protein tertiary structural information remains unused. In this study, we devised a self-supervised representation learning framework to extract the fundamental features of unlabeled protein tertiary structures (PtsRep), and the embedded representations were transferred to two commonly recognized protein engineering tasks, protein stability and GFP fluorescence prediction. On both tasks, PtsRep significantly outperformed the two benchmark methods (UniRep and TAPE-BERT), which are based on protein primary sequences. Protein clustering analyses demonstrated that PtsRep can capture the structural signals in proteins. PtsRep reveals an avenue for general protein structural representation learning, and for exploring protein structural space for protein engineering and drug design.


2020 ◽  
Author(s):  
Bo Wei ◽  
Patrick Willems ◽  
Jingjing Huang ◽  
Caiping Tian ◽  
Jing Yang ◽  
...  

In proteins, hydrogen peroxide (H2O2) reacts with redox-sensitive cysteines to form cysteine sulfenic acid, also known as S-sulfenylation. These cysteine oxidation events can steer diverse cellular processes by altering protein interactions, trafficking, conformation, and function. Previously, we had identified S-sulfenylated proteins by using a tagged proteinaceous probe based on the yeast AP-1–like (Yap1) transcription factor that specifically reacts with sulfenic acids and traps them through a mixed disulfide bond. However, the identity of the S-sulfenylated amino acid residues remained enigmatic. Here, we present a technological advancement to identify in situ sulfenylated cysteines directly by means of the transgenic Yap1 probe. In Arabidopsis thaliana cells, after an initial affinity purification and a tryptic digestion, we further enriched the mixed disulfide-linked peptides with an antibody targeting the YAP1C-derived peptide (C598SEIWDR) that entails the redox-active cysteine. Subsequent mass spectrometry analysis with pLink 2 identified 1,745 YAP1C cross-linked peptides, indicating sulfenylated cysteines in over 1,000 proteins. Approximately 55% of these YAP1C-linked cysteines had previously been reported as redox-sensitive cysteines (S-sulfenylation, S-nitrosylation, and reversibly oxidized cysteines). The presented methodology provides a noninvasive approach to identify sulfenylated cysteines in any species that can be genetically modified.


1987 ◽  
Vol 42 (11-12) ◽  
pp. 1231-1238 ◽  
Author(s):  
Richard J. Berzborn ◽  
Werner Finke ◽  
Joachim Otto ◽  
Helmut E. Meyer

Chloroplast ATP-synthase (CF1) subunit delta (δ) has been isolated from spinach thylakoids in the presence of SDS. By automated Edman degradation and online analysis of PTH derivatives, the 35 N-terminal amino acid residues were sequenced. The mature protein starts with: NH2-Val-Asp-Ser-Thr-Ala-Ser-Arg-Tyr-Ala-. This protein sequence allows alignment of spinach δ with the sequences of Z. mays 25 kDa polypeptide, the δ subunit of Rps. blastica, Rsp. rubrum and E. coli F1, and of bovine OSCP, but not with mitochondrial δ. Secondary structure calculations and helical wheel plots reveal a conserved secondary structure. The analyzed N-terminal sequences probably build a short amphipathic alpha helix with two adjacent turns. The polar residues thus aligned around Tyr8 of subunit δ are suitable for channeling protons.

