Supervised learning of protein thermal stability using sequence mining and distribution statistics of network centrality

2019 ◽  
Author(s):  
Ankit Sharma ◽  
Ganesh Bagler ◽  
Debajyoti Bera

Abstract

Motivation: It is expected that the difference in the thermal stability of mesophilic and thermophilic proteins arises, at least in part, from differences in their molecular structures and amino acid compositions. Existing machine learning approaches for supervised classification of proteins rely on features derived from the structural networks and the amino acid sequences. However, the network features used leave out several important network centrality values, the statistic used is a simple average, and the sequence features used are hand-picked, leading to an accuracy of 90%.

Results: We show that discriminating sub-sequences of the amino acid sequences can significantly improve classification accuracy compared to the existing approaches of counting amino acids, di-peptide or even tri-peptide bonds. We identify notions of network centrality, specifically those that depend on the distances between Cα atoms, that appear to correlate better with thermal stability than the existing network features. We also show how to generate better statistics from the node- and edge-wise centrality values that more accurately capture the variations in their values for different types of proteins. These improved feature selection techniques make it possible to classify between thermophilic and mesophilic proteins with 96% accuracy and 99% area under the ROC curve.

Availability: The dataset and source code used are available at https://github.com/ankits0207/[email protected].
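The idea of replacing a simple average of centrality values with richer distribution statistics can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a contact network in which two residues are linked when their Cα atoms lie within a cutoff distance; the 7.0 Å cutoff, the choice of degree centrality, the particular summary statistics and all function names are hypothetical choices made only for illustration.

```python
import math
import statistics


def contact_network(ca_coords, cutoff=7.0):
    """Build adjacency sets from C-alpha coordinates.

    Residues i and j are linked when their C-alpha atoms lie within
    `cutoff` angstroms. The 7.0 value is an assumed, illustrative cutoff.
    """
    n = len(ca_coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Fraction of the other residues that each residue is in contact with."""
    n = len(adj)
    return [len(adj[i]) / (n - 1) for i in range(n)]


def distribution_features(values):
    """Summarize per-node centrality values with more than a plain mean."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Toy example: three residues in a chain plus one distant residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
features = distribution_features(degree_centrality(contact_network(coords)))
print(features)
```

Each protein then yields a fixed-length feature vector regardless of its number of residues, which is what a supervised classifier needs as input.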

2012 ◽  
Vol 279 (1740) ◽  
pp. 3075-3082 ◽  
Author(s):  
Evgeny V. Leushkin ◽  
Georgii A. Bazykin ◽  
Alexey S. Kondrashov

Maps that relate all possible genotypes or phenotypes to fitness (fitness landscapes) are central to the evolution of life, but remain poorly known. An insertion or a deletion (indel) of one or several amino acids constitutes a substantial leap of a protein within the space of amino acid sequences, and it is unlikely that after such a leap the new sequence corresponds precisely to a fitness peak. Thus, one can expect an indel in the protein-coding sequence that gets fixed in a population to be followed by some number of adaptive amino acid substitutions, which move the new sequence towards a nearby fitness peak. Here, we study substitutions that occur after a frame-preserving indel in evolving proteins of Drosophila. An insertion triggers 1.03 ± 0.75 amino acid substitutions within the protein region centred at the site of insertion, and a deletion triggers 4.77 ± 1.03 substitutions within such a region. The difference between these values is probably owing to a higher fraction of effectively neutral insertions. Almost all of the triggered amino acid substitutions can be attributed to positive selection, and most of them occur relatively soon after the triggering indel and take place upstream of its site. A high fraction of substitutions that follow an indel occur at previously conserved sites, suggesting that an indel substantially changes selection that shapes the protein region around it. Thus, an indel is often followed by an adaptive walk of length that is in agreement with the theory of molecular adaptation.


2011 ◽  
Vol 197-198 ◽  
pp. 606-609 ◽  
Author(s):  
Ti Feng Jiao ◽  
Yuan Yuan Xing ◽  
Jing Xin Zhou ◽  
Wei Wang

Some functional luminol derivatives with aromatic substituted groups have been designed and synthesized from the reaction of the corresponding aromatic acyl chloride precursors with luminol. It has been found that, depending on the size of the aromatic groups, the formed luminol derivatives showed different properties, indicating distinct regulation of molecular skeletons. UV and IR data both confirmed the formation of the imide group as well as the aromatic segment in the molecular structures. Thermal analysis showed that the thermal stability of the luminol derivative with the p-phthaloyl segment was the highest among the derivatives. The difference in thermal stability is mainly attributed to the formation of the imide group and the aromatic substituent groups in the molecular structure. The present results demonstrate that the special properties of luminol derivatives can be tuned by modifying the molecular structures of the target compounds with proper substituted groups, which shows potential application in the functional materials field and in ECL sensors.


1994 ◽  
Vol 301 (2) ◽  
pp. 545-550 ◽  
Author(s):  
H Nakagawa ◽  
N Komorita ◽  
F Shibata ◽  
A Ikesue ◽  
K Konishi ◽  
...  

Four basic neutrophil chemotactic factors (chemokines) have been purified from conditioned medium of granulation tissue obtained from carrageenin-induced inflammation in the rat. On the basis of their N-terminal amino acid sequences, one of the chemokines was identical with rat GRO/cytokine-induced neutrophil chemoattractant (CINC) which we reported previously, and another was identical with rat macrophage inflammatory protein-2 (MIP-2). Two other chemokines were novel chemoattractants related to MIP-2. The novel chemokines are referred to as rat GRO/CINC-2 alpha and CINC-2 beta, and consequently CINC and rat MIP-2 are renamed rat GRO/CINC-1 and CINC-3 respectively. The complete amino acid sequences of purified CINC-2 alpha and CINC-3 were determined by analysis of the fragments isolated from proteinase V8-treated CINCs. The cDNA for CINC-2 beta was cloned by reverse transcription/PCR amplification using specific primers starting with total RNA extracted from lipopolysaccharide-stimulated rat macrophages. A comparison of the amino acid sequence encoded by the cDNA with the N-terminal amino acid sequence of purified CINC-2 beta revealed that mature CINC-2 beta is a 68-residue chemoattractant produced by cleavage of a 32-residue signal peptide. The difference in amino acid sequences between CINC-2 alpha and CINC-2 beta consisted of only three C-terminal residues. Rat GRO/CINC-2 alpha is a major chemokine, and the four purified chemokines have similar chemotactic activity, suggesting that they contribute to neutrophil infiltration into inflammatory sites in rats.


2011 ◽  
Vol 197-198 ◽  
pp. 623-626 ◽  
Author(s):  
Ti Feng Jiao ◽  
Xu Hui Li ◽  
Jing Xin Zhou ◽  
Jing Ya Liang ◽  
Jing Ren

Some functional Schiff base derivatives with azobenzene substituted groups have been designed and synthesized from the reaction of aminoazobenzene with different aromatic aldehydes. It has been found that, depending on the size of the aromatic groups, the formed Schiff base derivatives showed different properties, indicating distinct regulation of molecular skeletons. UV and IR data both confirmed the formation of the Schiff base as well as the aromatic segment in the molecular structures. Thermal analysis showed that the thermal stability of Schiff base molecules with a naphthalene segment increased slightly in comparison with the other derivatives. The difference in thermal stability is mainly attributed to the formation of the Schiff base group and the aromatic substituent groups in the molecular structure. The present results demonstrate that the special properties of Schiff base derivatives can be tuned by modifying the molecular structures of the target compounds with proper substituted groups, which shows potential application in the functional materials field.


2011 ◽  
Vol 197-198 ◽  
pp. 598-601 ◽  
Author(s):  
Ti Feng Jiao ◽  
Juan Zhou ◽  
Jing Xin Zhou ◽  
Qiong Wang ◽  
Xu Zhong Luo

Some novel trigonal Schiff base compounds with an aromatic core and different substituted groups have been designed and synthesized from the reaction of a trigonal aromatic amine with different aldehydes. It has been found that, depending on the molecular structures and substituted groups, the formed trigonal Schiff base compounds showed different properties, indicating distinct regulation of molecular design. UV and IR data both confirmed the formation of the Schiff base as well as the aromatic segment in the molecular structures. Thermal analysis also clarified the structural influence of these compounds in different temperature ranges. The difference in thermal stability is mainly attributed to the molecular structures, the formation of the Schiff base group and the different substituted groups. The present results show that the special properties of Schiff base compounds can be tuned by modifying molecular structures and substituted groups, which shows potential application in the fields of functional materials and catalysts.


2000 ◽  
Vol 381 (12) ◽  
pp. 1195-1202 ◽  
Author(s):  
Zheng Zhu ◽  
Song Ling ◽  
Qi-Heng Yang ◽  
Lin Li

The fructose-2,6-bisphosphatase domain of the bifunctional chicken liver enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase shares approximately 95% amino acid sequence homology with that of the rat enzyme. However, these two enzymes are significantly different in their phosphatase activities. In this report, we show that the COOH-terminal 25 amino acids of the two enzymes are responsible for the different enzymatic activities. Although these 25 amino acids are not required for the phosphatase activity, their removal diminishes the differences in the activities between the two enzymes. In addition, two chimeric molecules (one consisting of the catalytic core of the chicken bisphosphatase domain and the rat COOH-terminal 25 amino acids, and the other consisting of most of the intact chicken enzyme and the rat COOH-terminal 25 amino acids) showed the same kinetic properties as the rat enzyme. Furthermore, substitution of the residues Pro456Pro457Ala458 of the chicken enzyme with GluAlaGlu, the corresponding sequence in the rat liver enzyme, yields a chicken enzyme that behaves like the rat enzyme. These results demonstrate that the different bisphosphatase activities of the chicken and rat liver bifunctional enzymes can be attributed to the differences in their COOH-terminal amino acid sequences, particularly the three residues.


Acta Naturae ◽  
2014 ◽  
Vol 6 (3) ◽  
pp. 76-88 ◽  
Author(s):  
I. V. Golubev ◽  
N. V. Komarova ◽  
K. V. Ryzhenkova ◽  
T. A. Chubar ◽  
S. S. Savin ◽  
...  

Hydrophobization of alpha-helices is one of the general approaches used for improving the thermal stability of enzymes. A total of 11 serine residues located in alpha-helices have been found based on multiple alignments of the amino acid sequences of D-amino acid oxidases from different organisms and the analysis of the 3D-structure of D-amino acid oxidase from yeast Trigonopsis variabilis (TvDAAO, EC 1.4.3.3). As a result of further structural analysis, eight Ser residues in 67, 77, 78, 105, 270, 277, 335, and 336 positions have been selected to be substituted with Ala. S78A and S270A substitutions have resulted in dramatic destabilization of the enzyme. Mutant enzymes were inactivated during isolation from cells. Another six mutant TvDAAOs have been highly purified and their properties have been characterized. The amino acid substitutions S277A and S336A destabilized the protein globule. The thermal stabilities of TvDAAO S77A and TvDAAO S335A mutants were close to that of the wild-type enzyme, while S67A and S105A substitutions resulted in approximately 1.5- and 2.0-fold increases in the TvDAAO mutant thermal stability, respectively. Furthermore, the TvDAAO S105A mutant showed on average a 1.2- to 3.0-fold higher catalytic efficiency with D-Asn, D-Tyr, D-Phe, and D-Leu as compared to the wild-type enzyme.


2020 ◽  
Vol 14 (suppl 1) ◽  
pp. 757-763 ◽  
Author(s):  
Shantani Kannan ◽  
Kannan Subbaram ◽  
Sheeza Ali ◽  
Hemalatha Kannan

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), poses a severe biothreat to the entire world. Nucleocapsids of SARS-CoV-2 and related viruses were studied for gene and amino acid sequence homologies. In this study, we established similarities and differences between the nucleocapsids of SARS-CoV-2, severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1), bat coronavirus (bat-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). We conducted a detailed analysis of the nucleocapsid protein amino acid sequence, and of the gene sequence encoding it, found in various coronavirus strains. After thoroughly screening the different nucleocapsids, we observed a close molecular homology between SARS-CoV-1 and SARS-CoV-2. More than 95% sequence similarity was observed between the two SARS-CoV strains. Bat-CoV and SARS-CoV-2 showed 92% sequence similarity. MERS-CoV and SARS-CoV-2 nucleocapsid analysis indicated only 65% identity. Molecular characterization of nucleocapsids from various coronaviruses revealed that SARS-CoV-2 is more closely related to SARS-CoV-1 and bat-CoV, and showed less resemblance to MERS-CoV. Thus, either SARS-CoV-1 or bat-CoV may be the source of SARS-CoV-2 evolution. Moreover, the existing differences in nucleocapsid molecular structures in SARS-CoV-2 make this virus more virulent and highly infectious, which means that the non-identical SARS-CoV-2 genes (which are absent in SARS-CoV-1 and bat-CoV) are responsible for COVID-19 severity. We observed that SARS-CoV-2 nucleocapsids from different locations varied in amino acid sequence. This revealed that there are many SARS-CoV-2 subtypes/subsets currently circulating globally.
This study will help to develop antiviral vaccine and drugs, study viral replication and immunopathogenesis, and synthesize monoclonal antibodies that can be used for precise COVID-19 diagnosis, without false-positive/false-negative results.
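The percentage figures quoted in the abstract above compare pairs of sequences. The abstract does not state how its similarity and identity values were computed, so the following is only a minimal sketch of one common definition: percent identity over the gap-free columns of a pairwise alignment. The function name, the gap symbol and the toy sequences are hypothetical, and the sequences are assumed to be already aligned to equal length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns without gaps.

    Both inputs must be rows of the same pairwise alignment, so they
    must have equal length. Columns where either row has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # ignore columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not real nucleocapsid sequences).
print(round(percent_identity("MSDN-GP", "MSENQGP"), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and similarity scores additionally count conservative substitutions, so values from different studies are only comparable when the definition is known.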


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258821
Author(s):  
Satoshi Akanuma ◽  
Minako Yamaguchi ◽  
Akihiko Yamagishi

Further improvement of the thermostability of inherently thermostable proteins is an attractive challenge because more thermostable proteins are industrially more useful and serve as better scaffolds for protein engineering. To establish guidelines that can be applied for the rational design of hyperthermostable proteins, we compared the amino acid sequences of two ancestral nucleoside diphosphate kinases, Arc1 and Bac1, reconstructed in our previous study. Although Bac1 is a thermostable protein whose unfolding temperature is around 100°C, Arc1 is much more thermostable with an unfolding temperature of 114°C. However, only 12 out of 139 amino acids are different between the two sequences. In this study, one or a combination of amino acid(s) in Bac1 was/were substituted by a residue(s) found in Arc1 at the same position(s). The best mutant, which contained three amino acid substitutions (S108D, G116A and L120P substitutions), showed an unfolding temperature more than 10°C higher than that of Bac1. Furthermore, a combination of the other nine amino acid substitutions also led to improved thermostability of Bac1, although the effects of individual substitutions were small. Therefore, not only the sum of the contributions of individual amino acids, but also the synergistic effects of multiple amino acids are deeply involved in the stability of a hyperthermostable protein. Such insights will be helpful for future rational design of hyperthermostable proteins.


Protein molecules are essential catalysts in life processes and also form much of the substance of living material. Their three dimensional structures determine their biological function. Their biosynthesis is primarily determined by arrays of nucleic acid macromolecules (DNA and RNA), and the amino acid sequences that constitute their long spatially organized peptide-chain molecules reflect at one remove this DNA coding system, and thus record a step-by-step history of some of the viable genetic events (natural or man-controlled) that have created the organism and the breed. Amino acid sequences can be used to trace the progress of controlled breeding in two ways: by extrapolation back from living breeds, and by analysis of ancient protein material. Of the latter, bone or tendon or skin collagens and hair keratins are the most perfectly preserved as molecular structures through 20000 years and indeed much longer. Amino acid sequences are expensive to determine (collagen has 1052 amino acid residues), and the potential of this palaeobiological information has been as yet little exploited. The first approach has, however, been more explored, in both plants and animals. Several protein systems must be studied in conjunction to reveal the phylogenetic threads in any one breed. As the three dimensional quaternary structure of protein molecules becomes more appreciated in relation to biological function, and as new techniques and procedures are developed, amino acid sequence data can become more informative in our ultimate understanding of early selective breeding.

