Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues

Russell Schwartz; Sorin Istrail; Jonathan King

doi:10.1110/ps.33201

Amino-acid site variability among natural and designed proteins

10.7287/peerj.preprints.74 ◽

2013 ◽

Author(s):

Eleisha L. Jackson ◽

Noah Ollikainen ◽

Arthur W. Covert III ◽

Tanja Kortemme ◽

Claus O. Wilke

Keyword(s):

Amino Acid ◽

Protein Design ◽

Protein Sequences ◽

Structural Constraints ◽

Scoring Functions ◽

Solvent Exposure ◽

Backbone Flexibility ◽

Hydrophobic Residues ◽

Designed Proteins ◽

Site Variability

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
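
The abstract above does not state how site variability was measured or how it was correlated with solvent exposure. As a rough sketch only, the following assumes per-site Shannon entropy of alignment columns as the variability measure and a Spearman rank correlation against a per-site solvent-exposure value; both choices, and all function names, are illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def site_variability(alignment):
    """Per-site entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


def _ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (sx * sy)


# Toy alignment and made-up exposure values, for illustration only.
toy_alignment = ["LKVA", "LRVA", "LEIA", "LDLG"]
toy_exposure = [0.05, 0.90, 0.20, 0.30]
toy_entropy = site_variability(toy_alignment)
toy_rho = spearman(toy_entropy, toy_exposure)
```

Under these assumptions, a lower correlation for designed alignments than for natural ones would correspond to the "too uniform" variability the abstract describes.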

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that enzymatically reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, including bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and analyses including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physicochemical properties (including pI, EC, Ai and Ii) were performed. Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within individual categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences revealed clusters indicating that the uricases come from different sources. The physicochemical features revealed that the uricases contain between 300 and 338 amino acid residues, with molecular weights of 33-39 kDa and theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79% compared with the other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers seeking an overview of the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
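
The amino acid composition figure quoted above (an average valine frequency across sequences) can be illustrated with a short sketch. The abstract does not say which tool was used, so the functions below are hypothetical helpers that compute percentage composition per sequence and then average it over a set of sequences.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_percent(sequence):
    """Percentage of each standard amino acid in one sequence."""
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def average_composition(sequences):
    """Mean percentage of each amino acid across several sequences."""
    per_sequence = [composition_percent(s) for s in sequences]
    return {
        aa: sum(comp[aa] for comp in per_sequence) / len(per_sequence)
        for aa in AMINO_ACIDS
    }
```

Averaging per-sequence percentages, as here, weights every sequence equally regardless of length; pooling all residues before counting would be an equally reasonable convention and can give slightly different numbers.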

iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins

Letters in Organic Chemistry ◽

10.2174/1570178615666180816101653 ◽

2019 ◽

Vol 16 (4) ◽

pp. 294-302 ◽

Cited By ~ 6

Author(s):

Shahid Akbar ◽

Maqsood Hayat ◽

Muhammad Kabir ◽

Muhammad Iqbal

Keyword(s):

Feature Extraction ◽

Amino Acid ◽

Antifreeze Proteins ◽

Protein Sequences ◽

Sampling Technique ◽

Lower Class ◽

Success Rates ◽

Throughput Model ◽

Extraction Scheme ◽

Living Organisms

Antifreeze proteins (AFPs) play distinct roles in maintaining the homeostatic conditions of living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity in protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is therefore highly desirable to propose an intelligent, high-throughput computational model that identifies AFPs quickly and accurately. To this end, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Protein sequences are expressed using three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition and Reduced Amino Acid Alphabet Composition. A classification hypothesis is usually biased towards the majority class when the dataset is imbalanced. The Synthetic Minority Over-sampling Technique (SMOTE) is employed to increase the instances of the lower class and control the bias. A 10-fold cross-validation test is applied to appraise the success rates of the “iAFP-gap-SMOTE” model. After the empirical investigation, the “iAFP-gap-SMOTE” model obtained 95.02% accuracy. The comparison suggested that the accuracy of the “iAFP-gap-SMOTE” model is higher than that of existing techniques in the literature. The proposed “iAFP-gap-SMOTE” model might be helpful for the research community and academia.
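
As an illustration of one of the feature schemes named above, here is a minimal sketch of a g-gap dipeptide composition. It assumes the common convention that a g-gap dipeptide pairs residue i with residue i + g + 1 (so g = 0 gives ordinary adjacent dipeptides) and normalizes counts by the number of pairs; the paper's exact definition may differ, and the function name is hypothetical. The SMOTE oversampling step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=1):
    """Return a 400-dimensional frequency vector of g-gap dipeptides.

    Residue i is paired with residue i + g + 1. Pairs that involve a
    non-standard residue are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n_pairs = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            n_pairs += 1
    if n_pairs == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / n_pairs for d in DIPEPTIDES]
```

Because the vector length is fixed at 400 regardless of sequence length, sequences of different lengths map to comparable feature vectors, which is what makes the representation usable as classifier input.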

Fe(2)OG: an integrated HMM profile-based web server to predict and analyze putative non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenase function in protein sequences

BMC Research Notes ◽

10.1186/s13104-021-05477-z ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Siddhartha Kundu

Keyword(s):

Amino Acid ◽

Water Molecule ◽

Active Site ◽

Ferrous Iron ◽

Web Server ◽

Protein Sequences ◽

Diverse Group ◽

And Function ◽

Functionally Diverse ◽

Haem Iron

Objective: Non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenases (i2OGdd) are a taxonomically and functionally diverse group of enzymes. The active site comprises ferrous iron in a hexa-coordinated distorted octahedron with the apoenzyme, 2-oxoglutarate and a displaceable water molecule. Current information on novel i2OGdd members is sparse and relies on computationally-derived annotation schema. The dissimilar amino acid composition and variable active site geometry thereof result in differing reaction chemistries amongst i2OGdd members. An additional need of researchers is a curated list of sequences with putative i2OGdd function which can be probed further for empirical data. Results: This work reports the implementation of Fe(2)OG, a web server with dual functionality and an extension of previous work on i2OGdd enzymes (Fe(2)OG ≡ {H2OGpred, DB2OG}). Fe(2)OG in this form is completely revised and updated (URL, scripts, repository) and will strengthen the knowledge base of investigators on i2OGdd biochemistry and function. Fe(2)OG utilizes the superior predictive propensity of HMM-profiles of laboratory validated i2OGdd members to predict probable active site geometries in user-defined protein sequences. Fe(2)OG also provides researchers with a pre-compiled list of analyzed and searchable i2OGdd-like sequences, many of which may be clinically relevant. Fe(2)OG is freely available (http://204.152.217.16/Fe2OG.html) and supersedes all previous versions, i.e., H2OGpred, DB2OG.

Cloning, Expression, and Characterization of Mouse Tissue Factor Pathway Inhibitor (TFPI)

Thrombosis and Haemostasis ◽

10.1055/s-0037-1614983 ◽

1998 ◽

Vol 79 (02) ◽

pp. 306-309 ◽

Cited By ~ 5

Author(s):

Dougald Monroe ◽

Julie Oliver ◽

Darla Liles ◽

Harold Roberts ◽

Jen-Yea Chang

Keyword(s):

Amino Acid ◽

Tissue Factor ◽

Signal Peptide ◽

Tissue Factor Pathway Inhibitor ◽

Factor Xa ◽

Protein Sequences ◽

Cloning And Expression ◽

Mouse Tissue ◽

Amino Acid Residues ◽

Tissue Factor Pathway

Summary: Tissue factor pathway inhibitor (TFPI) acts to regulate the initiation of coagulation by first inhibiting factor Xa. The complex of factor Xa/TFPI then inhibits the factor VIIa/tissue factor complex. The cDNA sequences of TFPI from several different species have been previously reported. A high level of similarity is present among TFPIs at the molecular level (DNA and protein sequences) as well as in biochemical function (inhibition of factor Xa, VIIa/tissue factor). In this report, we used a PCR-based screening method to clone cDNA for full length TFPI from a mouse macrophage cDNA library. Both cDNA and predicted protein sequences show significant homology to the other reported TFPI sequences, especially to that of rat. Mouse TFPI has a signal peptide of 28 amino acid residues followed by the mature protein (in which the signal peptide is removed) which has 278 amino acid residues. Mouse TFPI, like that of other species, consists of three tandem Kunitz type domains. Recombinant mouse TFPI was expressed in the human kidney cell line 293 and purified for functional assays. When using human clotting factors to investigate the inhibition spectrum of mouse TFPI, it was shown that, in addition to human factor Xa, mouse TFPI inhibits human factors VIIa, IXa, as well as factor XIa. Cloning and expression of the mouse TFPI gene will offer useful information and material for coagulation studies performed in a mouse model system.

Genetic Relationships in the Toxin-Producing Fungal Endophyte, Alternaria oxytropis Using Polyketide Synthase and Non-Ribosomal Peptide Synthase Genes

Journal of Fungi ◽

10.3390/jof7070538 ◽

2021 ◽

Vol 7 (7) ◽

pp. 538

Author(s):

Rebecca Creamer ◽

Deana Baucom Hille ◽

Marwa Neyaz ◽

Tesneem Nusayr ◽

Christopher L. Schardl ◽

...

Keyword(s):

Amino Acid ◽

Polyketide Synthase ◽

Genetic Relationships ◽

Protein Sequences ◽

Fungal Endophyte ◽

Melanin Synthesis ◽

Melanin Biosynthesis ◽

Protein Levels ◽

Oxytropis Sericea ◽

And Function

The legume Oxytropis sericea hosts a fungal endophyte, Alternaria oxytropis, which produces secondary metabolites (SM), including the toxin swainsonine. Polyketide synthase (PKS) and non-ribosomal peptide synthase (NRPS) enzymes are associated with biosynthesis of fungal SM. To better understand the origins of the SM, an unannotated genome of A. oxytropis was assessed for protein sequences similar to known PKS and NRPS enzymes of fungi. Contigs exhibiting identity with known genes were analyzed at nucleotide and protein levels using available databases. Software was used to identify PKS and NRPS domains and predict identity and function. Selected gene sequences were confirmed using PCR. Thirteen PKS, 5 NRPS, and 4 PKS-NRPS hybrids were identified and characterized with functions including swainsonine and melanin biosynthesis. Phylogenetic relationships among closest amino acid matches with Alternaria spp. were identified for seven highly conserved PKS and NRPS, including melanin synthesis. Three PKS and NRPS were most closely related to other fungi within the Pleosporaceae family, while five PKS and PKS-NRPS were closely related to fungi in the Pleosporales order. However, seven PKS and PKS-NRPS showed no identity with fungi in the Pleosporales or the class Dothideomycetes, suggesting a different evolutionary origin for those genes.

Compositional Determinants of Prion Formation in Yeast

Molecular and Cellular Biology ◽

10.1128/mcb.01140-09 ◽

2009 ◽

Vol 30 (1) ◽

pp. 319-332 ◽

Cited By ~ 113

Author(s):

James A. Toombs ◽

Blake R. McCarty ◽

Eric D. Ross

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Amyloid Formation ◽

Yeast Prion ◽

Hydrophobic Residues ◽

Predictive Methods ◽

Infectious Proteins ◽

Β Sheet ◽

Eukaryotic Genomes

Numerous prions (infectious proteins) have been identified in yeast that result from the conversion of soluble proteins into β-sheet-rich amyloid-like protein aggregates. Yeast prion formation is driven primarily by amino acid composition. However, yeast prion domains are generally lacking in the bulky hydrophobic residues most strongly associated with amyloid formation and are instead enriched in glutamines and asparagines. Glutamine/asparagine-rich domains are thought to be involved in both disease-related and beneficial amyloid formation. These domains are overrepresented in eukaryotic genomes, but predictive methods have not yet been developed to efficiently distinguish between prion and nonprion glutamine/asparagine-rich domains. We have developed a novel in vivo assay to quantitatively assess how composition affects prion formation. Using our results, we have defined the compositional features that promote prion formation, allowing us to accurately distinguish between glutamine/asparagine-rich domains that can form prion-like aggregates and those that cannot. Additionally, our results explain why traditional amyloid prediction algorithms fail to accurately predict amyloid formation by the glutamine/asparagine-rich yeast prion domains.

Specificity of activated human protein C

Biochemical Journal ◽

10.1042/bj2300497 ◽

1985 ◽

Vol 230 (2) ◽

pp. 497-502 ◽

Cited By ~ 33

Author(s):

S R Stone ◽

J Hofsteenge

Keyword(s):

Amino Acid ◽

Rate Constant ◽

Amino Acid Residue ◽

Protein C ◽

Activated Protein C ◽

Order Rate Constant ◽

Human Protein ◽

Functional Protein ◽

Hydrophobic Residues ◽

Apolar Residue

Peptide p-nitroanilide substrates and peptidylchloromethane inhibitors were used to examine the specificity of activated human Protein C. Substrates with arginine in the P1 position had the highest activity. The best substrates and inhibitors, as judged by the second-order rate constant for their interaction with the enzyme, had an apolar residue in the P2 position. In contrast with thrombin [Kettner & Shaw (1981) Methods Enzymol. 80, 826-842], activated Protein C was able to accommodate large hydrophobic residues such as phenylalanine and leucine in the P2 position. In the P3 position, the enzyme preferred an apolar D-amino acid residue. The results of the present study have also indicated a suitable substrate and inhibitor to be used in the assay of functional protein C and of thrombomodulin.

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
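
To make the notion of a "minimal" tree concrete, the sketch below scores a given tree by the number of character changes it requires, using Fitch's small-parsimony algorithm. This is a standard textbook technique swapped in for illustration; it is not the construction method of the paper above, and the tree encoding (nested 2-tuples of taxon names) and function names are assumptions made for this example.

```python
def fitch_site(tree, states):
    """Fitch pass for one site on a rooted binary tree.

    `tree` is either a taxon name (leaf) or a 2-tuple of subtrees.
    `states` maps each taxon name to its character at this site.
    Returns (candidate_state_set, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The subtrees can agree on a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, sequences):
    """Total parsimony length of `tree` summed over all aligned sites."""
    n_sites = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in sequences.items()})[1]
        for i in range(n_sites)
    )


# Toy data: four taxa, three aligned sites (made up for illustration).
toy_sequences = {"a": "AAG", "b": "AAC", "c": "ACC", "d": "ACC"}
tree_one = (("a", "b"), ("c", "d"))
tree_two = (("a", "c"), ("b", "d"))
```

Among candidate trees for the same data, the one with the smallest `tree_length` is the shortest under this criterion; in the toy data, `tree_one` needs fewer changes than `tree_two`.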
