A Method to Predict Amino Acids at Proximity of Beta-Sheet Axes from Protein Sequences

2014 ◽  
Vol 05 (01) ◽  
pp. 79-89 ◽  
Author(s):  
Antonin Guilloux ◽  
Bernard Caudron ◽  
Jean-Luc Jestin
2020 ◽  
Vol 15 (2) ◽  
pp. 121-134 ◽  
Author(s):  
Eunmi Kwon ◽  
Myeongji Cho ◽  
Hayeon Kim ◽  
Hyeon S. Son

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA analysis and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.
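The idea of a host-specific position marker in the abstract above can be illustrated with a minimal sketch. It assumes pre-aligned, equal-length sequences and a strict criterion (a position qualifies when every host class shows a single residue there and the classes do not all agree). The abstract does not state the authors' actual selection criteria, and their implementation was in Java, so the function name `host_specific_positions` and the toy sequences are purely hypothetical.

```python
def host_specific_positions(aligned_by_host):
    """Find alignment columns conserved within each host but differing among hosts.

    aligned_by_host maps a host label to a list of pre-aligned sequences.
    Returns {position: {host: residue}} using 0-based positions.
    The strict-conservation rule is an assumption made for illustration.
    """
    lengths = {len(s) for seqs in aligned_by_host.values() for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    (length,) = lengths

    markers = {}
    for pos in range(length):
        consensus = {}
        for host, seqs in aligned_by_host.items():
            residues = {s[pos] for s in seqs}
            if len(residues) != 1:
                break  # not conserved within this host: skip the column
            consensus[host] = residues.pop()
        else:
            # Every host is internally conserved; keep the column only
            # if at least two hosts carry different residues.
            if len(set(consensus.values())) > 1:
                markers[pos] = consensus
    return markers


# Invented toy alignment, not data from the study.
toy = {
    "avian": ["MKTA", "MKTA"],
    "human": ["MRTA", "MRTA"],
    "swine": ["MKTS", "MKTA"],
}
markers = host_specific_positions(toy)
```

In the toy data only the second column qualifies: it is K in the avian and swine sequences and R in the human ones, while the last column is rejected because the swine sequences disagree with each other.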


2020 ◽  
Author(s):  
Sumit Handa ◽  
Andres Reyna ◽  
Timothy Wiryaman ◽  
Partho Ghosh

Abstract Diversity-generating retroelements (DGRs) vary protein sequences to the greatest extent known in the natural world. These elements are encoded by constituents of the human microbiome and the microbial ‘dark matter’. Variation occurs through adenine-mutagenesis, in which genetic information in RNA is reverse transcribed faithfully to cDNA for all template bases but adenine. We investigated the determinants of adenine-mutagenesis in the prototypical Bordetella bacteriophage DGR through an in vitro system composed of the reverse transcriptase bRT, Avd protein, and a specific RNA. We found that the catalytic efficiency for correct incorporation during reverse transcription by the bRT-Avd complex was strikingly low for all template bases, with the lowest occurring for adenine. Misincorporation across a template adenine was only somewhat lower in efficiency than correct incorporation. We found that the C6, but not the N1 or C2, purine substituent was a key determinant of adenine-mutagenesis. bRT-Avd was insensitive to the C6 amine of adenine but recognized the C6 carbonyl of guanine. We also identified two bRT amino acids predicted to nonspecifically contact incoming dNTPs, R74 and I181, as promoters of adenine-mutagenesis. Our results suggest that the overall low catalytic efficiency of bRT-Avd is intimately tied to its ability to carry out adenine-mutagenesis.


2019 ◽  
Vol 21 (1) ◽  
pp. 213
Author(s):  
Federico Norbiato ◽  
Flavio Seno ◽  
Antonio Trovato ◽  
Marco Baiesi

Many native structures of proteins accommodate complex topological motifs such as knots, lassos, and other geometrical entanglements. How proteins can fold quickly even in the presence of such topological obstacles is a debated question in structural biology. Recently, the hypothesis that energetic frustration might be a mechanism to avoid topological frustration has been put forward based on the empirical observation that loops involved in entanglements are stabilized by weak interactions between amino-acids at their extrema. To verify this idea, we use a toy lattice model for the folding of proteins into two almost identical structures, one entangled and one not. As expected, the folding time is longer when random sequences fold into the entangled structure. This holds also under an evolutionary pressure simulated by optimizing the folding time. It turns out that optimized protein sequences in the entangled structure are in fact characterized by frustrated interactions at the closures of entangled loops. This phenomenon is much less enhanced in the control case where the entanglement is not present. Our findings, which are in agreement with experimental observations, corroborate the idea that an evolutionary pressure shapes the folding funnel to avoid topological and kinetic traps.


2019 ◽  
Vol 20 (23) ◽  
pp. 5978 ◽  
Author(s):  
Minkiewicz ◽  
Iwaniak ◽  
Darewicz

The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in the research on bioactive peptides, especially on those derived from foods and being constituents of diets that prevent development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) is in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, converting amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.


2002 ◽  
Vol 10 (04) ◽  
pp. 319-335
Author(s):  
DAVID DIGBY ◽  
WILLIAM SEFFENS ◽  
FISSEHA ABEBE

An in silico study of mRNA secondary structure has found a bias within the coding sequences of genes that favors "in-frame" pairing of nucleotides. This pairing of codons, each with its reverse-complement, partitions the 20 amino acids into three subsets. The genetic code can therefore be represented by a three-component graph. The composition of proteins in terms of amino acid membership in the three subgroups has been measured, and sequence runs of members within the same subgroup have been analyzed using a runs statistic based on Z-scores. In a GENBANK database of over 416,000 protein sequences, the distribution of this runs-test statistic is negatively skewed. To assess whether this statistical bias was due to a chance grouping of the amino acids in the real genetic code, several alternate partitions of the genetic code were examined by permuting the assignment of amino acids to groups. A metric was constructed to define the difference, or "distance", between any two such partitions, and an exhaustive search was conducted among alternate partitions maximally distant from the natural partition of the genetic code, to select sets of partitions that were also maximally distant from one another. The statistical skewness of the runs statistic distribution for native protein sequences was significantly more negative under the natural partition than it was under all of the maximally different partitions of codons, although for all partitions, including the natural one, the randomized sequences had quite similar skewness. Hence under the natural graph theory partition of the genetic code there is a preference for more protein sequences to contain fewer runs of amino acids than they do under the other partitions, meaning that the average run must be longer under the natural partition. This suggests that a corresponding bias may exist in the coding sequences of the actual genes that code for these proteins.
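A runs statistic based on Z-scores, as mentioned in the abstract above, can be sketched as follows. This is a hedged illustration using the standard multi-category generalization of the Wald-Wolfowitz runs test; the abstract does not say which exact statistic the authors used, and it does not list the members of the three amino acid subsets, so the sketch works on arbitrary group labels. The function names `count_runs` and `runs_z_score` are hypothetical.

```python
from collections import Counter
from itertools import groupby
from math import sqrt


def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    return sum(1 for _ in groupby(labels))


def runs_z_score(labels):
    """Z-score of the observed run count against a random ordering.

    Uses the k-category runs test: with category counts n_i and N = sum n_i,
      E[R]   = (N(N+1) - sum n_i^2) / N
      Var[R] = (S2 (S2 + N(N+1)) - 2 N S3 - N^3) / (N^2 (N - 1))
    where S2 = sum n_i^2 and S3 = sum n_i^3.
    A negative value means fewer (hence longer) runs than expected.
    """
    labels = list(labels)
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two labels")
    counts = Counter(labels).values()
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    expected = (n * (n + 1) - s2) / n
    variance = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("run count is not variable for these label counts")
    return (count_runs(labels) - expected) / sqrt(variance)


# Group labels 'A', 'B', 'C' stand in for the three subsets; which amino
# acids belong to which subset is not given in the abstract.
z = runs_z_score("AABC")
```

To apply this to a protein, each residue would first be mapped to its subset label under a chosen partition; comparing the distribution of Z-scores across partitions is then a separate step not shown here.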


1988 ◽  
Vol 67 (3) ◽  
pp. 543-547 ◽  
Author(s):  
R.R.B. Russell ◽  
T. Shiroza ◽  
H.K. Kuramitsu ◽  
J.J. Ferreti

The sequences of glucosyltransferase genes from Streptococcus sobrinus (gtfI) and Streptococcus mutans (gtfB) were compared and show a high degree of homology. There is a 57.7% homology of nucleotides in the genes and a 56.7% homology of amino acids in the deduced protein sequences. The G + C content for the protein-coding region is 43.6% for S. sobrinus and 41.2% for S. mutans. Internal repeating sequences present in both proteins exhibit some difference in sequence pattern.
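The two kinds of figures quoted in the abstract above, G + C content and percentage of matching positions, can be computed with a short sketch. It assumes that the reported "homology" percentages correspond to percent identity over an alignment, which the abstract does not state, and that gap columns are excluded from the comparison. The function names and the example strings are hypothetical and are not the gtfI or gtfB sequences.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; this
    gap-handling rule is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy inputs.
toy_gc = gc_content("ATGC")
toy_identity = percent_identity("ATGC", "ATGA")
```

Different gap conventions (for example, counting gap columns as mismatches) give different percentages, so a value is only comparable with published figures when the same convention is used.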


1975 ◽  
Vol 50 (1) ◽  
pp. 161-166
Author(s):  
John A. Black ◽  
Peter Stenzel ◽  
Richard N. Harkins

1989 ◽  
Vol 108 (3) ◽  
pp. 833-842 ◽  
Author(s):  
M S Robinson

Coat proteins of approximately 100-kD (adaptins) are components of the adaptor complexes which link clathrin to receptors in coated vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles, separate into two bands on SDS gels, designated A and C (Robinson, M. S., 1987. J. Cell Biol. 104:887-895). Two distinct cDNAs (sequences 1 and 2) encoding the two alpha-adaptins were cloned from a mouse brain cDNA library. Southern blotting indicates that there is one copy of each of the two alpha-adaptin genes, and that there are no additional closely related genes. Based on the size of the predicted protein products of the two genes (108 and 104 kD), the relative abundance of the two messages in brain and liver, and the reactivity of a sequence 1 fusion protein with different antibodies, it was possible to conclude that sequence 1 codes for A and sequence 2 for C. The two protein sequences are strikingly homologous to each other (84% identical amino acids), the major difference being an additional stretch of 41 amino acids, rich in prolines and acidic residues, inserted into the COOH-terminal half of A. In situ hybridization carried out on mouse brain sections indicates that the same cell type may express both transcripts, but that their relative expressions vary. Antipeptide antibodies are now being raised to find out whether the proteins are localized in functionally distinct populations of endocytic coated vesicles.

