A Method to Predict Amino Acids at Proximity of Beta-Sheet Axes from Protein Sequences

Antonin Guilloux; Bernard Caudron; Jean-Luc Jestin

doi:10.4236/am.2014.51009

A Study on Host Tropism Determinants of Influenza Virus Using Machine Learning

Current Bioinformatics ◽

10.2174/1574893614666191104160927 ◽

2020 ◽

Vol 15 (2) ◽

pp. 121-134 ◽

Cited By ~ 2

Author(s):

Eunmi Kwon ◽

Myeongji Cho ◽

Hayeon Kim ◽

Hyeon S. Son

Keyword(s):

Machine Learning ◽

Amino Acids ◽

Influenza Virus ◽

Random Forest ◽

Physicochemical Properties ◽

Protein Sequences ◽

Influenza Viruses ◽

Host Tropism ◽

Post Hoc ◽

Ha Protein

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.
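
The idea of a host-specific position marker can be illustrated with a minimal Python sketch. It is not the authors' Java program: the selection rule (a residue conserved within every host at or above a `min_conservation` threshold, and not identical across all hosts), the function name `host_specific_markers`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def host_specific_markers(alignments, min_conservation=0.9):
    """Find alignment columns that are conserved within each host but
    whose conserved residue is not the same in every host.

    alignments: dict mapping host name -> list of aligned sequences.
    Returns a list of (position, {host: residue}) with 1-based positions.
    The threshold and the rule itself are illustrative assumptions.
    """
    if any(not seqs for seqs in alignments.values()):
        raise ValueError("every host needs at least one sequence")
    lengths = {len(seq) for seqs in alignments.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    length = lengths.pop()

    markers = []
    for pos in range(length):
        consensus = {}
        for host, seqs in alignments.items():
            residue, count = Counter(s[pos] for s in seqs).most_common(1)[0]
            if residue == "-" or count / len(seqs) < min_conservation:
                break  # column is not conserved in this host
            consensus[host] = residue
        else:
            # Conserved in every host: keep it only if hosts disagree.
            if len(set(consensus.values())) > 1:
                markers.append((pos + 1, consensus))
    return markers


# Hypothetical toy alignment, not real influenza sequences.
toy = {
    "avian": ["MKTAQ", "MKTAQ", "MKTAQ"],
    "human": ["MKSAL", "MKSAL", "MKSAL"],
    "swine": ["MKTAL", "MKTAL", "MKTAL"],
}
for position, residues in host_specific_markers(toy):
    print(position, residues)
```

On the toy alignment, columns 3 and 5 are reported, since those are the only columns where the conserved residue differs between hosts.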

Determinants of adenine-mutagenesis in diversity-generating retroelements

Nucleic Acids Research ◽

10.1093/nar/gkaa1240 ◽

2020 ◽

Author(s):

Sumit Handa ◽

Andres Reyna ◽

Timothy Wiryaman ◽

Partho Ghosh

Keyword(s):

Amino Acids ◽

Dark Matter ◽

Reverse Transcription ◽

Genetic Information ◽

Human Microbiome ◽

Protein Sequences ◽

Catalytic Efficiency ◽

Natural World ◽

In Vitro System

Diversity-generating retroelements (DGRs) vary protein sequences to the greatest extent known in the natural world. These elements are encoded by constituents of the human microbiome and the microbial ‘dark matter’. Variation occurs through adenine-mutagenesis, in which genetic information in RNA is reverse transcribed faithfully to cDNA for all template bases but adenine. We investigated the determinants of adenine-mutagenesis in the prototypical Bordetella bacteriophage DGR through an in vitro system composed of the reverse transcriptase bRT, Avd protein, and a specific RNA. We found that the catalytic efficiency for correct incorporation during reverse transcription by the bRT-Avd complex was strikingly low for all template bases, with the lowest occurring for adenine. Misincorporation across a template adenine was only somewhat lower in efficiency than correct incorporation. We found that the C6, but not the N1 or C2, purine substituent was a key determinant of adenine-mutagenesis. bRT-Avd was insensitive to the C6 amine of adenine but recognized the C6 carbonyl of guanine. We also identified two bRT amino acids predicted to nonspecifically contact incoming dNTPs, R74 and I181, as promoters of adenine-mutagenesis. Our results suggest that the overall low catalytic efficiency of bRT-Avd is intimately tied to its ability to carry out adenine-mutagenesis.

Folding Rate Optimization Promotes Frustrated Interactions in Entangled Protein Structures

International Journal of Molecular Sciences ◽

10.3390/ijms21010213 ◽

2019 ◽

Vol 21 (1) ◽

pp. 213

Author(s):

Federico Norbiato ◽

Flavio Seno ◽

Antonio Trovato ◽

Marco Baiesi

Keyword(s):

Amino Acids ◽

Structural Biology ◽

Weak Interactions ◽

Protein Structures ◽

Protein Sequences ◽

Control Case ◽

Folding Rate ◽

Empirical Observation ◽

Rate Optimization ◽

Kinetic Traps

Many native structures of proteins accommodate complex topological motifs such as knots, lassos, and other geometrical entanglements. How proteins can fold quickly even in the presence of such topological obstacles is a debated question in structural biology. Recently, the hypothesis that energetic frustration might be a mechanism to avoid topological frustration has been put forward based on the empirical observation that loops involved in entanglements are stabilized by weak interactions between amino acids at their extrema. To verify this idea, we use a toy lattice model for the folding of proteins into two almost identical structures, one entangled and one not. As expected, the folding time is longer when random sequences fold into the entangled structure. This holds also under an evolutionary pressure simulated by optimizing the folding time. It turns out that optimized protein sequences in the entangled structure are in fact characterized by frustrated interactions at the closures of entangled loops. This phenomenon is much less enhanced in the control case where the entanglement is not present. Our findings, which are in agreement with experimental observations, corroborate the idea that an evolutionary pressure shapes the folding funnel to avoid topological and kinetic traps.

BIOPEP-UWM Database of Bioactive Peptides: Current Opportunities

International Journal of Molecular Sciences ◽

10.3390/ijms20235978 ◽

2019 ◽

Vol 20 (23) ◽

pp. 5978 ◽

Cited By ~ 49

Author(s):

Minkiewicz ◽

Iwaniak ◽

Darewicz

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Chronic Diseases ◽

Bioactive Peptides ◽

Protein Sequences ◽

Batch Processing ◽

Amino Acid Sequences ◽

Quantitative Parameters ◽

New Information

The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially those derived from foods and forming part of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) are in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.

Decoding the Design Principles of Amino Acids and the Chemical Logic of Protein Sequences

Nature Precedings ◽

10.1038/npre.2008.2135.1 ◽

2008 ◽

Cited By ~ 2

Author(s):

B. Jayaram

Keyword(s):

Amino Acids ◽

Protein Sequences ◽

Design Principles

RUNS OF AMINO ACIDS ARE LONGER THAN EXPECTED IN PROTEINS BASED ON A GRAPH THEORY REPRESENTATION OF THE GENETIC CODE

Journal of Biological Systems ◽

10.1142/s0218339002000718 ◽

2002 ◽

Vol 10 (04) ◽

pp. 319-335

Author(s):

DAVID DIGBY ◽

WILLIAM SEFFENS ◽

FISSEHA ABEBE

Keyword(s):

Amino Acids ◽

Graph Theory ◽

Genetic Code ◽

Protein Sequences ◽

Test Statistic ◽

Coding Sequences ◽

Reverse Complement ◽

In Silico Study ◽

Z Scores ◽

The Difference

An in silico study of mRNA secondary structure has found a bias within the coding sequences of genes that favors "in-frame" pairing of nucleotides. This pairing of codons, each with its reverse-complement, partitions the 20 amino acids into three subsets. The genetic code can therefore be represented by a three-component graph. The composition of proteins in terms of amino acid membership in the three subgroups has been measured, and sequence runs of members within the same subgroup have been analyzed using a runs statistic based on Z-scores. In a GenBank database of over 416,000 protein sequences, the distribution of this runs-test statistic is negatively skewed. To assess whether this statistical bias was due to a chance grouping of the amino acids in the real genetic code, several alternate partitions of the genetic code were examined by permuting the assignment of amino acids to groups. A metric was constructed to define the difference, or "distance", between any two such partitions, and an exhaustive search was conducted among alternate partitions maximally distant from the natural partition of the genetic code, to select sets of partitions that were also maximally distant from one another. The skewness of the runs statistic distribution for native protein sequences was significantly more negative under the natural partition than under any of the maximally different partitions of codons, although for all partitions, including the natural one, the randomized sequences had quite similar skewness. Hence, under the natural graph theory partition of the genetic code, more protein sequences contain fewer runs of amino acids than they do under the other partitions, meaning that the average run must be longer under the natural partition. This suggests that a corresponding bias may exist in the coding sequences of the actual genes that code for these proteins.
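
A runs statistic based on Z-scores can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes the standard multi-category Wald-Wolfowitz runs test, in which the observed number of runs is compared with its mean and variance under random ordering of the same labels. The function names and the toy label string are hypothetical, and the three-way partition of amino acids used in the paper is not reproduced here: the input is simply a sequence of group labels.

```python
import math
from collections import Counter
from itertools import groupby


def count_runs(labels):
    """Number of maximal stretches of identical consecutive labels."""
    return sum(1 for _ in groupby(labels))


def runs_z_score(labels):
    """Z-score of the observed run count for a sequence of group labels.

    Uses the mean and variance of the number of runs under random
    permutation of the labels (multi-category runs test). A negative
    value means fewer, and therefore longer, runs than expected.
    """
    labels = list(labels)
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two labels")
    counts = Counter(labels).values()
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    mean = (n * (n + 1) - s2) / n
    var = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("variance is zero (only one group present)")
    return (count_runs(labels) - mean) / math.sqrt(var)


# Toy label strings: each character is a group label, not an amino acid.
clustered = runs_z_score("AAABBBCCC")    # 3 runs, fewer than expected
alternating = runs_z_score("ABCABCABC")  # 9 runs, more than expected
print(round(clustered, 3), round(alternating, 3))
```

In this sketch a negative Z-score corresponds to the situation the abstract describes: fewer runs than chance predicts, and hence longer average runs.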

An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

PLoS ONE ◽

10.1371/journal.pone.0167430 ◽

2016 ◽

Vol 11 (12) ◽

pp. e0167430 ◽

Cited By ~ 2

Author(s):

Yushuang Li ◽

Tian Song ◽

Jiasheng Yang ◽

Yi Zhang ◽

Jialiang Yang

Keyword(s):

Amino Acids ◽

Transition Probabilities ◽

Protein Sequences ◽

Alignment Free ◽

Markov Transition

Homology of Glucosyltransferase Gene and Protein Sequences from Streptococcus sobrinus and Streptococcus mutans

Journal of Dental Research ◽

10.1177/00220345880670030401 ◽

1988 ◽

Vol 67 (3) ◽

pp. 543-547 ◽

Cited By ~ 21

Author(s):

R.R.B. Russell ◽

T. Shiroza ◽

H.K. Kuramitsu ◽

J.J. Ferreti

Keyword(s):

Amino Acids ◽

Streptococcus Mutans ◽

Protein Sequences ◽

Coding Region ◽

Streptococcus Sobrinus ◽

Sequence Pattern ◽

Protein Coding ◽

High Degree

The sequences of glucosyltransferase genes from Streptococcus sobrinus (gtfI) and Streptococcus mutans (gtfB) were compared and show a high degree of homology. There is a 57.7% homology of nucleotides in the genes and a 56.7% homology of amino acids in the deduced protein sequences. The G + C content for the protein-coding region is 43.6% for S. sobrinus and 41.2% for S. mutans. Internal repeating sequences present in both proteins exhibit some difference in sequence pattern.
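
The two kinds of figures quoted above can be computed with a short Python sketch. It assumes that the "homology" percentages are percent identity over an existing pairwise alignment, which the abstract does not state explicitly. The function names are hypothetical and the inputs are toy strings, not the gtfI or gtfB sequences.

```python
def gc_content(seq):
    """Percentage of G and C among the A, C, G, T bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored; a gap opposite
    a residue counts as a mismatch. This convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(x == y for x, y in columns if x != "-")
    return 100.0 * matches / len(columns)


# Toy inputs for illustration only.
print(gc_content("ATGC"))
print(percent_identity("ATGC-A", "ATGA-A"))
```

Other conventions for gap columns exist (for example, dividing by the length of the shorter sequence), and they can shift the reported percentage by a few points.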

The terminal amino acids of protein sequences and protein maturation

Journal of Theoretical Biology ◽

10.1016/0022-5193(75)90030-2 ◽

1975 ◽

Vol 50 (1) ◽

pp. 161-166

Author(s):

John A. Black ◽

Peter Stenzel ◽

Richard N. Harkins

Keyword(s):

Amino Acids ◽

Protein Sequences ◽

Protein Maturation ◽

Terminal Amino

Cloning of cDNAs encoding two related 100-kD coated vesicle proteins (alpha-adaptins).

The Journal of Cell Biology ◽

10.1083/jcb.108.3.833 ◽

1989 ◽

Vol 108 (3) ◽

pp. 833-842 ◽

Cited By ~ 96

Author(s):

M S Robinson

Keyword(s):

Amino Acids ◽

Mouse Brain ◽

Southern Blotting ◽

Protein Sequences ◽

Coated Vesicles ◽

Coated Vesicle ◽

Coat Proteins ◽

Cell Type ◽

Brain Cdna Library

Coat proteins of approximately 100-kD (adaptins) are components of the adaptor complexes which link clathrin to receptors in coated vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles, separate into two bands on SDS gels, designated A and C (Robinson, M. S., 1987. J. Cell Biol. 104:887-895). Two distinct cDNAs (sequences 1 and 2) encoding the two alpha-adaptins were cloned from a mouse brain cDNA library. Southern blotting indicates that there is one copy of each of the two alpha-adaptin genes, and that there are no additional closely related genes. Based on the size of the predicted protein products of the two genes (108 and 104 kD), the relative abundance of the two messages in brain and liver, and the reactivity of a sequence 1 fusion protein with different antibodies, it was possible to conclude that sequence 1 codes for A and sequence 2 for C. The two protein sequences are strikingly homologous to each other (84% identical amino acids), the major difference being an additional stretch of 41 amino acids, rich in prolines and acidic residues, inserted into the COOH-terminal half of A. In situ hybridization carried out on mouse brain sections indicates that the same cell type may express both transcripts, but that their relative expressions vary. Antipeptide antibodies are now being raised to find out whether the proteins are localized in functionally distinct populations of endocytic coated vesicles.
