The Use of GC-, Codon-, and Amino Acid-frequencies to Understand the Evolutionary Forces at a Genomic Scale

2019 ◽  
Author(s):  
Arne Elofsson

Abstract It is well known that the GC content varies enormously between organisms; this is believed to be caused by a combination of mutational preferences and selective pressure. Within coding regions, the variation of GC is more substantial in position three and smaller in positions one and two. Less well known is that this variation also has an enormous impact on the frequency of amino acids, as their codons vary in GC content. For instance, the fraction of alanines in different proteomes varies from 1.1% to 16.5%. In general, the frequency of different amino acids correlates strongly with the number of codons, the GC content of these codons and the genomic GC content. However, there are clear and systematic deviations from the expected frequencies. Some amino acids are more frequent than expected by chance, while others are less frequent. A plausible model to explain this is that two different selective forces act on the genes: first, a force acting to maintain the overall GC level, and second, a selective force acting at the amino acid level. Here, we use the divergence in amino acid frequency from what is expected by the GC content to analyze the selective pressure acting on codon frequencies in the three kingdoms of life. We find four major selective forces. First, the frequency of serine is lower than expected in all genomes, but most in prokaryotes. Second, there exists a selective pressure acting to balance positively and negatively charged amino acids, which results in a reduction of arginine and all the negatively charged amino acids. Third, the frequency of the hydrophobic residues encoded by a T in the second codon position does not change with GC. Their frequency is lower in eukaryotes than in prokaryotes. Finally, some amino acids with unique properties, such as glycine and proline, are limited in their frequency variation.
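As a rough illustration of what an amino acid frequency "expected by the GC content" could look like, the sketch below assumes a simple null model that is not necessarily the one used in the paper: every codon position draws G or C with probability GC/2 each and A or T with probability (1 − GC)/2 each, stop codons are discarded, and the remaining codon probabilities are summed per amino acid. The names `CODON_TABLE` and `expected_aa_freqs` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_aa_freqs(gc):
    """Expected amino acid frequencies for a genomic GC fraction `gc`.

    Assumed null model: the three codon positions are independent and
    share the same base composition; stop codons are excluded.
    """
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        p = base_p[codon[0]] * base_p[codon[1]] * base_p[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    total = sum(freqs.values())  # renormalise over sense codons only
    return {aa: p / total for aa, p in freqs.items()}


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        f = expected_aa_freqs(gc)
        print(f"GC={gc:.1f}  Ala={f['A']:.3f}  Lys={f['K']:.3f}")
```

Under this assumed model, alanine (codons GCN) becomes more frequent as GC rises, while lysine (AAA, AAG) becomes less frequent, which is the kind of baseline against which observed deviations can be measured.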

2004 ◽  
Vol 186 (18) ◽  
pp. 6277-6285 ◽  
Author(s):  
Jason R. Wickstrum ◽  
Susan M. Egan

ABSTRACT The RhaS and RhaR proteins are transcription activators that respond to the availability of l-rhamnose and activate transcription of the operons in the Escherichia coli l-rhamnose catabolic regulon. RhaR activates transcription of rhaSR, and RhaS activates transcription of the operon that encodes the l-rhamnose catabolic enzymes, rhaBAD, as well as the operon that encodes the l-rhamnose transport protein, rhaT. RhaS is 30% identical to RhaR at the amino acid level, and both are members of the AraC/XylS family of transcription activators. The RhaS and RhaR binding sites overlap the −35 hexamers of the promoters they regulate, suggesting they may contact the σ70 subunit of RNA polymerase as part of their mechanisms of transcription activation. In support of this hypothesis, our lab previously identified an interaction between RhaS residue D241 and σ70 residue R599. In the present study, we first identified two positively charged amino acids in σ70, K593 and R599, and three negatively charged amino acids in RhaR, D276, E284, and D285, that were important for RhaR-mediated transcription activation of the rhaSR operon. Using a genetic loss-of-contact approach we have obtained evidence for a specific contact between RhaR D276 and σ70 R599. Finally, previous results from our lab separately showed that RhaS D250A and σ70 K593A were defective at the rhaBAD promoter. Our genetic loss-of-contact analysis of these residues indicates that they identify a second site of contact between RhaS and σ70.


2004 ◽  
Vol 91 (01) ◽  
pp. 38-42 ◽  
Author(s):  
Christof Geisen ◽  
Erhard Seifried ◽  
Johannes Oldenburg ◽  
Matthias Watzka

Summary Factor VIII acts as an essential component of the tenase complex of the coagulation system. Herein we report the cDNA of the rat factor VIII. The rat cDNA comprises 6777 nucleotides and encodes a protein of 2258 amino acids, 61 amino acids fewer than mouse and 92 amino acids fewer than human factor VIII. The overall identity compared to human cDNA is 61% on the cDNA and 51% on the amino acid level. In cDNA, the highest levels of sequence identity can be observed in the A and C domains (ranging between 68% and 73%), whereas the B domain and the small acidic regions are more divergent (34%-49%). Compared to mouse and human, most sites for posttranslational modifications such as sulfatation and glycosylation as well as thrombin and protein C cleavage sites are conserved in rat. Alternative transcripts lacking exon 17 and/or comprising an additional 26 bp due to alternative splicing of exon 20 were found. Furthermore, 13 polymorphisms (seven in exon 14, one each in exons 20, 23, 24, and 25, two in the 3’UTR), three of which lead to an amino acid exchange, could be detected. Our findings might provide new insights into the structure-function analysis of the factor VIII protein and might prove useful for future animal models addressing the function of factor VIII.


2002 ◽  
Vol 184 (5) ◽  
pp. 1444-1448 ◽  
Author(s):  
Jayna L. Ditty ◽  
Caroline S. Harwood

ABSTRACT Charged amino acids in the predicted transmembrane portion of PcaK, a permease from Pseudomonas putida that transports 4-hydroxybenzoate (4-HBA), were required for 4-HBA transport, and they were also required for P. putida to have a chemotactic response to 4-HBA. An essential amino acid motif (DGXD) containing aspartate residues is located in the first transmembrane segment of PcaK and is conserved in the aromatic acid/H+ symporter family of the major facilitator superfamily of transporters.


1959 ◽  
Vol 197 (4) ◽  
pp. 873-879 ◽  
Author(s):  
Roland A. Coulson ◽  
Thomas Hernandez

The rate of renal deamination of 18 amino acids was determined by injecting them into alligators and measuring the ammonia excreted. Not only did glycine, alanine, glutamine and leucine account for nearly half of the plasma amino acids, they were also deaminated more rapidly than any of the others. In view of this it was concluded that these four amino acids are the natural precursors of urinary NH3 in the alligator. Increased NH3 and CO2 excretion following glycine injections resulted in increased renal reabsorption of Na and Cl when NaCl was injected and increased Na reabsorption when NaHCO3 or Na phosphate solutions were injected. The fact that excess NH4HCO3 excretion enhances salt reabsorption independent of plasma pH makes it probable that the excretion of N is the chief function of the ammonia mechanism and that salt conservation is incidental. Insulin decreased the plasma amino acid level and drastically reduced the NH3 excretion. With the decrease in ammonia, NaCl and NaHCO3 were excreted in increased amounts.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Kai-Yao Huang ◽  
Fang-Yu Hung ◽  
Hui-Ju Kao ◽  
Hui-Hsuan Lau ◽  
Shun-Long Weng

Abstract Background Protein phosphoglycerylation, the addition of a 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form a 3-phosphoglyceryl-lysine, is a reversible and non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites. Results We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the results of PTM-Logo, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve the predictive power, the top features ranked by F-score were combined for classification, and the predictive models were trained using DT, RF and SVM classifiers. Evaluation by five-fold cross-validation showed that the selected features were most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
Conclusion The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient of 0.49. Furthermore, the model also performed consistently on an independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, iDPGK, which is freely available at http://mer.hc.mmh.org.tw/iDPGK/.
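For readers less familiar with the four reported measures, the following minimal sketch shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient are conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical; they are not the counts behind the figures reported above.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive sites predicted correctly/incorrectly
    tn/fp: negative sites predicted correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(classification_metrics(tp=8, fn=2, tn=6, fp=4))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it more informative when positive and negative sites are unevenly represented.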


1990 ◽  
Vol 45 (5) ◽  
pp. 538-543 ◽  
Author(s):  
D. Friedberg ◽  
J. Seijffers

We present here the isolation and molecular characterization of acetolactate synthase (ALS) genes from the cyanobacterium Synechococcus PCC7942, which specify a sulfonylurea-sensitive enzyme, and from the sulfonylurea-resistant mutant SM3/20, which specify resistance to sulfonylurea herbicides. The ALS gene was cloned and mapped by complementation of an Escherichia coli ilv auxotroph that requires branched-chain amino acids for growth and lacks ALS activity. The cyanobacterial gene is efficiently expressed in this heterologous host. The ALS gene codes for 612 amino acids and shows high sequence homology (46%) at the amino acid level with ALS III of E. coli and with the tobacco ALS. The resistant phenotype is a consequence of a proline to serine substitution in residue 115 of the deduced amino acid sequence. Functional expression of the mutant gene in wild-type Synechococcus and in E. coli confirmed that this amino-acid substitution is responsible for the resistance. Yet the deduced amino-acid sequence as compared with other ALS proteins supports the notion that the amino-acid context of the substitution is important for the resistance.


1996 ◽  
Vol 315 (3) ◽  
pp. 807-814 ◽  
Author(s):  
Said MODARESSI ◽  
Bruno CHRIST ◽  
Jutta BRATKE ◽  
Stefan ZAHN ◽  
Tilman HEISE ◽  
...  

In human liver, phosphoenolpyruvate carboxykinase (PCK; EC 4.1.1.32) is about equally distributed between cytosol and mitochondria in contrast with rat liver in which it is essentially a cytosolic enzyme. Recently, the isolation of the gene and cDNA of the human cytosolic enzyme has been reported [Ting, Burgess, Chamberlain, Keith, Falls and Meisler (1993) Genomics 16, 698–706; Stoffel, Xiang, Espinosa, Cox, Le Beau and Bell (1993) Hum. Mol. Genet. 2, 1–4]. It was the goal of this investigation to isolate the cDNA of the human mitochondrial form of hepatic PCK. A human liver cDNA library was screened with a rat cytosolic PCK cDNA probe comprising sequences from exons 2 to 9. A cDNA clone was isolated which had overall a 68% DNA sequence and a 70% deduced amino acid sequence identity with the human cytosolic PCK cDNA. Without the flanking 270 bases (=90 amino acids) each at the 5´ and 3´ end, the sequence identity was 73% on the DNA and 78% on the amino acid level. The isolated cDNA had an open reading frame of 1920 bp; it was 54 bp (equivalent to 18 amino acids) longer than that of human or rat cytosolic PCK cDNA. The isolated cDNA was cloned into the eukaryotic expression vector pcDNAI and transfected into human embryonal kidney cells HEK293; PCK activity was increased by 3-fold in the mitochondria, which normally contain 70% of total PCK activity, but not in the cytosol. The isolated cDNA was also transfected into cultured rat hepatocytes; again, PCK activity was enhanced by about 40-fold in the mitochondria, which normally possess only 10% of total PCK activity, but not in the cytosol. In the rat hepatocytes only the endogenous cytosolic PCK and not the transfected mitochondrial PCK was induced 3-fold with glucagon. Comparison of the amino acid sequences deduced from the isolated cDNA with human and rat cytosolic PCK showed that the additional 18 amino acids were located at the N-terminus of the protein and probably constitute a mitochondrial targeting signal. 
Northern-blot analyses revealed the human mitochondrial PCK mRNA to be 2.25 kb long, about 0.6 kb shorter than the mRNA of the cytosolic PCK. Primer extension experiments showed that the 5´-untranslated region of mitochondrial PCK mRNA was 134 nucleotides in length.
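The nucleotide-to-residue conversions quoted in this abstract follow from the three-nucleotide codon length. The sketch below simply restates that arithmetic; the helper name `bp_to_codons` is hypothetical, and whether the 1920 bp open reading frame is counted with or without its stop codon is not stated in the abstract, so the last figure is given in codons only.

```python
def bp_to_codons(n_bp):
    """Number of complete codons in an in-frame stretch of n_bp nucleotides."""
    if n_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return n_bp // 3


if __name__ == "__main__":
    print(bp_to_codons(54))    # extra N-terminal stretch: 18 amino acids
    print(bp_to_codons(270))   # each flanking region: 90 amino acids
    print(bp_to_codons(1920))  # open reading frame: 640 codons
```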


2004 ◽  
Vol 844 ◽  
Author(s):  
Dinesh R. Katti ◽  
Pijush Ghosh ◽  
Kalpana Katti

Abstract In the area of clay-polymer nanocomposites, montmorillonite has recently been used extensively because of its unique swelling characteristics. In this work, steered molecular dynamics is used to evaluate the mechanical behavior of a new class of nanocomposites, using amino acids to intercalate clay interlayers. Two positively charged amino acids, lysine and arginine, are used here. Our simulations indicate that both amino acids have a preferred orientation inside the clay interlayer. Our simulations also indicate that the clay-amino acid interlayer is about three times stiffer under tension than under compression. On the other hand, dry montmorillonite shows similar stiffness under tension and compression. The fundamental mechanism of deformation during tension and compression is intrinsically different in the amino acid-clay composite. The stress-strain behavior of this clay-amino acid interlayer is predominantly linear up to a stress of 1.5 GPa. This study is a first step towards the potential use of biomacromolecules as modifiers in clay nanocomposites.


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253260
Author(s):  
Osamu Kajikawa ◽  
Raquel Herrero ◽  
Yu-Hua Chow ◽  
Chi F. Hung ◽  
Gustavo Matute-Bello

We have previously reported that the 26-amino acid N-terminal stalk region of soluble Fas ligand (sFasL), which is separate from its binding site, is required for its biological function. Here we investigate the mechanisms that link the structure of the sFasL stalk region with its function. Using site-directed mutagenesis we cloned a mutant form of sFasL in which all the charged amino acids of the stalk region were changed to neutral alanines (mut-sFasL). We used the Fas-sensitive Jurkat T-cell line and mouse and human alveolar epithelial cells to test the bioactivity of sFasL complexes, using caspase-3 activity and Annexin-V externalization as readouts. Finally, we tested the effects of mut-sFasL on lipopolysaccharide-induced lung injury in mice. We found that mutation of all 8 charged amino acids of the stalk region into the non-charged amino acid alanine (mut-sFasL) resulted in reduced apoptotic activity compared to wild type sFasL (WT-sFasL). The mut-sFasL attenuated WT-sFasL function on the Fas-sensitive human T-cell line Jurkat and on primary human small airway epithelial cells. The inhibitory mechanism was associated with the formation of complexes of mut-sFasL with the WT protein. Intratracheal administration of the mut-sFasL to mice 24 hours after intratracheal Escherichia coli lipopolysaccharide resulted in attenuation of the inflammatory response 24 hours later. Therefore, the stalk region of sFasL has a critical role in bioactivity, and changes in the structure of the stalk region can result in mutant variants that interfere with the wild type protein function in vitro and in vivo.


2016 ◽  
Author(s):  
Guang-Zhong Wang

Abstract The transcriptional and translational systems are essentially information processing systems. However, how to quantify the amount of information decoded during expression remains a mystery. Here, we have proposed a simple method to evaluate the amount of information transcribed and translated during gene expression. We found that although proteins with a high copy number have more information translated, the average number of bits per amino acid is not high. The negative correlation between protein copy number and bits per amino acid indicates the selective pressure to reduce translational errors. Moreover, interacting proteins have similar bits per residue translated. All of these findings highlight the importance of understanding transcription and translation from an information processing perspective.

