Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes

Abstract Protein-coding DNA sequences are thought to primarily affect phenotypes via the peptides they encode. Yet, emerging data suggest that, although they do not affect protein sequences, synonymous mutations can cause phenotypic changes. Previously, we have shown that signatures of selection on gene-specific codon usage bias are common in genomes of diverse eukaryotic species. Thus, synonymous codon usage, just like amino acid usage, is likely a regular target of natural selection. Consequently, here we propose the hypothesis that, at least for some protein-coding genes, codon clusters with biased synonymous codon usage patterns might represent “hidden” nucleic-acid-level functional domains that affect the action of the corresponding proteins via diverse hypothetical mechanisms. To test our hypothesis, we used computational approaches to identify over 3,000 putatively functional codon clusters (PFCCs) with biased usage patterns in about 1,500 protein-coding genes in the Drosophila melanogaster genome. Specifically, our data suggest that these PFCCs are likely associated with specific categories of gene function, including enrichment in genes that encode membrane-bound and secreted proteins. Yet, the majority of the PFCCs that we have identified are not associated with previously annotated functional protein domains. Although the specific functional significance of the majority of the PFCCs we have identified remains unknown, we show that in the highly conserved family of voltage-gated sodium channels, the existence of rare-codon cluster(s) in the nucleic-acid region that encodes the cytoplasmic loop that constitutes the inactivation gate is conserved across paralogs as well as orthologs across distant animal species. Together, our findings suggest that codon clusters with biased usage patterns likely represent “hidden” nucleic-acid-level functional domains that cannot be simply predicted from the amino acid sequences they encode. Therefore, it is likely that on the evolutionary timescale, protein-coding DNA sequences are shaped by both amino-acid-dependent and codon-usage-dependent selective forces.
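
One common way to quantify bias in synonymous codon usage is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is only an illustration of that general measure, assuming the standard genetic code; it is not necessarily the statistic used to define the PFCCs above, and the function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid that occurs in the in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    # Split into complete in-frame codons; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        expected = total / len(family)  # count per codon under uniform usage
        for c in family:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCC").items()):
        print(codon, round(value, 3))
```

Values above 1 indicate a codon used more often than expected under uniform synonymous usage, and values below 1 indicate the opposite. Applying the same calculation in a sliding window along a gene would be one possible way to look for local clusters of biased usage.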

Analysis of codon usage pattern of mitochondrial protein-coding genes in different hookworms

Molecular and Biochemical Parasitology ◽

10.1016/j.molbiopara.2017.11.005 ◽

2018 ◽

Vol 219 ◽

pp. 24-32 ◽

Cited By ~ 6

Author(s):

Bornali Deb ◽

Arif Uddin ◽

Gulshana Akthar Mazumder ◽

Supriyo Chakraborty

Keyword(s):

Codon Usage ◽

Mitochondrial Protein ◽

Codon Usage Pattern ◽

Usage Pattern ◽

Protein Coding ◽

Protein Coding Genes

The Chloroplast Genomes Comparative Analysis of Taihangia Rupestris and Taihangia Rupestris Var. Ciliate, Two Endangered and Endemic Cliff Plants in Taihang Mountain of China

10.21203/rs.3.rs-892423/v1 ◽

2021 ◽

Author(s):

Yan Zheng ◽

Yuan Jiang ◽

Yujing Miao ◽

Zhan Feng ◽

Min Zhang ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Codon Usage ◽

Sustainable Use ◽

Scientific Basis ◽

Protein Coding ◽

Protein Coding Genes ◽

Repeat Sequences ◽

Chloroplast Genomes ◽

High Degree ◽

Taihang Mountain

Abstract Taihangia is an endangered native cliff plant that grows in the Taihang Mountains of China. The chloroplast (cp) genomes of Taihangia rupestris and Taihangia rupestris var. rupestris have total lengths of 155,558 bp and 155,479 bp, respectively. They have 131 genes in total, including 79 protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Analyses of codon usage, RNA-editing sites, and repeat sequences, together with a comparison of the cp genomes, showed a high degree of conservation. Phylogenetic analysis indicated that Taihangia is close to Geum. The genus Taihangia was inferred to have originated 0.2057 Mya, and Geum rupestre was inferred to have originated 1.4431 Mya. Overall, the gene contents, gene arrangements, types and frequencies of codon usage, repeat sequences, and SSRs are similar and highly conserved between T. rupestris and T. rupestris var. ciliate. Based on bioprospecting, T. rupestris and T. rupestris var. rupestris were found to be potential medicinal resources. This study provides a scientific basis for the conservation and sustainable use of endangered medicinal resources.

Theory of prokaryotic genome evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1614083113 ◽

2016 ◽

Vol 113 (41) ◽

pp. 11399-11407 ◽

Cited By ~ 62

Author(s):

Itamar Sela ◽

Yuri I. Wolf ◽

Eugene V. Koonin

Keyword(s):

Genome Size ◽

Prokaryotic Genome ◽

Genetic Material ◽

Purifying Selection ◽

Synonymous Substitution ◽

Protein Coding ◽

Protein Coding Genes ◽

New Genes ◽

Extra Energy ◽

Prokaryotic Genomes

Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.

How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher–Wright mutation–selection framework

Biology Letters ◽

10.1098/rsbl.2014.1031 ◽

2015 ◽

Vol 11 (4) ◽

pp. 20141031 ◽

Cited By ~ 12

Author(s):

Mario dos Reis

Keyword(s):

First Principles ◽

Rate Ratio ◽

Real Data ◽

Synonymous Substitution ◽

Chloroplast Gene ◽

Synonymous Substitution Rate ◽

Protein Coding ◽

Protein Coding Genes ◽

Selection Framework ◽

Insight Into

First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution.
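
As a rough illustration of how selection coefficients map onto substitution rates in this kind of framework, the sketch below uses the standard weak-mutation result that a mutation with scaled selection coefficient S is fixed at a rate S / (1 − e^(−S)) relative to a neutral mutation. This is a textbook relationship rather than a reproduction of the paper's own formulae; the function name `relative_fixation_rate` is hypothetical, and the exact scaling of S by population size depends on the model assumed.

```python
import math


def relative_fixation_rate(S):
    """Fixation rate of a mutation relative to a neutral one.

    S is the scaled selection coefficient (selection coefficient multiplied by
    a population-size factor). Returns S / (1 - exp(-S)), which tends to 1 as
    S approaches 0.
    """
    if abs(S) < 1e-8:
        # First-order Taylor expansion avoids 0/0 near neutrality.
        return 1.0 + S / 2.0
    # -expm1(-S) equals 1 - exp(-S) but is more accurate for small |S|.
    return S / -math.expm1(-S)


if __name__ == "__main__":
    for S in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"S = {S:+.1f}  relative rate = {relative_fixation_rate(S):.4f}")
```

Under this relationship, deleterious changes (S < 0) are fixed more slowly than neutral ones and advantageous changes (S > 0) more quickly. If synonymous changes are assumed neutral, a non-synonymous to synonymous rate ratio can be viewed as an average of such relative rates over the non-synonymous changes available at a site, weighted by how often each arises.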

Cryptosporidium felis differs from other Cryptosporidium spp. in codon usage

Microbial Genomics ◽

10.1099/mgen.0.000711 ◽

2021 ◽

Vol 7 (12) ◽

Author(s):

Jiayu Li ◽

Yaqiong Guo ◽

Dawn M. Roellig ◽

Na Li ◽

Yaoyu Feng ◽

...

Keyword(s):

Amino Acids ◽

Natural Selection ◽

Codon Usage ◽

Gc Content ◽

Transport Systems ◽

Protein Coding ◽

Reductive Evolution ◽

Protein Coding Genes ◽

Wide Range ◽

Related Proteins

Cryptosporidium spp. are important enteric pathogens in a wide range of vertebrates including humans. Previous comparative analysis revealed conservation in genome composition, gene content, and gene organization among Cryptosporidium spp., with a progressive reductive evolution in metabolic pathways and invasion-related proteins. In this study, we sequenced the genome of zoonotic pathogen Cryptosporidium felis and conducted a comparative genomic analysis. While most intestinal Cryptosporidium species have similar genomic characteristics and almost complete genome synteny, fewer protein-coding genes and some sequence inversions and translocations were found in the C. felis genome. The C. felis genome exhibits much higher GC content (39.6 %) than other Cryptosporidium species (24.3–32.9 %), especially at the third codon position (GC3) of protein-coding genes. Thus, C. felis has a different codon usage, which increases the use of less energy costly amino acids (Gly and Ala) encoded by GC-rich codons. While the tRNA usage is conserved among Cryptosporidium species, consistent with its higher GC content, C. felis uses a unique tRNA for GTG for valine instead of GTA in other Cryptosporidium species. Both mutational pressures and natural selection are associated with the evolution of the codon usage in Cryptosporidium spp., while natural selection seems to drive the codon usage in C. felis. Other unique features of the C. felis genome include the loss of the entire traditional and alternative electron transport systems and several invasion-related proteins. Thus, the preference for the use of some less energy costly amino acids in C. felis may lead to a more harmonious parasite–host interaction, and the strengthened host-adaptation is reflected by the further reductive evolution of metabolism and host invasion-related proteins.
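
The GC content and third-codon-position GC content (GC3) mentioned above are simple to compute for a single coding sequence. The following is a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function names are hypothetical and this is not the pipeline used in the study.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) in seq that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)


if __name__ == "__main__":
    example = "ATGGCCAAA"  # codons ATG, GCC, AAA
    print(f"GC  = {gc_content(example):.3f}")
    print(f"GC3 = {gc3_content(example):.3f}")
```

Because most third-position changes are synonymous, GC3 is often more sensitive to compositional bias than overall GC content, which is why the two are usually reported side by side.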

Complete Chloroplast Genome Sequences of Clematis: IR Expansion and Relative Rates of Synonymous Substitutions

10.20944/preprints201804.0106.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Kyoung Su Choi ◽

Keum Seon Jeong ◽

Young-Ho Ha ◽

Kyung Choi

Keyword(s):

Chloroplast Genome ◽

Tandem Repeats ◽

Single Copy ◽

Synonymous Substitution ◽

Substitution Rates ◽

Protein Coding ◽

Protein Coding Genes ◽

Repeat Structure ◽

Chloroplast Genomes ◽

Synonymous Substitution Rates

Genus Clematis is one of the largest within Ranunculaceae. Here we report the chloroplast genomes of two Clematis species endemic to Korea, C. brachyura and C. trichotoma. The chloroplast genome lengths of C. brachyura and C. trichotoma are 159,532 bp and 159,170 bp, respectively. Gene contents in the complete chloroplast genomes of these two Clematis species are identical to those of most Ranunculaceae and other angiosperms. However, our results demonstrated that genus Clematis has inversion and rearrangement events involving the rps4 gene, the rps16 to trnH region, and the trnL to ndhC region, as well as expansion of the IR regions. Comparison of IR regions among Ranunculaceae species revealed that Clematis species contained six protein-coding genes (infA, rps8, rpl14, rpl16, rps3, and rpl22) usually found in the large single copy (LSC) region of other species. Phylogenetic analysis demonstrated that genus Clematis is closely related to genus Ranunculus. Differences in repeat structure, substitution rates, and IR expansion between the genera Clematis and Ranunculus explained their relationship. Clematis species showed slightly higher tandem repeat content than Ranunculus species. The six protein-coding genes showed lower synonymous substitution rates in the IR of Clematis species than in the LSC of Ranunculus species. Overall, the chloroplast genomes and results presented here provide important information on the evolution of Ranunculaceae.

Amino acid composition and the evolutionary rates of protein-coding genes

Journal of Molecular Evolution ◽

10.1007/bf02105805 ◽

1985 ◽

Vol 22 (1) ◽

pp. 53-62 ◽

Cited By ~ 57

Author(s):

Dan Graur

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Evolutionary Rates ◽

Protein Coding ◽

Protein Coding Genes

Turdoides affinis mitogenome reveals the translational efficiency and importance of NADH dehydrogenase complex-I in the Leiothrichidae family

Scientific Reports ◽

10.1038/s41598-020-72674-4 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Indrani Sarkar ◽

Prateek Dey ◽

Sanjeev Kumar Sharma ◽

Swapna Devi Ray ◽

Venkata Hanumat Sastry Kochiganti ◽

...

Keyword(s):

De Novo ◽

Purifying Selection ◽

Sister Group ◽

Synonymous Substitution ◽

Translational Efficiency ◽

Evolutionary Analysis ◽

Peninsular India ◽

Protein Coding ◽

Protein Coding Genes ◽

Complete Mitogenome

Abstract The mitochondrial genome provides useful information about the evolution and phylogenetics of a species. We took advantage of high-throughput next-generation sequencing to sequence the complete mitogenome of the yellow-billed babbler (Turdoides affinis), a species endemic to Peninsular India and Sri Lanka. Both reference-based and de novo assemblies of the mitogenome were performed, and the de novo assembled mitogenome was found to be the most appropriate. The complete mitogenome of the yellow-billed babbler (assembled de novo) was 17,672 bp in length with 53.2% AT composition. Thirteen protein-coding genes along with two rRNAs and 22 tRNAs were detected. The arrangement of these genes was found to be conserved among Leiothrichidae family mitogenomes. Duplicated control regions were found in the newly sequenced mitogenome. Downstream bioinformatics analysis revealed the effect of translational efficiency and purifying selection pressure on the thirteen protein-coding genes in the yellow-billed babbler mitogenome. Ka/Ks analysis indicated the highest synonymous substitution rate in the nad6 gene. Evolutionary analysis revealed the conserved nature of all the protein-coding genes across Leiothrichidae family mitogenomes. Our limited phylogeny results placed T. affinis in a separate group, a sister group of Garrulax. Overall, our results provide useful information for future studies on the evolutionary and adaptive mechanisms of birds belonging to the Leiothrichidae family.

Codon usage and expression level of human mitochondrial 13 protein coding genes across six continents

Mitochondrion ◽

10.1016/j.mito.2017.11.006 ◽

2018 ◽

Vol 42 ◽

pp. 64-76 ◽

Cited By ~ 2

Author(s):

Supriyo Chakraborty ◽

Arif Uddin ◽

Tarikul Huda Mazumder ◽

Monisha Nath Choudhury ◽

Arup Kumar Malakar ◽

...

Keyword(s):

Codon Usage ◽

Expression Level ◽

Protein Coding ◽

Protein Coding Genes
