COUSIN (COdon Usage Similarity INdex): A normalized measure of Codon Usage Preferences

10.1101/600361 ◽

2019 ◽

Author(s):

Jérôme Bourret ◽

Samuel Alizon ◽

Ignacio G. Bravo

Keyword(s):

Codon Usage ◽

Similarity Index ◽

Bimodal Distribution ◽

Nucleotide Composition ◽

Genomic Region ◽

Reference Dataset ◽

Precise Location ◽

Synonymous Codons ◽

Local Use ◽

Genome Scale

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, genomic region or genome scale. Numerous indices have been developed to measure the CUPrefs of a sequence. We introduce a normalized index to calculate CUPrefs called COUSIN for COdon Usage Similarity INdex. This index compares the CUPrefs of a query against those of a reference dataset and normalizes the output over a Null Hypothesis of random codon usage. COUSIN results can be easily interpreted, quantitatively and qualitatively. We exemplify the use of COUSIN and highlight its advantages with an analysis on the complete coding sequences of eight divergent genomes, two of them with extreme nucleotide composition. Strikingly, COUSIN captures a hitherto unreported bimodal distribution in CUPrefs in genes in the human and in the chicken genomes. We show that this bimodality can be explained by the global nucleotide composition bias of the chromosome in which the gene resides, and by the precise location within the chromosome. Our results highlight the power of the COUSIN index and uncover unexpected characteristics of the CUPrefs in human and chicken. An eponymous tool written in Python 3 to calculate COUSIN is available for online or local use.

COUSIN (COdon Usage Similarity INdex): A Normalized Measure of Codon Usage Preferences

Genome Biology and Evolution ◽

10.1093/gbe/evz262 ◽

2019 ◽

Vol 11 (12) ◽

pp. 3523-3528 ◽

Cited By ~ 7

Author(s):

Jérôme Bourret ◽

Samuel Alizon ◽

Ignacio G Bravo

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Null Hypothesis ◽

Similarity Index ◽

Bimodal Distribution ◽

Added Value ◽

Complete Analysis ◽

Coding Sequences ◽

Synonymous Codons ◽

User Friendly

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, chromosome, or genome levels. Numerous indices have been developed to evaluate CUPrefs, either in absolute terms or with respect to a reference. We introduce the normalized index COUSIN (for COdon Usage Similarity INdex), that compares the CUPrefs of a query against those of a reference and normalizes the output over a Null Hypothesis of random codon usage. The added value of COUSIN is to be easily interpreted, both quantitatively and qualitatively. An eponymous software written in Python3 is available for local or online use (http://cousin.ird.fr). This software allows for an easy and complete analysis of CUPrefs via COUSIN, includes seven other indices, and provides additional features such as statistical analyses, clustering, and CUPrefs optimization for gene expression. We illustrate the flexibility of COUSIN and highlight its advantages by analyzing the complete coding sequences of eight divergent genomes. Strikingly, COUSIN captures a bimodal distribution in the CUPrefs of human and chicken genes hitherto unreported with such precision. COUSIN opens new perspectives to uncover CUPrefs specificities in genomes in a practical, informative, and user-friendly way.

Codon Usage Bias Covaries With Expression Breadth and the Rate of Synonymous Evolution in Humans, but This Is Not Evidence for Selection

Genetics ◽

10.1093/genetics/159.3.1191 ◽

2001 ◽

Vol 159 (3) ◽

pp. 1191-1199

Author(s):

Araxi O Urrutia ◽

Laurence D Hurst

Keyword(s):

Codon Usage ◽

Codon Bias ◽

Synonymous Codon ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Synonymous Substitutions ◽

Numerous Species ◽

Nucleotide Content ◽

Expression Breadth ◽

Human Genes

Abstract In numerous species, from bacteria to Drosophila, evidence suggests that selection acts even on synonymous codon usage: codon bias is greater in more abundantly expressed genes, the rate of synonymous evolution is lower in genes with greater codon bias, and there is consistency between genes in the same species in which codons are preferred. In contrast, in mammals, while nonequal use of alternative codons is observed, the bias is attributed to the background variance in nucleotide concentrations, reflected in the similar nucleotide composition of flanking noncoding and exonic third sites. However, a systematic examination of the covariants of codon usage controlling for background nucleotide content has yet to be performed. Here we present a new method to measure codon bias that corrects for background nucleotide content and apply this to 2396 human genes. Nearly all (99%) exhibit a higher amount of codon bias than expected by chance. The patterns associated with selectively driven codon bias are weakly recovered: Broadly expressed genes have a higher level of bias than do tissue-specific genes, the bias is higher for genes with lower rates of synonymous substitutions, and certain codons are repeatedly preferred. However, while these patterns are suggestive, the first two patterns appear to be methodological artifacts. The last pattern reflects in part biases in usage of nucleotide pairs. We conclude that we find no evidence for selection on codon usage in humans.

The effect of expression levels on codon usage in Plasmodium falciparum

Parasitology ◽

10.1017/s0031182003004517 ◽

2004 ◽

Vol 128 (3) ◽

pp. 245-251 ◽

Cited By ~ 26

Author(s):

L. PEIXOTO ◽

V. FERNÁNDEZ ◽

H. MUSTO

Keyword(s):

Amino Acids ◽

Plasmodium Falciparum ◽

Natural Selection ◽

Codon Usage ◽

Complete Sequence ◽

Expression Data ◽

Expression Levels ◽

Synonymous Codons ◽

Translational Selection ◽

Highly Expressed Genes

The usage of alternative synonymous codons in the completely sequenced, extremely A+T-rich parasite Plasmodium falciparum was studied. Confirming previous studies obtained with less than 3% of the total genes recently described, we found that A- and U-ending triplets predominate but translational selection increases the frequency of a subset of codons in highly expressed genes. However, some new results come from the analysis of the complete sequence. First, there is more variation in GC3 than previously described; second, the effect of natural selection acting at the level of translation has been analysed with real expression data at 4 different stages; and third, we found that highly expressed proteins increment the frequency of energetically less expensive amino acids. The implications of these results are discussed.

Comparative analysis of codon usage patterns in SARS-CoV-2, its mutants and other respiratory viruses

10.1101/2021.03.03.433699 ◽

2021 ◽

Author(s):

Neetu Tyagi ◽

Rahila Sardar ◽

Dinesh Gupta

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

GC Content ◽

Respiratory Illness ◽

Respiratory Viruses ◽

Nucleotide Composition ◽

Health Crisis ◽

Study Results ◽

Usage Patterns ◽

The Difference

Abstract The Coronavirus disease 2019 (COVID-19) outbreak caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias in all the respiratory viruses, including SARS-CoV-2 isolates from different geographical locations (~62K) and two recently emerging strains from the United Kingdom (UK), i.e., VUI202012/01, and South Africa (SA), i.e., 501.Y.V2, a codon usage bias (CUB) analysis was performed. The analysis includes RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency and neutrality plot analysis. We were motivated to conduct the study to fulfil two primary aims: first, to identify the difference in codon usage bias amongst all SARS-CoV-2 genomes and, secondly, to compare their CUB properties with other respiratory viruses. A biased nucleotide composition was found, as most of the highly preferred codons were A/U-ending in all the respiratory viruses studied here. Compared with the human host, the RSCU analysis led to the identification of 11 over-represented codons and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining the CUB. The present study results yield a better understanding of codon usage preferences for SARS-CoV-2 genomes and identify the possible evolutionary determinants responsible for the biases found among the respiratory viruses, thus unveiling a unique feature of SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first attempt at comparative CUB analysis on the worldwide genomes of SARS-CoV-2, including newly emerged strains and other respiratory viruses.
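
Two of the measures named in this abstract, RSCU and third-position GC content, follow standard textbook definitions and can be sketched briefly. The Python below is a minimal illustration and not the authors' pipeline: the function names `rscu` and `gc3` are hypothetical, the standard genetic code is assumed, and `gc3` counts every complete codon (the stricter GC3s variant, which considers only synonymous third positions, is not implemented here).

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split a coding sequence into complete codons (RNA is read as DNA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rscu(cds):
    """Relative synonymous codon usage: observed count of a codon divided by
    the count expected if all synonymous codons of its amino acid were used
    equally. Stop codons and amino acids absent from the sequence are skipped."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


def gc3(cds):
    """Fraction of complete codons whose third base is G or C."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(c[2] in "GC" for c in codons) / len(codons)


if __name__ == "__main__":
    toy = "GCTGCTGCTGCC"  # four alanine codons: three GCT, one GCC
    print(rscu(toy)["GCT"], rscu(toy)["GCC"], gc3(toy))
```

An RSCU value above 1 means the codon is used more often than expected under equal synonymous usage, and a value below 1 means it is used less often; the over- and under-representation thresholds applied in the study are not stated in the abstract.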

Accounting for Background Nucleotide Composition When Measuring Codon Usage Bias

Molecular Biology and Evolution ◽

10.1093/oxfordjournals.molbev.a004201 ◽

2002 ◽

Vol 19 (8) ◽

pp. 1390-1394 ◽

Cited By ~ 176

Author(s):

John A. Novembre

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Nucleotide Composition

Codon usage bias creates a ramp of hydrogen bonding at the 5′-end in prokaryotic ORFeomes

10.1101/811612 ◽

2019 ◽

Author(s):

Juan C. Villada ◽

Maria F. Duran ◽

Patrick K. H. Lee

Keyword(s):

Hydrogen Bonding ◽

Codon Usage ◽

Codon Usage Bias ◽

Translation Efficiency ◽

Molecular Processes ◽

Molecular Feature ◽

Web Based ◽

Synonymous Codons ◽

Double Stranded DNA ◽

Codon Positions

Codon usage bias exerts control over a wide variety of molecular processes. The positioning of synonymous codons within coding sequences (CDSs) dictates protein expression by mechanisms such as local translation efficiency, mRNA Gibbs free energy, and protein co-translational folding. In this work, we explore how codon variants affect the position-dependent content of hydrogen bonding, which in turn influences energy requirements for unwinding double-stranded DNA. By analyzing over 14,000 bacterial, archaeal, and fungal ORFeomes, we found that Bacteria and Archaea exhibit an exponential ramp of hydrogen bonding at the 5′-end of CDSs, while a similar ramp was not found in Fungi. The ramp develops within the first 20 codon positions in prokaryotes, eventually reaching a steady carrying capacity of hydrogen bonding that does not differ from Fungi. Selection against uniformity tests proved that selection acts against synonymous codons with high content of hydrogen bonding at the 5′-end of prokaryotic ORFeomes. Overall, this study provides novel insights into the molecular feature of hydrogen bonding that is governed by the genetic code at the 5′-end of CDSs. A web-based application to analyze the position-dependent hydrogen bonding of ORFeomes has been developed and is publicly available (https://juanvillada.shinyapps.io/hbonds/).
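
The position-dependent quantity discussed in this abstract can be illustrated with a short Python sketch. It assumes the usual Watson-Crick counts (two hydrogen bonds for an A-T pair, three for a G-C pair) and simply averages the per-codon totals across sequences at each codon position. The function names are hypothetical, and the exact metric, normalization, and ramp-fitting procedure used by the authors are not given in the abstract, so none of that is reproduced here.

```python
# Hydrogen bonds formed by each base with its Watson-Crick partner.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hbonds_per_codon(cds):
    """Total hydrogen bonds for each complete codon of a coding sequence.
    Ambiguous bases (e.g. N) contribute 0; this is a simplifying assumption."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [
        sum(H_BONDS.get(base, 0) for base in cds[i:i + 3])
        for i in range(0, usable, 3)
    ]


def mean_profile(sequences, n_codons):
    """Mean hydrogen bonds per codon position over a set of sequences,
    for the first n_codons positions. Sequences shorter than a given
    position are left out of that position's mean."""
    per_sequence = [hbonds_per_codon(s) for s in sequences]
    profile = []
    for pos in range(n_codons):
        values = [p[pos] for p in per_sequence if len(p) > pos]
        profile.append(sum(values) / len(values) if values else float("nan"))
    return profile


if __name__ == "__main__":
    print(hbonds_per_codon("ATGGCC"))
    print(mean_profile(["ATGGCC", "ATGAAA"], 2))
```

Plotting such a profile for the first few dozen codon positions of an ORFeome is one simple way to look for a 5′-end ramp of the kind described.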

Analysis of computational codon usage models and their association with translationally slow codons

10.1101/2020.03.26.010488 ◽

2020 ◽

Author(s):

Gabriel Wright ◽

Anabel Rodriguez ◽

Jun Li ◽

Patricia L. Clark ◽

Tijana Milenković ◽

...

Keyword(s):

Codon Usage ◽

Computational Models ◽

Selective Pressure ◽

Synonymous Codon ◽

Ground Truth ◽

Protein Translation ◽

Weak Correlation ◽

Experimental Conditions ◽

Synonymous Codons ◽

Genome Wide

Abstract Improved computational modeling of protein translation rates, including better prediction of where translational slowdowns along an mRNA sequence may occur, is critical for understanding co-translational folding. Because codons within a synonymous codon group are translated at different rates, many computational translation models rely on analyzing synonymous codons. Some models rely on genome-wide codon usage bias (CUB), believing that globally rare and common codons are the most informative of slow and fast translation, respectively. Others use the CUB observed only in highly expressed genes, which should be under selective pressure to be translated efficiently (and whose CUB may therefore be more indicative of translation rates). No prior work has analyzed these models for their ability to predict translational slowdowns. Here, we evaluate five models for their association with slowly translated positions as denoted by two independent ribosome footprint (RFP) count experiments from S. cerevisiae, because RFP data is often considered as a “ground truth” for translation rates across mRNA sequences. We show that all five considered models strongly associate with the RFP data and therefore have potential for estimating translational slowdowns. However, we also show that there is a weak correlation between RFP counts for the same genes originating from independent experiments, even when their experimental conditions are similar. This raises concerns about the efficacy of using current RFP experimental data for estimating translation rates and highlights a potential advantage of using computational models to understand translation rates instead.

Comparative analysis of codon usage bias in Crenarchaea and Euryarchaea genome reveals differential preference of synonymous codons to encode highly expressed ribosomal and RNA polymerase proteins

Journal of Genetics ◽

10.1007/s12041-016-0667-5 ◽

2016 ◽

Vol 95 (3) ◽

pp. 537-549 ◽

Cited By ~ 2

Author(s):

VISHWA JYOTI BARUAH ◽

SIDDHARTHA SANKAR SATAPATHY ◽

BHESH RAJ POWDEL ◽

ROCKTOTPAL KONWARH ◽

ALAK KUMAR BURAGOHAIN ◽

...

Keyword(s):

Comparative Analysis ◽

RNA Polymerase ◽

Codon Usage ◽

Codon Usage Bias ◽

Synonymous Codons

Codon usage and protein sequence pattern dependency in different organisms: A Bioinformatics approach

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001550002x ◽

2015 ◽

Vol 13 (02) ◽

pp. 1550002

Author(s):

Mohammad-Hadi Foroughmand-Araabi ◽

Bahram Goliaei ◽

Kasra Alishahi ◽

Mehdi Sadeghi ◽

Sama Goliaei

Keyword(s):

Gene Regulation ◽

Codon Usage ◽

Protein Sequence ◽

Multinomial Logistic Regression ◽

Translation Efficiency ◽

Translation Rate ◽

Protein Levels ◽

Synonymous Codons ◽

First Time

Although it is known that synonymous codons are not chosen randomly, the role of codon usage in gene regulation is not yet clearly understood. Researchers have investigated the relation between codon usage and various properties, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Recently, a universal codon usage based mechanism for gene regulation was proposed. We studied the effect of protein sequence patterns on the codon usage of related genes. Considering a subsequence of a protein that matches a pattern or motif, we showed that the parts of the genes that are translated into this subsequence use specific ratios of synonymous codons. We also built a multinomial logistic regression model for codon usage, which considers the effect of patterns on codon usage. This model explains the observed codon usage preference better than the classic organism-dependent codon usage. Our results showed that codon usage plays a role in controlling protein levels for genes that participate in a specific biological function. This is the first time that this phenomenon has been reported.

Nucleotide composition bias and codon usage trends of gene populations in Mycoplasma capricolum subsp. capricolum and M. agalactiae

Journal of Genetics ◽

10.1007/s12041-015-0512-2 ◽

2015 ◽

Vol 94 (2) ◽

pp. 251-260 ◽

Cited By ~ 6

Author(s):

XIAO-XIA MA ◽

YU-PING FENG ◽

JIA-LING BAI ◽

DE-RONG ZHANG ◽

XIN-SHI LIN ◽

...

Keyword(s):

Codon Usage ◽

Nucleotide Composition ◽

Mycoplasma Capricolum
