Synonymous Dinucleotide Usage: A Codon-Aware Metric for Quantifying Dinucleotide Representation in Viruses

Spyros Lytras; Joseph Hughes

doi:10.3390/v12040462

Synonymous Dinucleotide Usage: A Codon-Aware Metric for Quantifying Dinucleotide Representation in Viruses

Viruses ◽

10.3390/v12040462 ◽

2020 ◽

Vol 12 (4) ◽

pp. 462 ◽

Cited By ~ 1

Author(s):

Spyros Lytras ◽

Joseph Hughes

Keyword(s):

Synonymous Codon ◽

Synonymous Codon Usage ◽

Statistical Interpretation ◽

Coding Sequences ◽

Cpg Dinucleotides ◽

Viral Genomes ◽

Dinucleotide Composition ◽

Codon Positions ◽

Living Organisms ◽

Null Expectation

Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions in both datasets. Unlike existing metrics for dinucleotide quantification, the SDU allows a statistical interpretation of its values by comparison with a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.
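
As a rough illustration of the idea in this abstract (comparing observed dinucleotide usage with what equal synonymous codon usage would give), the following Python sketch computes an SDU-like ratio for a dinucleotide at a fixed within-codon position. It is not the DinuQ implementation and does not use its API. The function names `expected_fraction` and `sdu_sketch`, the use of the standard genetic code, the exclusion of stop codons and of amino acids whose codons all contain (or all lack) the dinucleotide, and the omission of the bridge position (third base of one codon plus first base of the next) are all assumptions made here for brevity.

```python
from collections import Counter

# Standard genetic code built from the conventional TCAG ordering (assumption:
# the sequences of interest use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def expected_fraction(aa, dinuc, offset):
    """Fraction of the synonymous codons of `aa` carrying `dinuc` at `offset`.

    This is the null expectation under equal synonymous codon usage.
    """
    synonymous = [codon for codon, a in CODON_TABLE.items() if a == aa]
    hits = sum(codon[offset:offset + 2] == dinuc for codon in synonymous)
    return hits / len(synonymous)


def sdu_sketch(cds, dinuc="CG", offset=1):
    """SDU-like observed/expected ratio (hypothetical helper, not DinuQ).

    offset=0 looks at codon positions 1-2, offset=1 at codon positions 2-3.
    """
    cds = cds.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    aa_counts, hit_counts = Counter(), Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":  # skip ambiguous codons and stops
            continue
        aa_counts[aa] += 1
        if codon[offset:offset + 2] == dinuc:
            hit_counts[aa] += 1

    weighted, total = 0.0, 0
    for aa, n in aa_counts.items():
        expected = expected_fraction(aa, dinuc, offset)
        if not 0 < expected < 1:  # dinucleotide is not a synonymous choice here
            continue
        observed = hit_counts[aa] / n
        weighted += n * observed / expected  # weight each amino acid by its count
        total += n
    return weighted / total if total else float("nan")


if __name__ == "__main__":
    # Four alanine codons used equally often: CpG at positions 2-3 matches the null.
    print(sdu_sketch("GCTGCCGCAGCG"))
```

Under these assumptions, a value near 1 matches equal synonymous codon usage, values below 1 indicate under-representation of the dinucleotide, and values above 1 indicate over-representation.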


Synonymous Dinucleotide Usage: A Codon-Aware Metric for Quantifying Dinucleotide Representation in Viruses

10.1101/2020.03.02.973438 ◽

2020 ◽

Cited By ~ 1

Author(s):

Spyros Lytras ◽

Joseph Hughes

Keyword(s):

Synonymous Codon ◽

Synonymous Codon Usage ◽

Statistical Interpretation ◽

Coding Sequences ◽

Cpg Dinucleotides ◽

Viral Genomes ◽

Dinucleotide Composition ◽

Codon Positions ◽

Living Organisms ◽

Null Expectation

Abstract Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called Synonymous Dinucleotide Usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions in both datasets. Unlike existing metrics for dinucleotide quantification, the SDU allows a statistical interpretation of its values by comparison with a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.


corseq: fast and efficient identification of favoured codons from next generation sequencing reads

PeerJ ◽

10.7717/peerj.5099 ◽

2018 ◽

Vol 6 ◽

pp. e5099 ◽

Cited By ~ 1

Author(s):

Salvatore Camiolo ◽

Andrea Porceddu

Keyword(s):

Genomic Sequence ◽

Gene Annotation ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Model Organisms ◽

Coding Sequences ◽

Rnaseq Data ◽

Highly Expressed Genes ◽

Or Gene ◽

Generation Sequencing

Background: Optimization of transgene expression can be achieved by designing coding sequences with the synonymous codon usage of genes that are highly expressed in the host organism. Identifying the so-called “favoured codons” generally requires access to either the genome or the coding sequences, together with expression data. Results: Here we describe corseq, fast and reliable software for detecting favoured codons directly from RNAseq data without prior knowledge of the genomic sequence or gene annotation. The tool infers codons that are preferentially used in highly expressed genes while estimating transcript abundance with a new k-mer-based approach. corseq is implemented in Python and runs under any operating system. The software requires the Biopython 1.65 library (or later versions) and is available under the ‘GNU General Public License version 3’ at the project webpage https://sourceforge.net/projects/corseq/files. Conclusion: corseq represents a faster, easy-to-use alternative for the detection of favoured codons in non-model organisms.


Codon Usage Bias in Autophagy-Related Gene 13 in Eukaryotes: Uncovering the Genetic Divergence by the Interplay Between Nucleotides and Codon Usages

Frontiers in Cellular and Infection Microbiology ◽

10.3389/fcimb.2021.771010 ◽

2021 ◽

Vol 11 ◽

Author(s):

Yicong Li ◽

Rui Wang ◽

Huihui Wang ◽

Feiyang Pu ◽

Xili Feng ◽

...

Keyword(s):

Amino Acid ◽

Codon Usage ◽

Codon Usage Bias ◽

Essential Gene ◽

Synonymous Codon ◽

Phylogenetic Analyses ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Related Gene ◽

Codon Positions

Synonymous codon usage bias is a universal characteristic of genomes across various organisms. Autophagy-related gene 13 (atg13) is an essential gene for autophagy initiation, yet the evolutionary trends of the atg13 gene in nucleotide and synonymous codon usage remain unexplored. According to phylogenetic analyses of the atg13 gene from 226 eukaryotic organisms at the nucleotide and amino acid levels, nucleotide usage carries more genetic information than amino acid usage. Specifically, the overall nucleotide usage bias quantified by information entropy showed that the usage biases at the first and second codon positions were stronger than those at the third position of the atg13 genes. Furthermore, the bias level of nucleotide ‘G’ usage is highest, while that of nucleotide ‘C’ usage is lowest in the atg13 genes. In addition, the genetic features represented by synonymous codon usage exhibit, to some extent, a species-specific pattern in the evolution of the atg13 genes. Interestingly, the codon usages of atg13 genes in the ancestor animals (Latimeria chalumnae, Petromyzon marinus, and Rhinatrema bivittatum) are strongly influenced by mutation pressure from nucleotide composition constraint. However, the distributions of nucleotide composition at different codon positions in the atg13 gene show that natural selection still dominates atg13 codon usage during evolution.
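
This abstract does not give the entropy formula used, so the following is only a minimal sketch of one common reading: the Shannon entropy (in bits) of the nucleotide distribution at each of the three codon positions, where lower entropy means a more strongly biased position. The function name `positional_entropy` and the choice to ignore non-ACGT symbols are assumptions made for illustration.

```python
import math
from collections import Counter


def positional_entropy(cds):
    """Shannon entropy (bits) of nucleotide usage at codon positions 1, 2 and 3.

    Hypothetical helper: 2.0 bits means all four bases are used equally,
    0.0 bits means a single base is used at that position.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    entropies = []
    for pos in range(3):
        # Every third base starting at `pos`, ignoring ambiguous symbols.
        counts = Counter(b for b in cds[pos:usable:3] if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            entropies.append(float("nan"))
            continue
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Position 1 uses all four bases equally; positions 2 and 3 are always A.
    print(positional_entropy("AAACAAGAATAA"))
```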


Comparative Analysis of Codon Usage and tRNA in Mitochondrial Genomes of Gallus Gallus

Avian Biology Research ◽

10.3184/175815509x12473915395956 ◽

2009 ◽

Vol 2 (3) ◽

pp. 133-141

Author(s):

Tangjie Zhang ◽

Hong Chang ◽

Yuzhi Liu ◽

Huifang Li ◽

Kuanwei Chen

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Gallus Gallus ◽

Mitochondrial Genes ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Mutational Bias ◽

Mitochondrial Genomes ◽

The Third ◽

Codon Positions

Codon usage in mitochondrial genes of 11 Gallus gallus and two Anatidae species was analysed to determine the general patterns in codon choice of Gallus gallus species. C3 contents were higher in Gallus gallus than in mammalian mitochondrial genomes that encode protein codon positions. The high C3 contents of Gallus gallus might be the result of relatively strong mutational bias that occurred in the lineage of the Gallus gallus species. A- and C-ending codons were detected as the “preferred” codons in Gallus gallus and Anatidae. The NNR codon families are dominated by the A-ending codons, the NNY codon families are dominated by the C-ending codons and the NNN codon families are dominated by the A-ending or the C-ending codons. A comparison of the relative synonymous codon usage (RSCU) and synonymous codon families (SCF) of tRNA and proteins was made, and two groups can be classified by SCF. The codon usage in Gallus gallus species indicates that codons containing A or C at the third position are used preferentially, regardless of whether corresponding tRNAs are encoded in the mtDNA. In both Gallus gallus and Anatidae species mtDNA, codon usage biases are highly related to CC-ending binucleotide codons.


Analysis of Synonymous Codon Usage Bias in Potato Virus M and Its Adaption to Hosts

Viruses ◽

10.3390/v11080752 ◽

2019 ◽

Vol 11 (8) ◽

pp. 752 ◽

Cited By ~ 8

Author(s):

Zhen He ◽

Haifeng Gan ◽

Xinyan Liang

Keyword(s):

Natural Selection ◽

Codon Usage ◽

Potato Virus ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Economic Losses ◽

Codon Usage Pattern ◽

Usage Pattern ◽

Coding Sequences ◽

Potato Virus M

Potato virus M (PVM) is a member of the genus Carlavirus of the family Betaflexviridae and causes large economic losses of nightshade crops. Several previous studies have elucidated the population structure, evolutionary timescale and adaptive evolution of PVM. However, the synonymous codon usage pattern of PVM remains unclear. In this study, we performed comprehensive analyses of the codon usage and composition of PVM based on 152 nucleotide sequences of the coat protein (CP) gene and 125 sequences of the cysteine-rich nucleic acid binding protein (NABP) gene. We observed that the PVM CP and NABP coding sequences were GC- and AU-rich, respectively, whereas U- and G-ending codons were preferred in the PVM CP and NABP coding sequences. The lower codon usage of the PVM CP and NABP coding sequences indicated a relatively stable and conserved genomic composition. Natural selection and mutation pressure shaped the codon usage patterns of PVM, with natural selection being the most important factor. The codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analysis revealed that the greatest adaptation of PVM was to pepino, followed by tomato and potato. Moreover, similarity index (SiD) analysis showed that pepino had a greater impact on PVM than tomato and potato. Our study is the first attempt to evaluate the codon usage pattern of the PVM CP and NABP genes to better understand the evolutionary changes of a carlavirus.


Selection at the Amino Acid Level Can Influence Synonymous Codon Usage: Implications for the Study of Codon Adaptation in Plastid Genes

Genetics ◽

10.1093/genetics/159.1.347 ◽

2001 ◽

Vol 159 (1) ◽

pp. 347-358

Author(s):

Brian R Morton

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Amino Acid Level ◽

Synonymous Codon Usage ◽

Noncoding Dna ◽

Translation Rate ◽

Coding Sequences ◽

Synonymous Codons ◽

Synonymous Sites ◽

Translation Accuracy

Abstract A previously employed method that uses the composition of noncoding DNA as the basis of a test for selection between synonymous codons in plastid genes is reevaluated. The test requires the assumption that in the absence of selective differences between synonymous codons the composition of silent sites in coding sequences will match the composition of noncoding sites. It is demonstrated here that this assumption is not necessarily true and, more generally, that using compositional properties to draw inferences about selection on silent changes in coding sequences is much more problematic than commonly assumed. This is so because selection on nonsynonymous changes can influence the composition of synonymous sites (i.e., codon usage) in a complex manner, meaning that the composition biases of different silent sites, including neutral noncoding DNA, are not comparable. These findings also draw into question the commonly utilized method of investigating how selection to increase translation accuracy influences codon usage. The work then focuses on implications for studies that assess codon adaptation, which is selection on codon usage to enhance translation rate, in plastid genes. A new test that does not require the use of noncoding DNA is proposed and applied. The results of this test suggest that far fewer plastid genes display codon adaptation than previously thought.


Molecular Evolution of Alphabaculovirus genomes: Evidence of Mutational bias and Natural selection

10.21203/rs.3.rs-244707/v1 ◽

2021 ◽

Author(s):

Puttatida Mahapattanakul ◽

Pragun Rajbhandari ◽

Patsarin Rodpothong

Keyword(s):

Natural Selection ◽

Codon Usage ◽

Synonymous Codon ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Open Reading Frames ◽

Mutational Bias ◽

Functional Conservation ◽

Codon Positions ◽

Or Genes

Abstract Codon usage is a reflection of evolutionary adaptation to environmental pressure. The pattern of usage may be unique to species of viruses, genomes of the same species or genes within the same genome. Here we have analysed the overall nucleotide composition and the nucleotides at different codon positions in the genomes of 6 Alphabaculoviruses. Principal Component Analysis (PCA) based on Relative Synonymous Codon Usage (RSCU) of all Open Reading Frames (ORFs) was employed to investigate the pattern of the codon usage. The results suggest that the Alphabaculovirus genomes, except that of Agrotis Ipsilon mNPV (AgipNPV), are predominantly under the influence of neutral mutation biased toward A/T. The majority of the ORFs, except those of the AgipNPV, cluster at the same location in the 2-dimensional PCA map, with one prominent outlier that has been identified as the P6.9 gene. The six Alphabaculovirus P6.9 genes have a high G/C content, dissimilar to the majority of the ORFs. The G/C content is found to be significantly high at the 2nd codon position, suggesting the influence of natural selection and perhaps reflecting its functional conservation in DNA packaging as well as its evolutionary relation to Protamine.
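
For readers unfamiliar with RSCU, the sketch below computes it for a single ORF using the usual definition: a codon's observed count divided by the mean count across its synonymous family, so that 1.0 means no bias within that family. The function name `rscu`, the use of the standard genetic code, and the decision to skip single-codon families (Met, Trp) and amino acids absent from the ORF are assumptions made here, not details taken from the study. Per-ORF RSCU vectors like this one are the kind of input that a PCA would then take.

```python
from collections import Counter, defaultdict

# Standard genetic code built from the conventional TCAG ordering (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def rscu(cds):
    """Relative synonymous codon usage for one coding sequence (hypothetical helper).

    Returns {codon: RSCU} for every codon of each multi-codon amino acid
    that occurs at least once in the sequence.
    """
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice to measure
        for codon in family:
            # observed count / (family total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Alanine family: GCT twice, GCC and GCA once, GCG never.
    result = rscu("GCTGCTGCCGCA")
    print({c: result[c] for c in ("GCT", "GCC", "GCA", "GCG")})
```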


Variability in codon usage in Coronaviruses is mainly driven by mutational bias and selective constraints on CpG dinucleotide

10.1101/2021.01.26.428296 ◽

2021 ◽

Author(s):

J. Daron ◽

I.G. Bravo

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Mutational Bias ◽

Cpg Dinucleotides ◽

Neutral Equilibrium ◽

History Of ◽

Cpg Dinucleotide ◽

Human Coronaviruses

Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third virus within the Orthocoronavirinae to cause an emergent infectious disease in humans, the ongoing coronavirus disease 2019 (COVID-19) pandemic. Due to the high zoonotic potential of these viruses, it is critical to unravel their evolutionary history of host species shift, adaptation and emergence. Only such knowledge can guide virus discovery, surveillance and research efforts to identify viruses posing a pandemic risk in humans. We present a comprehensive analysis of the composition and codon usage bias of the 82 Orthocoronavirinae members, infecting 47 different avian and mammalian hosts. Our results clearly establish that synonymous codon usage varies widely among viruses and is only weakly dependent on the type of host they infect. Instead, we identify mutational bias towards AT-enrichment and selection against CpG dinucleotides as the main factors responsible for the codon usage bias variation. Further insight into the mutational equilibrium within Orthocoronavirinae revealed that most coronavirus genomes are close to their neutral equilibrium; the exception is the three recently-infecting human coronaviruses, which lie further away from the mutational equilibrium than their endemic human coronavirus counterparts. Finally, our results suggest that while replicating in humans SARS-CoV-2 is slowly becoming AT-richer, likely until attaining a new mutational equilibrium.
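
The two compositional signals named in this abstract, AT-enrichment and CpG depletion, can be measured on a genome with simple statistics. The sketch below uses the widely used CpG observed/expected odds ratio, f(CG) / (f(C) · f(G)), together with plain AT content. The abstract does not say which exact measures the authors used, so this choice and the function names `at_content` and `cpg_obs_exp` are assumptions for illustration only.

```python
def at_content(seq):
    """Fraction of A and T (or U) among the unambiguous bases of `seq`."""
    seq = seq.upper().replace("U", "T")
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("A") + seq.count("T")) / acgt


def cpg_obs_exp(seq):
    """CpG observed/expected odds ratio: f(CG) / (f(C) * f(G)).

    Hypothetical helper. Values below 1 suggest CpG depletion relative to
    what the C and G frequencies alone would predict.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if n < 2 or c == 0 or g == 0:
        return float("nan")
    # Count overlapping dinucleotide windows that read CG.
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cg = cg / (n - 1)  # frequency among the n - 1 dinucleotide windows
    return f_cg / ((c / n) * (g / n))


if __name__ == "__main__":
    toy = "ATGCGTACGTTAGC"  # toy sequence, not real data
    print(at_content(toy), cpg_obs_exp(toy))
```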
