Codon Pairs are Phylogenetically Conserved: Codon pairing as a new class of phylogenetic characters

10.1101/654947 ◽

2019 ◽

Cited By ~ 1

Author(s):

Justin B. Miller ◽

Lauren M. McKinnon ◽

Michael F. Whiting ◽

Perry G. Ridge

Keyword(s):

Window Size ◽

Orthologous Gene ◽

Translational Efficiency ◽

Phylogenetic Conservation ◽

New Class ◽

Gene Annotations ◽

Alignment Free ◽

Domains Of Life ◽

Taxonomic Groups ◽

Codon Pairs

Abstract Identical codon pairing and co-tRNA codon pairing increase translational efficiency within genes when two codons that encode the same amino acid are located within a ribosomal window. By examining both identical and co-tRNA codon pairing across 23 423 species, we determined that both pairing techniques are phylogenetically informative across all domains of life using either an alignment-free or parsimony framework. We also determined that conserved codon pairing typically has a smaller window size than the length of a ribosome. We also analyzed frequencies of codon pairing for each codon to determine which codons are most likely to pair. The alignment-free method does not require orthologous gene annotations and recovers species relationships that are comparable to other alignment-free techniques. Parsimony generally recovers phylogenies that are more congruent with the established phylogenies than the alignment-free method. However, four of the ten taxonomic groups do not have sufficient ortholog annotations and are therefore recoverable using only the alignment-free methods. Since the recovered phylogenies using only codon pairing largely match established phylogenies and are comparable to other algorithms, we propose that codon pairing biases are phylogenetically conserved and should be considered in conjunction with current techniques in future phylogenomic studies. Furthermore, the phylogenetic conservation of codon pairing indicates that codon pairing plays a greater role in the speciation process than previously acknowledged. Availability All scripts used to recover and compare phylogenies, including documentation and test files, are freely available on GitHub at https://github.com/ridgelab/codon_pairing.
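
The pairing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their scripts are in the GitHub repository named above). The function names, the default window of 5 codons, and the choice to measure the window as "the next N codons downstream" are all assumptions made for illustration. Identical pairing is taken here to mean the same codon recurring within the window, and co-tRNA pairing to mean a different codon for the same amino acid recurring within the window.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def pairing_sets(cds, window=5):
    """Return (identical, co_trna) for one gene.

    identical: codons that recur within `window` codons downstream.
    co_trna:   amino acids for which a different synonymous codon
               occurs within `window` codons downstream.
    The window definition is an assumption of this sketch.
    """
    codons = split_codons(cds)
    identical, co_trna = set(), set()
    for i, codon in enumerate(codons):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":  # skip ambiguous codons and stops
            continue
        for other in codons[i + 1:i + 1 + window]:
            if other == codon:
                identical.add(codon)
            elif CODON_TABLE.get(other) == aa:
                co_trna.add(aa)
    return identical, co_trna


if __name__ == "__main__":
    # GCT GCT AAA GCC TTT: GCT repeats, and GCC is synonymous with GCT (Ala).
    print(pairing_sets("GCTGCTAAAGCCTTT", window=2))
```

Under these assumptions, each gene yields a set of pairing characters, and such sets could then be compared between species in either an alignment-free or a parsimony setting, as the abstract describes.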

The Prodigal Compound: Return of Ribosyl 1,5-Bisphosphate as an Important Player in Metabolism

Microbiology and Molecular Biology Reviews ◽

10.1128/mmbr.00040-18 ◽

2018 ◽

Vol 83 (1) ◽

Cited By ~ 1

Author(s):

Bjarne Hove-Jensen ◽

Ditlev E. Brodersen ◽

M. Cemre Manav

Keyword(s):

Three Dimensional ◽

Bioinformatic Analysis ◽

Pentose Phosphate ◽

Dimensional Structure ◽

Big Brother ◽

Important Player ◽

Domains Of Life ◽

Important Intermediate ◽

Metabolic Activities ◽

Taxonomic Groups

SUMMARY Ribosyl 1,5-bisphosphate (PRibP) was discovered 65 years ago and was believed to be an important intermediate in ribonucleotide metabolism, a role immediately taken over by its “big brother” phosphoribosyldiphosphate. Only recently has PRibP come back into focus as an important player in the metabolism of ribonucleotides with the discovery of the pentose bisphosphate pathway that comprises, among others, the intermediates PRibP and ribulose 1,5-bisphosphate (cf. ribose 5-phosphate and ribulose 5-phosphate of the pentose phosphate pathway). Enzymes of several pathways produce and utilize PRibP not only in ribonucleotide metabolism but also in the catabolism of phosphonates, i.e., compounds containing a carbon-phosphorus bond. Pathways for PRibP metabolism are found in all three domains of life, most prominently among organisms of the archaeal domain, where they have been identified either experimentally or by bioinformatic analysis within all of the four main taxonomic groups, Euryarchaeota, TACK, DPANN, and Asgard. Advances in molecular genetics of archaea have greatly improved the understanding of the physiology of PRibP metabolism, and reconciliation of molecular enzymology and three-dimensional structure analysis of enzymes producing or utilizing PRibP emphasizes the versatility of the compound. Finally, PRibP is also an effector of several metabolic activities in many organisms, including higher organisms such as mammals. In the present review, we describe all aspects of PRibP metabolism, with emphasis on the biochemical, genetic, and physiological aspects of the enzymes that produce or utilize PRibP. The inclusion of high-resolution structures of relevant enzymes that bind PRibP provides evidence for the flexibility and importance of the compound in metabolism.

CAM: an alignment-free method to recover phylogenies using codon aversion motifs

PeerJ ◽

10.7717/peerj.6984 ◽

2019 ◽

Vol 7 ◽

pp. e6984 ◽

Cited By ~ 3

Author(s):

Justin B. Miller ◽

Lauren M. McKinnon ◽

Michael F. Whiting ◽

Perry G. Ridge

Keyword(s):

Maximum Likelihood ◽

Synonymous Codon ◽

Single Gene ◽

Orthologous Gene ◽

Synonymous Codon Usage ◽

High Rate ◽

Pairwise Distance ◽

Character State ◽

Neighbor Joining ◽

Alignment Free

Background Common phylogenomic approaches for recovering phylogenies are often time-consuming and require annotations for orthologous gene relationships that are not always available. In contrast, alignment-free phylogenomic approaches typically use structure and oligomer frequencies to calculate pairwise distances between species. We have developed an approach to quickly calculate distances between species based on codon aversion. Methods Utilizing a novel alignment-free character state, we present CAM, an alignment-free approach to recover phylogenies by comparing differences in codon aversion motifs (i.e., the set of unused codons within each gene) across all genes within a species. Synonymous codon usage is non-random and differs between organisms, between genes, and even within a single gene, and many genes do not use all possible codons. We report a comprehensive analysis of codon aversion within 229,742,339 genes from 23,428 species across all kingdoms of life, and we provide an alignment-free framework for its use in a phylogenetic construct. For each species, we first construct a set of codon aversion motifs spanning all genes within that species. We define the pairwise distance between two species, A and B, as one minus the number of shared codon aversion motifs divided by the total codon aversion motifs of the species, A or B, containing the fewest motifs. This approach allows us to calculate pairwise distances even when substantial differences in the number of genes or a high rate of divergence between species exists. Finally, we use neighbor-joining to recover phylogenies. Results Using the Open Tree of Life and NCBI Taxonomy Database as expected phylogenies, our approach compares well, recovering phylogenies that largely match expected trees and are comparable to trees recovered using maximum likelihood and other alignment-free approaches. Our technique is much faster than maximum likelihood and similar in accuracy to other alignment-free approaches. Therefore, we propose that codon aversion be considered a phylogenetically conserved character that may be used in future phylogenomic studies. Availability CAM, documentation, and test files are freely available on GitHub at https://github.com/ridgelab/cam.
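
The distance defined in this abstract (one minus the number of shared motifs divided by the motif count of the species with the fewest motifs) can be written out as a short sketch. This is an illustration only, not the CAM code from the repository above. The function names are hypothetical, and treating all 64 codons (stop codons included) as candidates for aversion is an assumption of this sketch.

```python
from itertools import product

# All 64 codons; whether stop codons belong in a motif is an assumption here.
ALL_CODONS = frozenset("".join(c) for c in product("ACGT", repeat=3))


def aversion_motif(cds):
    """Return the set of codons that a single gene never uses."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    used = {cds[i:i + 3] for i in range(0, usable, 3)}
    return frozenset(ALL_CODONS - used)


def species_motifs(genes):
    """Collect the distinct codon aversion motifs across a species' genes."""
    return {aversion_motif(gene) for gene in genes}


def cam_distance(motifs_a, motifs_b):
    """1 - |shared motifs| / (motif count of the species with fewer motifs)."""
    smaller = min(len(motifs_a), len(motifs_b))
    if smaller == 0:
        raise ValueError("each species needs at least one motif")
    return 1.0 - len(motifs_a & motifs_b) / smaller


if __name__ == "__main__":
    # Toy genes: the two species share exactly one motif (from ATG AAA TAA).
    species_a = species_motifs(["ATGAAATAA", "ATGCCCTAA"])
    species_b = species_motifs(["ATGAAATAA", "ATGGGGTAA", "ATGTTTTAA"])
    print(cam_distance(species_a, species_b))
```

Dividing by the smaller motif count keeps the distance within 0 to 1 even when one species has far more genes than the other, which is the property the abstract highlights. A matrix of such distances would then be passed to a neighbor-joining routine, which is not shown here.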

Differential bicodon usage in lowly and highly abundant proteins

PeerJ ◽

10.7717/peerj.3081 ◽

2017 ◽

Vol 5 ◽

pp. e3081 ◽

Cited By ~ 12

Author(s):

Luis A. Diambra

Keyword(s):

Codon Usage ◽

Drug Transport ◽

Heterologous Gene Expression ◽

Elongation Rate ◽

Translation Elongation ◽

Synonymous Codons ◽

Domains Of Life ◽

Small Change ◽

Protein Elongation ◽

Codon Pairs

Degeneracy in the genetic code implies that different codons can encode the same amino acid. Usage preference of synonymous codons has been observed in all domains of life. There is much evidence suggesting that this bias has a major role in protein elongation rate, contributing to differential expression and to co-translational folding. In addition to codon usage bias, other preference variations have been observed, such as codon pairs. In this paper, I report that codon pairs have significantly different usage frequencies when coding for lowly or highly abundant proteins. These usage preferences cannot be explained by the usage frequencies of the single codons. The statistical analysis of coding sequences of nine organisms reveals that in many cases bicodon preferences are shared between related organisms. Furthermore, it is observed that misfolding in the drug-transport protein encoded by the MDR1 gene is better explained by a big change in the pause propensity due to the synonymous bicodon variant than by a relatively small change in codon usage. These findings suggest that codon pair usage can be a more powerful framework for understanding translation elongation rate and protein folding efficiency, and for improving protocols to optimize heterologous gene expression.
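
One common way to check whether a codon pair preference goes beyond single-codon usage is to compare the observed count of each adjacent pair with the count expected if codons were placed independently. The sketch below illustrates that general idea only. It is not the statistic used in the paper, and the function name and the simple independence model are assumptions.

```python
from collections import Counter


def bicodon_ratios(sequences):
    """Observed/expected ratio for each adjacent codon pair.

    Expected counts assume the two codons occur independently at their
    overall frequencies (an assumption of this sketch). A ratio above 1
    suggests a preferred pair, below 1 an avoided pair.
    """
    codon_counts, pair_counts = Counter(), Counter()
    for cds in sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), observed in pair_counts.items():
        expected = (
            (codon_counts[first] / total_codons)
            * (codon_counts[second] / total_codons)
            * total_pairs
        )
        ratios[(first, second)] = observed / expected
    return ratios


if __name__ == "__main__":
    for pair, ratio in bicodon_ratios(["AAAGGGAAAGGG"]).items():
        print(pair, round(ratio, 3))
```

Running the same calculation separately on genes for lowly and highly abundant proteins would give two sets of ratios to compare, in the spirit of the analysis the abstract describes.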

Preferred and avoided codon pairs in three domains of life

BMC Genomics ◽

10.1186/1471-2164-9-463 ◽

2008 ◽

Vol 9 (1) ◽

pp. 463 ◽

Cited By ~ 63

Author(s):

Age Tats ◽

Tanel Tenson ◽

Maido Remm

Keyword(s):

Domains Of Life ◽

Codon Pairs

PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API

Nucleic Acids Research ◽

10.1093/nar/gkaa1106 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D394-D403 ◽

Cited By ~ 1

Author(s):

Huaiyu Mi ◽

Dustin Ebert ◽

Anushya Muruganujan ◽

Caitlin Mills ◽

Laurent-Philippe Albou ◽

...

Keyword(s):

Phylogenetic Trees ◽

Classification Tree ◽

Gene List ◽

Protein Coding ◽

Protein Class ◽

Domains Of Life ◽

Taxonomic Groups ◽

High Level ◽

Classification Tool

Abstract PANTHER (Protein Analysis Through Evolutionary Relationships, http://www.pantherdb.org) is a resource for the evolutionary and functional classification of protein-coding genes from all domains of life. The evolutionary classification is based on a library of over 15,000 phylogenetic trees, and the functional classifications include Gene Ontology terms and pathways. Here, we analyze the current coverage of genes from genomes in different taxonomic groups, so that users can better understand what to expect when analyzing a gene list using PANTHER tools. We also describe extensive improvements to PANTHER made in the past two years. The PANTHER Protein Class ontology has been completely refactored, and 6101 PANTHER families have been manually assigned to a Protein Class, providing a high level classification of protein families and their genes. Users can access the TreeGrafter tool to add their own protein sequences to the reference phylogenetic trees in PANTHER, to infer evolutionary context as well as fine-grained annotations. We have added human enhancer-gene links that associate non-coding regions with the annotated human genes in PANTHER. We have also expanded the available services for programmatic access to PANTHER tools and data via application programming interfaces (APIs). Other improvements include additional plant genomes and an updated PANTHER GO-slim.

The Role of Fucose-Containing Glycan Motifs Across Taxonomic Kingdoms

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.755577 ◽

2021 ◽

Vol 8 ◽

Author(s):

Luc Thomès ◽

Daniel Bojar

Keyword(s):

Molecular Mimicry ◽

Large Species ◽

Pathogenic Potential ◽

Domains Of Life ◽

Small Set ◽

Recognition Systems ◽

Species Specific ◽

Taxonomic Groups ◽

Shed Light

The extraordinary diversity of glycans leads to large differences in the glycomes of different kingdoms of life. Yet, while most monosaccharides are solely found in certain taxonomic groups, there is a small set of monosaccharides with widespread distribution across nearly all domains of life. These general monosaccharides are particularly relevant for glycan motifs, as they can readily be used by commensals and pathogens to mimic host glycans or hijack existing glycan recognition systems. Among these, the monosaccharide fucose is especially interesting, as it frequently presents itself as a terminal monosaccharide, primed for interaction with proteins. Here, we analyze fucose-containing glycan motifs across all taxonomic kingdoms. Using a hereby presented large species-specific glycan dataset and a plethora of methods for glycan-focused bioinformatics and machine learning, we identify characteristic as well as shared fucose-containing glycan motifs for various taxonomic groups, demonstrating clear differences in fucose usage. Even within domains, fucose is used differentially based on an organism’s physiology and habitat. We particularly highlight differences in fucose-containing motifs between vertebrates and invertebrates. With the example of pathogenic and non-pathogenic Escherichia coli strains, we also demonstrate the importance of fucose-containing motifs in molecular mimicry and thereby pathogenic potential. We envision that this study will shed light on an important class of glycan motifs, with potential new insights into the role of fucosylated glycans in symbiosis, pathogenicity, and immunity.

Phylogenetic conservation of soil bacterial responses to simulated global changes

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2019.0242 ◽

2020 ◽

Vol 375 (1798) ◽

pp. 20190242 ◽

Cited By ~ 4

Author(s):

Kazuo Isobe ◽

Nicholas J. Bouskill ◽

Eoin L. Brodie ◽

Erika A. Sudderth ◽

Jennifer B. H. Martiny

Keyword(s):

Bacterial Communities ◽

Global Changes ◽

Phylogenetic Information ◽

Phylogenetic Conservation ◽

Soil Bacterial Communities ◽

Microbial Community Ecology ◽

Soil Bacterial ◽

Predictive Understanding ◽

Taxonomic Groups ◽

P Addition

Soil bacterial communities are altered by anthropogenic drivers such as climate change-related warming and fertilization. However, we lack a predictive understanding of how bacterial communities respond to such global changes. Here, we tested whether phylogenetic information might be more predictive of the response of bacterial taxa to some forms of global change than others. We analysed the composition of soil bacterial communities from perturbation experiments that simulated warming, drought, elevated CO2 concentration and phosphorus (P) addition. Bacterial responses were phylogenetically conserved to all perturbations. The phylogenetic depth of these responses varied minimally among the types of perturbations and was similar when merging data across locations, implying that the context of particular locations did not affect the phylogenetic pattern of response. We further identified taxonomic groups that responded consistently to each type of perturbation. These patterns revealed that, at the level of family and above, most groups responded consistently to only one or two types of perturbations, suggesting that traits with different patterns of phylogenetic conservation underlie the responses to different perturbations. We conclude that a phylogenetic approach may be useful in predicting how soil bacterial communities respond to a variety of global changes. This article is part of the theme issue ‘Conceptual challenges in microbial community ecology’.

The role of fucose-containing glycan motifs across taxonomic kingdoms

10.1101/2021.08.08.455599 ◽

2021 ◽

Author(s):

Luc Thomès ◽

Daniel Bojar

Keyword(s):

Molecular Mimicry ◽

Large Species ◽

Pathogenic Potential ◽

Pathogenic Escherichia Coli ◽

Domains Of Life ◽

Small Set ◽

Recognition Systems ◽

Species Specific ◽

Taxonomic Groups

The extraordinary diversity of glycans leads to large differences in the glycomes of different kingdoms of life. Yet, while most monosaccharides are solely found in certain taxonomic groups, there is a small set of monosaccharides with widespread distribution across nearly all domains of life. These general monosaccharides are particularly relevant for glycan motifs, as they can readily be used by commensals and pathogens to mimic host glycans or hijack existing glycan recognition systems. Among these, the monosaccharide fucose is especially interesting, as it frequently presents itself as a terminal monosaccharide, primed for interaction with proteins. Here, we analyze fucose-containing glycan motifs across all taxonomic kingdoms. Using a hereby presented large species-specific glycan dataset and a plethora of methods for glycan-focused bioinformatics and machine learning, we identify characteristic as well as shared fucose-containing glycan motifs for various taxonomic groups, demonstrating clear differences in fucose usage. Even within domains, fucose is used differentially based on an organism's physiology and habitat. We particularly highlight differences in fucose-containing motifs between vertebrates and invertebrates. With the example of pathogenic and non-pathogenic Escherichia coli strains, we also demonstrate the importance of fucose-containing motifs in molecular mimicry and thereby pathogenic potential. Finally, we also confirm and extend the role of fucosyltransferase-coding genes (FUT) in several important biological processes that include development, immunity, and diseases, especially cancer. We envision that this study will shed light on an important class of glycan motifs, with potential new insights into the role of fucosylated glycans in symbiosis, pathogenicity, and immunity.

SCNBase: a genomics portal for the soybean cyst nematode (Heterodera glycines)

Database ◽

10.1093/database/baz111 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 2

Author(s):

Rick Masonbrink ◽

Tom R Maier ◽

Arun S Seetharam ◽

Parijat S Juvale ◽

Levi Baber ◽

...

Keyword(s):

Soybean Cyst Nematode ◽

Heterodera Glycines ◽

Orthologous Gene ◽

Nematode Species ◽

Cyst Nematode ◽

Future Research ◽

Parasitic Nematodes ◽

Management Tools ◽

Genomic Resources ◽

Gene Annotations

Abstract Soybean is an important worldwide crop, and farmers continue to experience significant yield loss due to the soybean cyst nematode (SCN), Heterodera glycines. This soil-borne roundworm parasite is rated the most important pathogen problem in soybean production. The infective nematodes enter into complex interactions with their host plant by inducing the development of specialized plant feeding cells that provide the parasites with nourishment. Addressing the SCN problem will require the development of genomic resources and a global collaboration of scientists to analyze and use these resources. SCNBase.org was designed as a collaborative hub for the SCN genome. All data and analyses are downloadable and can be analyzed with three integrated genomic tools: JBrowse, Feature Search and BLAST. At the time of this writing, a number of genomic and transcriptomic data sets are already available, with 43 JBrowse tracks and 21 category pages describing SCN genomic analyses on gene predictions, transcriptome and read alignments, effector-like genes, expansion and contraction of genomic repeats, orthology and synteny with related nematode species, single nucleotide polymorphisms (SNPs) from 15 SCN populations and novel splice sites. Standard functional gene annotations were supplemented with orthologous gene annotations using a comparison to nine related plant-parasitic nematodes, thereby enabling functional annotations for 85% of genes. These annotations led to a greater grasp of the SCN effectorome, which includes over 3324 putative effector genes. By designing SCNBase as a hub, future research findings and genomic resources can easily be uploaded and made available for use by others with minimal needs for further curation. By providing these resources to the nematode research community, scientists will be empowered to develop novel, more effective SCN management tools.

CAM: An alignment-free method to recover phylogenies using codon aversion motifs

10.7287/peerj.preprints.27756v1 ◽

2019 ◽

Author(s):

Justin B Miller ◽

Lauren M McKinnon ◽

Michael F Whiting ◽

Perry G Ridge

Keyword(s):

Maximum Likelihood ◽

Synonymous Codon ◽

Single Gene ◽

Orthologous Gene ◽

Synonymous Codon Usage ◽

High Rate ◽

Pairwise Distance ◽

Character State ◽

Neighbor Joining ◽

Alignment Free

Background. Common phylogenomic approaches for recovering phylogenies are often time-consuming and require annotations for orthologous gene relationships that are not always available. In contrast, alignment-free phylogenomic approaches typically use structure and oligomer frequencies to calculate pairwise distances between species. We have developed an algorithm to quickly calculate distances between species based on codon aversion. Methods. Utilizing a novel alignment-free character state, we present CAM, an alignment-free approach to recover phylogenies by comparing differences in codon aversion motifs (i.e., the set of unused codons within each gene) across all genes within a species. Synonymous codon usage is non-random and differs between organisms, between genes, and even within a single gene, where many genes do not use all possible codons. We report a comprehensive analysis of codon aversion within 229 742 339 genes from 23 428 species across all kingdoms of life, and we provide an alignment-free framework for its use in a phylogenetic construct. For each species, we first construct a set of codon aversion motifs spanning all genes within that species. We define the pairwise distance between two species, A and B, as one minus the number of shared codon aversion motifs divided by the total codon aversion motifs of the species, A or B, containing the fewest motifs. This approach allows us to calculate pairwise distances even when substantial differences in the number of genes or a high rate of divergence between species exists. Finally, we use neighbor-joining to recover phylogenies. Results. Using the Open Tree of Life and NCBI Taxonomy Database as expected phylogenies, our approach compares well, recovering phylogenies that largely match expected trees and are comparable to trees recovered using maximum likelihood and other alignment-free approaches. Our technique is much faster than maximum likelihood and similar in accuracy to other alignment-free approaches. Therefore, we propose that codon aversion be considered a phylogenetically conserved character that may be used in future phylogenomic studies. Availability. CAM, documentation, and test files are freely available on GitHub at https://github.com/ridgelab/cam
