AnnoTree: visualization and exploration of a functionally annotated microbial tree of life

2018 ◽  
Author(s):  
Kerrin Mendler ◽  
Han Chen ◽  
Donovan H. Parks ◽  
Laura A. Hug ◽  
Andrew C. Doxey

Abstract Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree - an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of most patchy traits, as well as numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria, and will act as a catalyst for biological and evolutionary hypothesis generation.

Author(s):  
Gerry Tonkin-Hill ◽  
Neil MacAlasdair ◽  
Christopher Ruis ◽  
Aaron Weimann ◽  
Gal Horesh ◽  
...  

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content, resulting from frequent horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here we introduce Panaroo, a graph based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. We verified our approach through extensive simulations of de novo assemblies using the infinitely many genes model and by analysing a number of publicly available large bacterial genome datasets. Using a highly clonal Mycobacterium tuberculosis dataset as a negative control case, we show that failing to account for annotation errors can lead to pangenome estimates that are dominated by error. We additionally demonstrate the utility of the improved graphical output provided by Panaroo by performing a pan-genome wide association study in Neisseria gonorrhoeae and by analysing gene gain and loss rates across 51 of the major global pneumococcal sequence clusters. Panaroo is freely available under an open source MIT licence at https://github.com/gtonkinhill/panaroo.
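As a rough illustration of why annotation errors matter for pangenome estimates, the sketch below splits gene families into core and accessory sets from a presence/absence table. It is not Panaroo's graph-based method; the function name `classify_gene_families`, the 0.99 core threshold, and the toy genome and family names are all assumptions made for the example.

```python
from collections import Counter


def classify_gene_families(presence, core_threshold=0.99):
    """Split gene families into core and accessory sets.

    presence: dict mapping genome name -> set of gene family IDs
              annotated in that genome (hypothetical input format).
    core_threshold: assumed fraction of genomes a family must occur in
                    to be called core.
    """
    n_genomes = len(presence)
    # Count in how many genomes each family was annotated.
    counts = Counter(fam for families in presence.values() for fam in families)
    core = {fam for fam, c in counts.items() if c / n_genomes >= core_threshold}
    accessory = set(counts) - core
    return core, accessory


if __name__ == "__main__":
    # Toy data: famB is missing from genome2, e.g. because of a
    # fragmented assembly, so it is counted as accessory.
    toy = {
        "genome1": {"famA", "famB"},
        "genome2": {"famA", "famC"},
        "genome3": {"famA", "famB"},
    }
    core, accessory = classify_gene_families(toy)
    print(sorted(core), sorted(accessory))
```

In a frequency-based split like this one, a single missed or fragmented annotation moves a family from core to accessory, which is the kind of error accumulation the abstract describes.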


2020 ◽  
Vol 12 (11) ◽  
pp. 2002-2014
Author(s):  
Ling-Ling Yang ◽  
Zhao Jiang ◽  
Yan Li ◽  
En-Tao Wang ◽  
Xiao-Yang Zhi

Abstract Rhizobia are soil bacteria capable of forming symbiotic nitrogen-fixing nodules associated with leguminous plants. In fast-growing legume-nodulating rhizobia, such as the species in the family Rhizobiaceae, the symbiotic plasmid is the main genetic basis for nitrogen-fixing symbiosis, and is susceptible to horizontal gene transfer. To further understand the evolution of symbiosis in Rhizobiaceae, we analyzed the pan-genome of this family based on 92 genomes of type/reference strains and reconstructed its phylogeny using a phylogenomics approach. Intriguingly, although the genetic expansion that occurred in chromosomal regions was the main reason for the high proportion of low-frequency flexible gene families in the pan-genome, gene gain events associated with accessory plasmids introduced more genes into the genomes of nitrogen-fixing species. For symbiotic plasmids, although horizontal gene transfer frequently occurred, transfer may be impeded by factors such as the host’s physical isolation and soil conditions, even among phylogenetically close species. During coevolution with leguminous hosts, the plasmid system, including accessory and symbiotic plasmids, may have evolved over a time span, and provided rhizobial species with the ability to adapt to various environmental conditions and helped them achieve nitrogen fixation. These findings provide new insights into the phylogeny of Rhizobiaceae and advance our understanding of the evolution of symbiotic nitrogen fixation.


2021 ◽  
Author(s):  
Kim Vertacnik ◽  
Danielle Herrig ◽  
R Keating Godfrey ◽  
Tom Hill ◽  
Scott Geib ◽  
...  

A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this void, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei, a non-eusocial hymenopteran and exemplar of a pine-specialized lineage evolved from angiosperm-feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions and characterized their genomic distributions and evolutionary history. Our results suggest that expansions of bitter gustatory receptor (GR), clan 3 cytochrome P450 (CYP3), and antimicrobial peptide (AMP) subfamilies may have contributed to pine adaptation. By contrast, there was no evidence of recent gene family contraction via pseudogenization. Next, we compared the number of genes in these same families across insect taxa that vary in diet, dietary specialization, and social behavior. In Hymenoptera, herbivory was associated with large GR and small olfactory receptor (OR) families, eusociality was associated with large OR and small AMP families, and, unlike in investigations among more closely related taxa, ecological specialization was not related to gene family size. Overall, our results suggest that gene families that mediate ecological interactions may expand and contract predictably in response to particular selection pressures; however, the ecological drivers and temporal pace of gene gain and loss likely vary considerably across gene families.


Author(s):  
Fábio K Mendes ◽  
Dan Vanderpool ◽  
Ben Fulton ◽  
Matthew W Hahn

Abstract Motivation: Genome sequencing projects have revealed frequent gains and losses of genes between species. Previous versions of our software, Computational Analysis of gene Family Evolution (CAFE), have allowed researchers to estimate parameters of gene gain and loss across a phylogenetic tree. However, the underlying model assumed that all gene families had the same rate of evolution, despite evidence suggesting a large amount of variation in rates among families. Results: Here, we present CAFE 5, a completely re-written software package with numerous performance and user-interface enhancements over previous versions. These include improved support for multithreading, the explicit modeling of rate variation among families using gamma-distributed rate categories, and command-line arguments that preclude the use of accessory scripts. Availability and implementation: CAFE 5 source code, documentation, test data and a detailed manual with examples are freely available at https://github.com/hahnlab/CAFE5/releases. Supplementary information: Supplementary data are available at Bioinformatics online.


Antioxidants ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 257
Author(s):  
Qiang Li ◽  
Hélène San Clemente ◽  
Yongrui He ◽  
Yongyao Fu ◽  
Christophe Dunand

Eucalyptus is a hardwood tree grown worldwide that is receiving increasing attention. To adapt to various biotic and abiotic stresses, Eucalyptus species have evolved complex mechanisms, increasing the cellular concentration of reactive oxygen species (ROS) through numerous ROS-controlling enzymes. To better analyse the ROS gene network and discuss the differences between four Eucalyptus species, the ROS gene network, comprising 11 protein families (1CysPrx, 2CysPrx, APx, APx-R, CIII Prx, Diox, GPx, Kat, PrxII, PrxQ and Rboh), was annotated and compared in an expert and exhaustive manner using the genomic data available for E. camaldulensis, E. globulus, E. grandis, and E. gunnii. In addition, a specific sequencing strategy was used to determine whether sequences missing in at least one organism result from gain/loss events or only from sequencing gaps. We observed that automatic annotation applied to multigenic families is a source of misannotation. Based on family size, the 11 families can be categorized into duplicated gene families (CIII Prx, Kat, 1CysPrx, and GPx), which contain many gene duplication events, and non-duplicated families (APx, APx-R, Rboh, DiOx, 2CysPrx, PrxII, and PrxQ). The gene family sizes are much larger in Eucalyptus than in most other angiosperms due to recent gene duplications, which could give higher adaptability to environmental changes and stresses. The cross-species comparative analysis shows gene gain and loss events during the evolutionary process. The 11 families possess different expression patterns, while in the Eucalyptus genus, the ROS families present similar expression patterns. Overall, the comparative analysis might be a good criterion to evaluate the adaptation of different species with different characters, but only if data mining is as exhaustive as possible. It is also a good indicator for exploring the evolutionary process.


2014 ◽  
Vol 6 (6) ◽  
pp. 1454-1465 ◽  
Author(s):  
Pablo Librado ◽  
Filipe G. Vieira ◽  
Alejandro Sánchez-Gracia ◽  
Sergios-Orestis Kolokotronis ◽  
Julio Rozas

2021 ◽  
Author(s):  
Martin Hunt ◽  
Brice Letcher ◽  
Kerri M Malone ◽  
Giang Nguyen ◽  
Michael B Hall ◽  
...  

Short-read variant calling for bacterial genomics is a mature field, and there are many widely used software tools. Different underlying approaches (e.g. pileup, local or global assembly, paired-read use, haplotype use) lend each tool different strengths, especially when considering non-SNP (single nucleotide polymorphism) variation or potentially distant reference genomes. It would therefore be valuable to be able to integrate the results from multiple variant callers, using a robust statistical approach to "adjudicate" at loci where there is disagreement between callers. To this end, we present a tool, Minos, for variant adjudication by mapping reads to a genome graph of variant calls. Minos allows users to combine output from multiple variant callers without loss of precision. Minos also addresses a second problem of joint genotyping SNPs and indels in bacterial cohorts, which can also be framed as an adjudication problem. We benchmark on 62 samples from 3 species (Mycobacterium tuberculosis, Staphylococcus aureus, Klebsiella pneumoniae) and an outbreak of 385 M. tuberculosis samples. Finally, we joint genotype a large M. tuberculosis cohort (N≈15k) for which the rifampicin phenotype is known. We build a map of non-synonymous variants in the RRDR (rifampicin resistance determining region) of the rpoB gene and extend current knowledge relating RRDR SNPs to heterogeneity in rifampicin resistance levels. We replicate this finding in a second M. tuberculosis cohort (N≈13k). Minos is released under the MIT license, available at https://github.com/iqbal-lab-org/minos.


2012 ◽  
Vol 2012 ◽  
pp. 1-14 ◽  
Author(s):  
M. Pilar Francino

Much of the observed variation among closely related bacterial genomes is attributable to gains and losses of genes that are acquired horizontally as well as to gene duplications and larger amplifications. The genomic flexibility that results from these mechanisms certainly contributes to the ability of bacteria to survive and adapt in varying environmental challenges. However, the duplicability and transferability of individual genes imply that natural selection should operate, not only at the organismal level, but also at the level of the gene. Genes can be considered semiautonomous entities that possess specific functional niches and evolutionary dynamics. The evolution of bacterial genes should respond both to selective pressures that favor competition, mostly among orthologs or paralogs that may occupy the same functional niches, and cooperation, with the majority of other genes coexisting in a given genome. The relative importance of either type of selection is likely to vary among different types of genes, based on the functional niches they cover and on the tightness of their association with specific organismal lineages. The frequent availability of new functional niches caused by environmental changes and biotic evolution should enable the constant diversification of gene families and the survival of new lineages of genes.


2017 ◽  
Author(s):  
Michael P. Dunne ◽  
Steven Kelly

Abstract Background: Complete and accurate annotation of sequenced genomes is of paramount importance to their utility and analysis. Differences in gene prediction pipelines mean that genome sequences for a species can differ considerably in the quality and quantity of their predicted genes. Furthermore, genes that are present in genome sequences sometimes fail to be detected by computational gene prediction methods. Erroneously unannotated genes can lead to oversights and inaccurate assertions in biological investigations, especially for smaller-scale genome projects which rely heavily on computational prediction. Results: Here we present OrthoFiller, a tool designed to address the problem of finding and adding such missing genes to genome annotations. OrthoFiller leverages information from multiple related species to identify those genes whose existence can be verified through comparison with known gene families, but which have not been predicted. By simulating missing gene annotations in real sequence datasets from both plants and fungi we demonstrate the accuracy and utility of OrthoFiller for finding missing genes and improving genome annotation. Furthermore, we show that applying OrthoFiller to existing “complete” genome annotations can identify and correct substantial numbers of erroneously missing genes in these two sets of species. Conclusions: We show that significant improvements in the completeness of genome annotations can be made by leveraging information from multiple species.


2021 ◽  
Author(s):  
Mario A Ceron Romero ◽  
Miguel M Fonseca ◽  
Leonardo de Oliveira Martins ◽  
David Posada ◽  
Laura A Katz

Advances in phylogenetics and high throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single gene fusion. Subsequent highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes neither considers phylogenetically informative events (i.e. gene duplications and losses), nor provides an estimate of the root. A more recent study using gene tree / species tree reconciliation methods suggested the root lies between Opisthokonta and all other eukaryotes, but it included only 59 taxa and 20 genes. Here we apply a gene tree / species tree reconciliation approach to a gene-rich and taxon-rich dataset (i.e. 2,786 gene families from two sets of 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We estimate a root between Fungi and all other eukaryotes, or between Opisthokonta and all other eukaryotes, and reject alternative roots from the literature. Based on further analysis of genome size we propose Opisthokonta + others as the most likely root.

