CAMITAX: Taxon labels for microbial genomes

2019 ◽  
Author(s):  
Andreas Bremges ◽  
Adrian Fritz ◽  
Alice C. McHardy

The number of microbial genome sequences is growing exponentially, thanks in part to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses. We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S rRNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers, and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks. While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under the Apache License 2.0 at: https://github.com/CAMI-challenge/CAMITAX

GigaScience ◽  
2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Andreas Bremges ◽  
Adrian Fritz ◽  
Alice C McHardy

Background: The number of microbial genome sequences is increasing exponentially, especially thanks to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses.
Findings: We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S ribosomal RNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and we show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks.
Conclusions: While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.
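The CAMITAX abstract describes an ensemble that combines several independent taxonomic assignments into one label. The sketch below illustrates the general idea with a simple majority-vote consensus walked from the root down; it is not CAMITAX's actual decision rule, and the method names and lineages in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

def consensus_lineage(assignments: Dict[str, List[Optional[str]]], min_votes: int = 2) -> List[str]:
    """Majority-vote consensus over root-to-leaf lineages (illustrative only).

    Each method contributes a lineage ordered from the highest rank down.
    At each depth, a method votes only if its lineage reaches that depth
    and agrees with the consensus so far. The walk stops at the first
    rank with too few votes or a tie between the two leading taxa.
    """
    consensus: List[str] = []
    depth = 0
    while True:
        votes: Counter = Counter()
        for lineage in assignments.values():
            if len(lineage) > depth and lineage[depth] is not None and lineage[:depth] == consensus:
                votes[lineage[depth]] += 1
        if not votes:
            break
        top = votes.most_common(2)
        if top[0][1] < min_votes or (len(top) > 1 and top[1][1] == top[0][1]):
            break  # not enough support, or an unresolved tie
        consensus.append(top[0][0])
        depth += 1
    return consensus

# Hypothetical per-method calls for one genome.
calls = {
    "genome_distance": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    "16S": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Enterobacteriaceae", "Escherichia"],
    "gene_homology": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Enterobacteriaceae", "Shigella"],
    "placement": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Enterobacteriaceae"],
}
print(consensus_lineage(calls))  # stops at genus: only one method supports a species
```

Stopping at the deepest well-supported rank trades resolution for reliability, which is the usual motivation for combining methods rather than trusting any single one.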


2021 ◽  
Vol 9 (6) ◽  
pp. 1128
Author(s):  
Kathleen Cusick ◽  
Gabriel Duran

Saxitoxin (STX) is a secondary metabolite and potent neurotoxin produced by several genera of harmful algal bloom (HAB) marine dinoflagellates. The basis for variability in STX production within natural bloom populations is undefined, as both toxic and non-toxic strains of the same species have been isolated from the same geographic locations. Pyrodinium bahamense is an STX-producing bioluminescent dinoflagellate that blooms along the east coast of Florida as well as in the bioluminescent bays of Puerto Rico (PR), though no toxicity reports exist for PR populations. The core genes in the dinoflagellate STX biosynthetic pathway have been identified, and the sxtA4 gene is essential for toxin production. Using sxtA4 as a molecular proxy for the genetic capacity for STX production, we examined sxtA4+ and sxtA4- genotype frequencies at the single-cell level in P. bahamense populations from different locations in the Indian River Lagoon (IRL), FL, and Mosquito Bay (MB), a bioluminescent bay in PR. Multiplex PCR was performed on individual cells with Pyrodinium-specific primers targeting the 18S rRNA gene and sxtA4. The results reveal that both sxtA4+ and sxtA4- genotypes occur within discrete natural populations of P. bahamense, and that the sxtA4+ genotype dominates. In the IRL, the frequency of the sxtA4+ genotype ranged from ca. 80–100%. In MB, sxtA4+ genotype frequency ranged from ca. 40–66%. To assess the extent of sxtA4 variation within individual cells, sxtA4 amplicons from single cells representative of the different sampling sites were cloned and sequenced. Overall, two variants were consistently obtained, one of which is likely a pseudogene based on alignment with cDNA sequences. These are the first data demonstrating the existence of both genotypes in natural P. bahamense sub-populations, as well as the presence of sxtA4 in P. bahamense from PR.
These results provide insights into the underlying genetic factors influencing the potential for toxin variability among natural sub-populations of HAB species, and highlight the need to study genetic diversity within HAB sub-populations at a fine scale in order to identify the molecular mechanisms driving HAB evolution.
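The genotype frequencies above come from scoring individual cells by multiplex PCR. A minimal sketch of that tally follows, assuming (not stated in the abstract) that a cell is scored only when the Pyrodinium-specific 18S rRNA amplicon is present, so the 18S band acts as a positive control; the site labels and counts are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class CellPCR:
    site: str
    has_18s: bool    # Pyrodinium-specific 18S rRNA amplicon (assumed positive control)
    has_sxta4: bool  # sxtA4 amplicon

def sxta4_frequency(cells: Iterable[CellPCR]) -> Dict[str, Tuple[int, int, float]]:
    """Return {site: (sxtA4+ cells, scored cells, fraction sxtA4+)}.

    Cells without the 18S control band are skipped, because a missing
    sxtA4 band in such a cell cannot be distinguished from a failed PCR.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [positives, scored]
    for cell in cells:
        if not cell.has_18s:
            continue
        counts[cell.site][1] += 1
        if cell.has_sxta4:
            counts[cell.site][0] += 1
    return {site: (pos, n, pos / n) for site, (pos, n) in counts.items()}

# Hypothetical single-cell results for two sites.
cells = (
    [CellPCR("IRL-A", True, True)] * 8
    + [CellPCR("IRL-A", True, False)] * 2
    + [CellPCR("IRL-A", False, True)]  # no control band: not scored
    + [CellPCR("MB", True, True)] * 3
    + [CellPCR("MB", True, False)] * 3
)
print(sxta4_frequency(cells))
```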


2021 ◽  
Vol 9 (2) ◽  
pp. 275
Author(s):  
Won Joon Jung ◽  
Hyoun Joong Kim ◽  
Sib Sankar Giri ◽  
Sang Guen Kim ◽  
Sang Wha Kim ◽  
...  

A novel Citrobacter species was isolated from the kidney of diseased rainbow trout (Oncorhynchus mykiss) reared on a trout farm. Biochemical characterization and phylogenetic analysis were performed for bacterial identification. Sequencing of the 16S rRNA gene and five housekeeping genes indicated that the strain belongs to the genus Citrobacter. However, multilocus sequence analysis, a comparison of average nucleotide identity, and genome-to-genome distance values revealed that strain SNU WT2 is distinct and forms a separate clade from other Citrobacter species. Additionally, the phenotypic characteristics of the strain differed from those of other Citrobacter species. Quinone analysis indicated that the predominant isoprenoid quinone is Q-10. Furthermore, the virulence of the strain was assessed in a rainbow trout challenge trial, and the strain showed resistance to diverse antibiotics, including β-lactams, quinolones, and aminoglycosides. The complete genome of strain SNU WT2 is 4,840,504 bp, with a DNA G+C content of 51.94% and a 106,068-bp plasmid. Genome analysis revealed that the strain carries virulence factors on its chromosome and antibiotic resistance genes on its plasmid. This strain represents a novel species in the genus Citrobacter, for which the name C. tructae has been proposed, with SNU WT2 (=KCTC 72517 = JCM 33612) as the type strain.
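Genome G+C content, reported above for SNU WT2 and in several of the other descriptions in this list, is the percentage of G and C among the bases of the sequence. The sketch below computes it per record and for a whole multi-record FASTA (for example chromosome plus plasmid). It assumes ambiguous IUPAC codes and gaps are excluded from the denominator; tools differ on this, and the tiny FASTA in the example is made up.

```python
from typing import Dict

def read_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA parser: maps the first word of each header to its sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / (gc + at)

fasta = ">chrom\nGGCCAATT\n>plasmid\nGCGCGCAT\n"
records = read_fasta(fasta)
for name, seq in records.items():
    print(name, gc_content(seq))
print("whole genome", gc_content("".join(records.values())))
```

Pooling the replicons weights each by its length, so a large chromosome dominates the whole-genome value even when a plasmid's G+C content differs.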


2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. In many cases it has allowed the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are from type strains, which limits the use of genomic data in comparative taxonomic studies given that there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea, coupled with computational advances, will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into workflows that run from new strain isolation to new taxon description.


Author(s):  
Magdalena Ksiezarek ◽  
Teresa Gonçalves Ribeiro ◽  
Joana Rocha ◽  
Filipa Grosso ◽  
Svetlana Ugarcina Perovic ◽  
...  

Two Gram-stain-positive strains, c9Ua_26_MT and c11Ua_112_MT, were isolated from voided urine samples from two healthy women. Comparative 16S rRNA gene sequence analysis demonstrated that these novel strains were members of the genus Limosilactobacillus. Phylogenetic analysis based on pheS gene sequences and core genomes showed that each strain formed a separate branch and that both were most closely related to Limosilactobacillus vaginalis DSM 5837T. The average nucleotide identity (ANI) and Genome-to-Genome Distance Calculator (GGDC) values between c9Ua_26_MT and its closest relative DSM 5837T were 90.7 and 42.9 %, respectively. The ANI and GGDC values between c11Ua_112_MT and its closest relative DSM 5837T were 91.2 and 45.0 %, respectively, and those between the two novel strains were 92.9 and 51.0 %, respectively. The major fatty acids were C12 : 0 (40.2 %), C16 : 0 (26.7 %) and C18 : 1 ω9c (17.7 %) for strain c9Ua_26_MT, and C18 : 1 ω9c (38.0 %), C16 : 0 (33.3 %) and C12 : 0 (17.6 %) for strain c11Ua_112_MT. The genomic DNA G+C content of strains c9Ua_26_MT and c11Ua_112_MT was 39.9 and 39.7 mol%, respectively. On the basis of the data presented here, strains c9Ua_26_MT and c11Ua_112_MT represent two novel species of the genus Limosilactobacillus, for which the names Limosilactobacillus urinaemulieris sp. nov. (c9Ua_26_MT=CECT 30144T=LMG 31899T) and Limosilactobacillus portuensis sp. nov. (c11Ua_112_MT=CECT 30145T=LMG 31898T) are proposed.
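The ANI and GGDC (digital DDH) values above are usually read against widely used species boundaries of roughly 95–96 % ANI and 70 % dDDH. The sketch below applies those cutoffs to the three comparisons reported in the abstract; the choice of 95 % (the lower end of the range) and the "conflicting" category are assumptions made for illustration, not the authors' procedure.

```python
from typing import Dict, Tuple

ANI_CUTOFF = 95.0   # lower end of the commonly cited 95-96 % ANI species range (assumption)
DDDH_CUTOFF = 70.0  # conventional DDH/dDDH species boundary

def species_call(ani: float, dddh: float) -> str:
    """Classify a genome pair from ANI and dDDH percentages."""
    below_ani = ani < ANI_CUTOFF
    below_dddh = dddh < DDDH_CUTOFF
    if below_ani and below_dddh:
        return "different species"
    if not below_ani and not below_dddh:
        return "same species"
    return "conflicting"  # the two metrics disagree; needs closer inspection

# (ANI %, GGDC/dDDH %) values as reported in the abstract.
pairs: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("c9Ua_26_MT", "DSM 5837T"): (90.7, 42.9),
    ("c11Ua_112_MT", "DSM 5837T"): (91.2, 45.0),
    ("c9Ua_26_MT", "c11Ua_112_MT"): (92.9, 51.0),
}
for (a, b), (ani, dddh) in pairs.items():
    print(f"{a} vs {b}: {species_call(ani, dddh)}")
```

All three pairs fall below both cutoffs, which is consistent with the proposal of two separate novel species.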


2011 ◽  
Vol 61 (12) ◽  
pp. 2974-2978 ◽  
Author(s):  
Jinxing Zhu ◽  
Xiaoli Liu ◽  
Xiuzhu Dong

Two mesophilic methanogenic strains, designated TS-2T and GHT, were isolated from sediments of Tuosu Lake and Gahai Lake, respectively, in the Qaidam Basin, Qinghai province, China. Cells of both isolates were rods (about 0.3–0.5 × 2–5 µm) with blunt, rounded ends and stained Gram-positive. Strain TS-2T was motile with one or two polar flagella and used only H2/CO2 for growth and methanogenesis. Strain GHT was non-motile, used both H2/CO2 and formate, and displayed a substrate-dependent cell arrangement: long chains when growing on formate (50 mM) or under high-pressure H2, and single cells under low-pressure H2. Phylogenetic analysis based on 16S rRNA gene sequences placed the two isolates in the genus Methanobacterium. Strain TS-2T was most closely related to Methanobacterium alcaliphilum NBRC 105226T (96 % 16S rRNA gene sequence similarity). Phylogenetic analysis based on the alpha subunit of methyl-coenzyme M reductase also supported the affiliation of the two isolates with the genus Methanobacterium. DNA–DNA relatedness between the isolates and M. alcaliphilum DSM 3387T was 39–53 %. Hence, we propose two novel species, Methanobacterium movens sp. nov. (type strain TS-2T = AS 1.5093T = JCM 15415T) and Methanobacterium flexile sp. nov. (type strain GHT = AS 1.5092T = JCM 15416T).
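A 16S rRNA gene "sequence similarity" such as the 96 % reported for TS-2T is a pairwise identity computed over an alignment. The sketch below uses one common definition, identity over columns where neither sequence has a gap; other tools count gaps differently, so this is an assumption rather than the method used in the study. The toy alignment is made up, and the 98.65 % value is the commonly cited 16S threshold above which DNA relatedness or genome comparisons are still needed to separate species.

```python
SPECIES_16S_CUTOFF = 98.65  # commonly cited 16S threshold for species delineation

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gap columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments (hypothetical): 10 gap-free columns, 9 identical.
print(percent_identity("ACGT-ACGTAC", "ACGTTACGTAA"))
# The reported value for TS-2T vs M. alcaliphilum falls below the cutoff.
print(96.0 < SPECIES_16S_CUTOFF)
```

A value this low already argues for a separate species, and the 39–53 % DNA–DNA relatedness (below the conventional 70 % boundary) points the same way.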


2017 ◽  
Author(s):  
Zhemin Zhou ◽  
Nina Luhmann ◽  
Nabil-Fareed Alikhan ◽  
Christopher Quince ◽  
Mark Achtman

Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying the reads into taxonomic groups. Current methods compare these sequencing data with existing reference databases, which are biased and limited. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both problems are especially acute for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model that specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE compared with other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.
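The SPARSE abstract mentions grouping reference genomes into similarity-based hierarchical clusters held in an incremental data structure. The sketch below shows one generic way to build such a hierarchy incrementally: greedy "leader" clustering applied at several similarity levels, coarse to fine, with Jaccard similarity of feature sets standing in for a genome similarity measure. The class name, thresholds, and feature sets are hypothetical, and this is not SPARSE's implementation.

```python
from typing import Dict, List, Set, Tuple

def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two feature sets (e.g. k-mer sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

class HierarchicalIndex:
    """Incremental multi-level leader clustering (illustrative sketch)."""

    def __init__(self, thresholds: List[float]):
        self.thresholds = sorted(thresholds)  # coarse (low similarity) to fine
        # cluster path prefix -> list of (child cluster id, representative features)
        self.children: Dict[Tuple[int, ...], List[Tuple[int, Set]]] = {}
        self.paths: Dict[str, Tuple[int, ...]] = {}

    def add(self, name: str, features: Set) -> Tuple[int, ...]:
        path: Tuple[int, ...] = ()
        for t in self.thresholds:
            clusters = self.children.setdefault(path, [])
            for cid, rep in clusters:
                if jaccard(features, rep) >= t:
                    path += (cid,)  # join the first sufficiently similar cluster
                    break
            else:
                cid = len(clusters)  # no match: this genome founds a new cluster
                clusters.append((cid, features))
                path += (cid,)
        self.paths[name] = path
        return path

index = HierarchicalIndex([0.3, 0.8, 0.95])
index.add("a", set(range(10)))
index.add("b", set(range(9)) | {10})              # similarity to a: 9/11
index.add("c", set(range(5)) | set(range(20, 25)))  # similarity to a: 5/15
index.add("d", set(range(100, 110)))              # unrelated
print(index.paths)
```

Adding a genome only compares it with cluster representatives along one path, so the structure can grow without reclustering everything; genomes that share a path prefix are grouped at that level.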


Author(s):  
Kiran Kirdat ◽  
Bhavesh Tiwarekar ◽  
Vipool Thorat ◽  
Shivaji Sathe ◽  
Yogesh Shouche ◽  
...  

Sugarcane Grassy Shoot (SCGS) disease is known to be related to Rice Yellow Dwarf (RYD) phytoplasmas (16SrXI-B group), which are found predominantly in sugarcane-growing areas of the Indian subcontinent and South-East Asia. The 16S rRNA gene sequences of SCGS phytoplasma strains belonging to the 16SrXI-B group share 98.07 % similarity with ‘Ca. Phytoplasma cynodontis’ strain BGWL-C1, followed by 97.65 % similarity with ‘Ca. P. oryzae’ strain RYD-J. Because SCGS phytoplasma is placed distinctly apart from both phylogenetically related species, its taxonomic identity is unclear. We attempted to resolve the phylogenetic position of SCGS phytoplasma based on phylogenetic analysis of the 16S rRNA gene (>1500 bp), nine housekeeping genes (>3500 aa), core-genome phylogeny (>10 000 aa) and OGRI values. The draft genome sequences of SCGS phytoplasma (strain SCGS) and Bermuda grass white leaf (BGWL) phytoplasma (strain LW01), closely related to ‘Ca. P. cynodontis’, were obtained. The SCGS genome comprised 29 scaffolds totalling 505 173 bp, while the LW01 assembly contained 21 scaffolds totalling 483 935 bp; both genomes had fold coverage over 330× and completeness over 90 %. The G+C content of SCGS was 19.86 %, while that of LW01 was 20.46 %. The orthoANI value between strains SCGS and LW01 was 79.42 %, and the dDDH value was 22 %. Overall, the analysis reveals that SCGS phytoplasma forms a distant clade within the RYD group of phytoplasmas. Based on phylogenetic analyses and OGRI values obtained from the genome sequences, a novel taxon, ‘Candidatus Phytoplasma sacchari’, is proposed.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Daniel Roush ◽  
Ana Giraldo-Silva ◽  
Ferran Garcia-Pichel

Cyanobacteria are a widespread and important bacterial phylum, responsible for a significant portion of global carbon and nitrogen fixation. Unfortunately, reliable and accurate automated classification of cyanobacterial 16S rRNA gene sequences is hampered by conflicting systematic frameworks, inconsistent taxonomic definitions (including of the phylum itself), and database errors. To address this, we introduce Cydrasil 3 (https://www.cydrasil.org), a curated 16S rRNA gene reference package, database, and web application designed to provide a full phylogenetic perspective for cyanobacterial systematics and routine identification. Cydrasil 3 contains over 1300 manually curated sequences longer than 1100 base pairs and can be used for phylogenetic placement or as a reference sequence set for de novo phylogenetic reconstructions. The web application (utilizing PaPaRA and EPA-ng) can place thousands of sequences into the reference tree and has detailed instructions on how to analyze results. The Cydrasil web application does not offer taxonomic assignments; instead, it provides phylogenetic placement, a searchable database with curation notes and metadata, and a mechanism for community feedback.
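EPA-ng, which the Cydrasil web application uses for placement, writes its results in the jplace JSON format: a reference tree with numbered edges, a "fields" list naming the columns of each placement row, and one entry per query listing candidate placements. The sketch below picks, for each query, the placement with the highest likelihood weight ratio. It assumes the standard field names ("edge_num", "like_weight_ratio") and that queries are named via "n" or "nm"; the tiny jplace document is made up.

```python
import json
from typing import Dict, Tuple

def best_placements(jplace_text: str) -> Dict[str, Tuple[int, float]]:
    """Map each query name to (edge_num, like_weight_ratio) of its top placement."""
    data = json.loads(jplace_text)
    fields = data["fields"]
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    best: Dict[str, Tuple[int, float]] = {}
    for pquery in data["placements"]:
        top = max(pquery["p"], key=lambda row: row[lwr_i])
        # queries are listed under "n", or under "nm" as [name, multiplicity] pairs
        names = pquery.get("n") or [name for name, _mult in pquery.get("nm", [])]
        for name in names:
            best[name] = (top[edge_i], top[lwr_i])
    return best

example = json.dumps({
    "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
    "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -1200.5, 0.9, 0.01, 0.02], [1, -1203.1, 0.1, 0.05, 0.03]], "n": ["query1"]},
        {"p": [[3, -980.2, 1.0, 0.1, 0.01]], "nm": [["query2", 1]]},
    ],
    "version": 3,
    "metadata": {"invocation": "example"},
})
print(best_placements(example))
```

A low best like_weight_ratio means the query's position in the tree is uncertain, which matters when interpreting placements without taxonomic labels.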


2021 ◽  
Vol 322 ◽  
pp. 01028
Author(s):  
Nao Fukunaga ◽  
Moe Shimizu ◽  
Shinnosuke Teruya ◽  
Nazifa Naziha Razali ◽  
Satoko Nakashima ◽  
...  

DNA barcoding is an effective and powerful tool for taxonomic identification and thus very useful for biodiversity monitoring. This study investigated the usefulness of the mitochondrial 12S-rRNA gene for the DNA barcoding of shelled marine gastropods. To do so, we determined partial 12S-rRNA sequences of 75 vouchered museum specimens representing 69 species of shelled gastropods from Japan. The specimens had been identified morphologically and catalogued with natural history data. Sequence analyses through BLAST searches, maximum likelihood phylogenetic analysis, and species delimitation analysis suggested that the 12S-rRNA gene is helpful for barcoding shelled marine gastropods and could complement barcoding studies using other markers such as COI. The analyses confirmed the identity of all samples at higher taxonomic levels (subfamily and above), but much less so at the species level. Our results thus also underline a lingering problem of DNA barcoding: the lack of comprehensive reference sequence databases. However, because this study provides sequences from properly curated, vouchered museum specimens, the results reported here also supply taxonomically reliable reference sequences for biodiversity monitoring and identification of shelled gastropods, which include many important fisheries species.
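BLAST-based barcode identification, as used in the study above, typically means taking each query's best-scoring hit. The sketch below reads BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeps the hit with the highest bit score per query, and flags it as a species-level candidate only above an identity cutoff. The 98 % cutoff, specimen IDs, and reference IDs are hypothetical and not taken from the study.

```python
import csv
import io
from typing import Dict

# Default column order of BLAST+ -outfmt 6.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
SPECIES_PIDENT_CUTOFF = 98.0  # hypothetical cutoff, not from the study

def best_hits(tsv_text: str) -> Dict[str, dict]:
    """Return the highest-bitscore hit for each query sequence."""
    best: Dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best

def call_level(hit: dict) -> str:
    """Label a best hit as a species-level candidate or higher-rank only."""
    return "species-level candidate" if hit["pident"] >= SPECIES_PIDENT_CUTOFF else "higher rank only"

example = "\n".join([
    "spec01\trefA\t99.2\t420\t3\t0\t1\t420\t5\t424\t1e-200\t750",
    "spec01\trefB\t95.0\t420\t21\t0\t1\t420\t3\t422\t1e-150\t600",
    "spec02\trefC\t91.5\t410\t35\t0\t1\t410\t2\t411\t1e-120\t500",
])
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], call_level(hit))
```

A low best-hit identity often reflects a missing reference rather than a novel species, which is why curated, vouchered reference sequences improve identification.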

