Linking pangenomes and metagenomes: the Prochlorococcus metapangenome

PeerJ, Vol 6, pp. e4320 (2018). DOI: 10.7717/peerj.4320. Cited by ~95.

Author(s): Tom O. Delmont, A. Murat Eren

Keyword(s): Sugar Metabolism; Gene Clusters; Genomic Islands; Metagenomic Data; Integrated Analysis; Marker Genes; Relative Distribution; Environmental Distribution; Microbial Genomes; Core Genes

Pangenomes offer detailed characterizations of core and accessory genes found in a set of closely related microbial genomes, generally by clustering genes based on sequence homology. In comparison, metagenomes facilitate highly resolved investigations of the relative distribution of microbial genomes and individual genes across environments through read recruitment analyses. Combining these complementary approaches can yield unique insights into the functional basis of microbial niche partitioning and fitness; however, advanced software solutions are lacking. Here we present an integrated analysis and visualization strategy that provides an interactive and reproducible framework to generate pangenomes and to study them in conjunction with metagenomes. To investigate its utility, we applied this strategy to a Prochlorococcus pangenome in the context of a large-scale marine metagenomic survey. The resulting Prochlorococcus metapangenome revealed remarkable differential abundance patterns between very closely related isolates that belonged to the same phylogenetic cluster and that differed by only a small number of gene clusters in the pangenome. While the relationships between these genomes based on gene clusters correlated with their environmental distribution patterns, phylogenetic analyses using marker genes or concatenated single-copy core genes did not recapitulate these patterns. The metapangenome also revealed a small set of core genes that mostly occurred in hypervariable genomic islands of the Prochlorococcus populations, which systematically lacked read recruitment from surface ocean metagenomes. Notably, these core gene clusters were all linked to sugar metabolism, suggesting that Prochlorococcus may benefit from high sequence diversity in its sugar metabolism genes.
The rapidly growing number of microbial genomes and increasing availability of environmental metagenomes provide new opportunities to investigate the functioning and the ecology of microbial populations, and metapangenomes can provide unique insights for any taxon and biome for which genomic and sufficiently deep metagenomic data are available.
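
Gene-cluster-based relationships between genomes start from a presence/absence table of gene clusters across genomes. The Python sketch below is illustrative only: the genome names and gene-cluster IDs are hypothetical, and Jaccard distance is an assumed choice of metric, not the clustering method used in the study. It shows how core (present in every genome) and accessory gene clusters are separated, and how genomes that differ by only a few gene clusters end up close together.

```python
from itertools import combinations

# Hypothetical toy pangenome: genome name -> set of gene-cluster IDs it contains.
pangenome = {
    "genome_A": {"GC1", "GC2", "GC3", "GC4"},
    "genome_B": {"GC1", "GC2", "GC3", "GC5"},
    "genome_C": {"GC1", "GC2", "GC6", "GC7"},
}


def core_and_accessory(pan):
    """Split gene clusters into core (present in every genome) and accessory (the rest)."""
    all_clusters = set().union(*pan.values())
    core = set.intersection(*pan.values())
    return core, all_clusters - core


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0 for identical gene-cluster content, 1 for disjoint content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


core, accessory = core_and_accessory(pangenome)
print("core:", sorted(core))
print("accessory:", sorted(accessory))

# Pairwise distances between genomes based only on shared gene clusters.
for g1, g2 in combinations(sorted(pangenome), 2):
    d = jaccard_distance(pangenome[g1], pangenome[g2])
    print(f"{g1} vs {g2}: {d:.3f}")
```

Here genome_A and genome_B differ by only two gene clusters and are the closest pair (distance 0.4), while each sits at about 0.667 from genome_C.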

IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites

mBio, Vol 6 (4) (2015). DOI: 10.1128/mbio.00932-15. Cited by ~66.

Author(s): Michalis Hadjithomas, I-Min Amy Chen, Ken Chu, Anna Ratner, Krishna Palaniappan, ...

Keyword(s): Secondary Metabolites; Secondary Metabolism; Genomic Data; Gene Clusters; Metagenomic Data; Integrated Analysis; Analysis Tool; Biosynthetic Gene; Biosynthetic Gene Clusters; Analysis Tools

ABSTRACT: In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which aims to harness the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, which makes it possible to identify BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-standing void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of unexplored phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules.

IMPORTANCE: IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits.
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.

Comparative Genomics and Environmental Distribution of Large dsDNA viruses in the family Asfarviridae

2021. DOI: 10.1101/2021.01.29.428683.

Author(s): Sangita Karki, Mohammad Moniruzzaman, Frank O. Aylward

Keyword(s): Amino Acid; African Swine Fever Virus; Phylogenetic Analyses; African Swine Fever; Amino Acid Identity; Metagenomic Data; Genome Comparison; Marker Genes; Environmental Distribution; Acid Identity

Abstract: The Asfarviridae is a family of Nucleo-Cytoplasmic Large DNA Viruses (NCLDV), of which African swine fever virus (ASFV) is the best characterized. Recently, the discovery of several Asfarviridae members other than ASFV has suggested that this family represents a diverse and cosmopolitan group of viruses, but the genomics and distribution of this family have not been studied in detail. To this end, we analyzed five complete genomes and 35 metagenome-assembled genomes (MAGs) of viruses from this family to shed light on their evolutionary relationships and environmental distribution. The Asfarvirus MAGs derive from diverse marine, freshwater, and terrestrial habitats, underscoring the broad environmental distribution of this family. We present phylogenetic analyses using conserved marker genes and whole-genome comparison of pairwise average amino acid identity values, revealing a high level of genomic divergence across disparate Asfarviruses. Further, we found that Asfarviridae genomes encode genes with diverse predicted metabolic roles and detectable sequence homology to proteins in bacteria, archaea, and different eukaryotes, highlighting the genomic chimerism that is a salient feature of NCLDV. Our read mapping from Tara Oceans metagenomic data also revealed that three Asfarviridae MAGs were present in multiple marine samples, indicating that they are widespread in the ocean. In one of these MAGs, we identified four marker genes with >95% amino acid identity to genes sequenced from a virus that infects the dinoflagellate Heterocapsa circularisquama (HcDNAV). This suggests a potential host for this MAG, which would thereby represent a near-complete genome of a dinoflagellate-infecting giant virus. Together, these results show that Asfarviridae are ubiquitous, exhibit sequence divergence similar to that of other NCLDV families, and include several members that are widespread in the ocean and potentially infect ecologically important protists.
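
Pairwise average amino acid identity (AAI) is commonly computed as the mean percent identity over reciprocal best hits (RBHs) between the proteomes of two genomes. The Python sketch below assumes the protein searches in both directions (for example with BLASTP) have already been run and parsed into the hypothetical dictionaries shown; for brevity it ranks hits by identity rather than bit score and omits the alignment-length and coverage filters that dedicated AAI tools apply. It is an illustration of the idea, not the authors' pipeline.

```python
from statistics import mean

# Hypothetical parsed protein-search results: query -> [(subject, percent identity), ...].
hits_x_to_y = {
    "x1": [("y1", 62.0), ("y2", 40.0)],
    "x2": [("y2", 55.0)],
    "x3": [("y3", 30.0)],
}
hits_y_to_x = {
    "y1": [("x1", 62.0)],
    "y2": [("x2", 55.0), ("x1", 40.0)],
    "y3": [("x2", 35.0)],
}


def best_hits(hits):
    """Keep the single highest-identity subject for each query protein."""
    return {q: max(subj, key=lambda s: s[1]) for q, subj in hits.items() if subj}


def average_amino_acid_identity(hits_ab, hits_ba):
    """Mean percent identity over reciprocal best hits between two proteomes."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    identities = []
    for query, (subject, identity) in best_ab.items():
        # The pair is reciprocal only if the subject's own best hit points back to this query.
        back = best_ba.get(subject)
        if back is not None and back[0] == query:
            identities.append(identity)
    return mean(identities) if identities else None


print(average_amino_acid_identity(hits_x_to_y, hits_y_to_x))
```

In this toy example x1/y1 and x2/y2 are reciprocal best hits, while the best hit of y3 points back to x2 rather than x3, so the AAI is the mean of 62% and 55%, i.e. 58.5%.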

Out of the abyss: Genome and metagenome mining reveals unexpected environmental distribution of abyssomicins

2019. DOI: 10.1101/789859. Cited by ~1.

Author(s): Alba Iglesias, Adriel Latorre-Pérez, James E. M. Stach, Manuel Porcar, Javier Pascual

Keyword(s): Genome Mining; Gene Clusters; Metagenomic Data; Model Organisms; Bacterial Genomes; Biosynthetic Gene Clusters; Environmental Distribution; Regulatory Systems; Terrestrial Habitats; Metagenome Mining

Abstract: Natural products have traditionally been discovered through the screening of culturable microbial isolates from all sorts of environments. The sequencing revolution allowed the identification of dozens of biosynthetic gene clusters (BGCs) within single bacterial genomes, from either cultured or uncultured strains. However, we are still far from fully exploiting the microbial reservoir, as most species are non-model organisms with complex regulatory systems that remain recalcitrant to engineering. Today, genomic and metagenomic data produced by laboratories worldwide, covering the most diverse natural and artificial environments on Earth, are an invaluable source of raw information from which natural product biosynthesis can be accessed. In the present work, we describe the environmental distribution and evolution of the abyssomicin BGC through the analysis of publicly available genomic and metagenomic data. Our results demonstrate that selecting a pathway-specific enzyme to direct genome mining is an excellent strategy: it led to the identification of 74 new Diels-Alderase homologs and unveiled a surprising prevalence of the abyssomicin BGC in terrestrial habitats, mainly soil and plant-associated ones, where we identified five complete and 12 partial new abyssomicin BGCs and 23 new potential abyssomicin BGCs. Our results strongly support the potential of genome and metagenome mining as a key preliminary tool to inform bioprospecting strategies aimed at identifying new bioactive compounds such as, but not restricted to, abyssomicins.

Comparative Genomics and Environmental Distribution of Large dsDNA Viruses in the Family Asfarviridae

Frontiers in Microbiology, Vol 12 (2021). DOI: 10.3389/fmicb.2021.657471.

Author(s): Sangita Karki, Mohammad Moniruzzaman, Frank O. Aylward

Keyword(s): African Swine Fever Virus; Phylogenetic Analyses; Sequence Divergence; African Swine Fever; Salient Feature; Metagenomic Data; Genome Comparison; Marker Genes; Environmental Distribution; The Family

The family Asfarviridae is a group of nucleo-cytoplasmic large DNA viruses (NCLDVs), of which African swine fever virus (ASFV) is well characterized. Recently, the discovery of several Asfarviridae members other than ASFV has suggested that this family represents a diverse and cosmopolitan group of viruses, but the genomics and distribution of this family have not been studied in detail. To this end, we analyzed five complete genomes and 35 metagenome-assembled genomes (MAGs) of viruses from this family to shed light on their evolutionary relationships and environmental distribution. The Asfarvirus MAGs derive from diverse marine, freshwater, and terrestrial habitats, underscoring the broad environmental distribution of this family. We present phylogenetic analyses using conserved marker genes and whole-genome comparison of pairwise average amino acid identity (AAI) values, revealing a high level of genomic divergence across disparate Asfarviruses. Further, we found that Asfarviridae genomes encode genes with diverse predicted metabolic roles and detectable sequence homology to proteins in bacteria, archaea, and eukaryotes, highlighting the genomic chimerism that is a salient feature of NCLDV. Our read mapping from Tara Oceans metagenomic data also revealed that three Asfarviridae MAGs were present in multiple marine samples, indicating that they are widespread in the ocean. In one of these MAGs, we identified four marker genes with >95% AAI to genes sequenced from a virus that infects the dinoflagellate Heterocapsa circularisquama (HcDNAV). This suggests a potential host for this MAG, which would thereby represent a reference genome of a dinoflagellate-infecting giant virus. Together, these results show that Asfarviridae are ubiquitous, exhibit sequence divergence similar to that of other NCLDV families, and include several members that are widespread in the ocean and potentially infect ecologically important protists.

Identification of a New Antimicrobial, Desertomycin H, Utilizing a Modified Crowded Plate Technique

Marine Drugs, Vol 19 (8), pp. 424 (2021). DOI: 10.3390/md19080424.

Author(s): Osama G. Mohamed, Sadaf Dorandish, Rebecca Lindow, Megan Steltz, Ifrah Shoukat, ...

Keyword(s): Antibiotic Production; Gene Clusters; Multidrug Resistant; Microbial Interactions; Mass Spectrometry Data; Metagenomic Data; Resistant Bacteria; Biosynthetic Gene; Biosynthetic Gene Clusters; Plate Technique

Infections associated with antibiotic-resistant bacteria are a major global healthcare threat. New classes of antimicrobial compounds are urgently needed as the frequency of infections caused by multidrug-resistant microbes continues to rise. Recent metagenomic data have demonstrated that cultivable bacterial genomes still encode biosynthetic potential that remains transcriptionally silent. However, the culture conditions required to identify and express silent biosynthetic gene clusters that yield natural products with antimicrobial activity are largely unknown. Here, we describe a new antibiotic discovery scheme, dubbed the modified crowded plate technique (mCPT), that uses complex microbial interactions to elicit antimicrobial production from otherwise silent biosynthetic gene clusters. Using the mCPT as part of the antibiotic crowdsourcing educational program Tiny Earth®, we isolated over 1400 antibiotic-producing microbes, including 62 showing activity against multidrug-resistant pathogens. The natural product extracts generated from six microbial isolates showed potent activity against vancomycin-intermediate-resistant Staphylococcus aureus. We used a targeted approach that coupled mass spectrometry data with bioactivity, yielding a new macrolactone class of metabolite, desertomycin H. In this study, we demonstrate an approach that significantly increases our ability to quickly and efficiently identify microbes capable of producing antibiotics from silent gene clusters.

Structural characteristics of genomic islands associated with GMP synthases as integration hotspot among sequenced microbial genomes

Computational Biology and Chemistry, Vol 36, pp. 62-70 (2012). DOI: 10.1016/j.compbiolchem.2012.01.001. Cited by ~19.

Author(s): Lei Song, Yuting Pan, Sihong Chen, Xuehong Zhang

Keyword(s): Structural Characteristics; Genomic Islands; Microbial Genomes

MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data

F1000Research, Vol 8, pp. 726 (2019). DOI: 10.12688/f1000research.18866.2.

Author(s): Mike W.C. Thang, Xin-Yi Chua, Gareth Price, Dominique Gorse, Matt A. Field

Keyword(s): Microbial Communities; Sequence Data; Metagenomic Data; Marker Genes; Metagenomic Sequencing; Differential Analysis; Biomedical Sciences; Metagenomic Sequence; Differential Abundance; Differential Abundance Analysis

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. To gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users can input complex sample-specific metadata, which can be incorporated into differential analysis and used for grouping/colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs, as well as pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform, we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community, with the hope that more tools will be added as existing methods mature.
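
Differential abundance analysis ranks OTUs by how much their abundance changes between conditions. The Python sketch below is a deliberately simplified illustration, not the statistical method used in MetaDEGalaxy: it converts hypothetical OTU counts to relative abundances (to correct for differing sequencing depth), averages them per condition, and ranks OTUs by absolute log2 fold change. The pseudocount is an assumed value that keeps the logarithm defined for OTUs absent in one condition; a real analysis would also model count variance and test for significance.

```python
import math

# Hypothetical OTU count table: OTU -> read counts, one entry per sample.
samples = ["s1", "s2", "s3", "s4"]
condition = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
counts = {
    "OTU_1": [100, 120, 10, 15],
    "OTU_2": [50, 40, 60, 45],
    "OTU_3": [0, 5, 200, 180],
}


def relative_abundance(table):
    """Divide each count by its sample total so samples of different depth are comparable."""
    n = len(next(iter(table.values())))
    totals = [sum(row[i] for row in table.values()) for i in range(n)]
    return {otu: [c / t for c, t in zip(row, totals)] for otu, row in table.items()}


def rank_by_log2_fold_change(table, sample_names, cond, ref, test, pseudocount=1e-6):
    """Return {OTU: log2(mean_test / mean_ref)}, ordered by decreasing absolute change."""
    rel = relative_abundance(table)
    ref_idx = [i for i, s in enumerate(sample_names) if cond[s] == ref]
    test_idx = [i for i, s in enumerate(sample_names) if cond[s] == test]
    lfc = {}
    for otu, row in rel.items():
        mean_ref = sum(row[i] for i in ref_idx) / len(ref_idx)
        mean_test = sum(row[i] for i in test_idx) / len(test_idx)
        # The pseudocount (an assumed value) keeps log2 defined when an OTU is absent.
        lfc[otu] = math.log2((mean_test + pseudocount) / (mean_ref + pseudocount))
    return dict(sorted(lfc.items(), key=lambda kv: abs(kv[1]), reverse=True))


for otu, change in rank_by_log2_fold_change(counts, samples, condition, "control", "treated").items():
    print(f"{otu}\t{change:+.2f}")
```

With these toy counts, OTU_3 (rare in control, dominant in treated) ranks first, followed by OTU_1, which drops sharply; OTU_2 changes little.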

Pan-Genome of Novel Pantoea stewartii subsp. indologenes Reveals Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer

Microorganisms, Vol 9 (8), pp. 1761 (2021). DOI: 10.3390/microorganisms9081761.

Author(s): Gaurav Agarwal, Ronald D. Gitaitis, Bhabesh Dutta

Keyword(s): Gene Transfer; Core Genome; Foxtail Millet; Gene Clusters; Evaluation Study; Full Spectrum; Pan Genome; Pantoea Stewartii; Comparative Phylogenetic Analysis; Core Genes

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leaf spot on foxtail millet and pearl millet; however, novel strains that are pathogenic on onions were recently identified. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., which are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome from whole-genome sequences of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in the genomes of all 17 strains) were core genes, a subset of the 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in the phylogenetic clustering of Pantoea spp. obtained with a PAVs/wgMLST approach compared with core-genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene clusters, that were associated with the pathogenicity phenotype (necrosis on seedlings) on onions. One of the gene clusters contained 11 genes with known functions and was found to be chromosomally located.
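
The pan-genome categories above follow from how many of the 17 strains carry each gene. The Python sketch below uses the thresholds quoted in the abstract (core: all 17 strains; soft-core: ≥16, which includes core; cloud: ≤2) and assumes that shell covers the remaining 3 to 15 strains, which the abstract implies but does not state. The gene names and counts are hypothetical. The reported totals fit this scheme: 3682 soft-core + 1308 shell + 2040 cloud = 7030 genes.

```python
from collections import Counter


def partition_pangenome(presence_counts, n_strains, soft_core_min=None, cloud_max=2):
    """Count genes per pan-genome category from the number of strains carrying each gene.

    core: present in all strains (reported separately; it is a subset of soft_core)
    soft_core: present in >= soft_core_min strains (defaults to n_strains - 1)
    cloud: present in <= cloud_max strains
    shell: everything in between (assumed definition)
    """
    if soft_core_min is None:
        soft_core_min = n_strains - 1
    summary = Counter()
    for gene, count in presence_counts.items():
        if not 1 <= count <= n_strains:
            raise ValueError(f"{gene}: count {count} outside 1..{n_strains}")
        if count == n_strains:
            summary["core"] += 1
        if count >= soft_core_min:
            summary["soft_core"] += 1
        elif count <= cloud_max:
            summary["cloud"] += 1
        else:
            summary["shell"] += 1
    return summary


# Hypothetical genes and the number of the 17 strains that carry each one.
toy_counts = {"g1": 17, "g2": 17, "g3": 16, "g4": 10, "g5": 3, "g6": 2, "g7": 1}
summary = partition_pangenome(toy_counts, n_strains=17)
print(dict(summary))
```

Because core genes are counted inside soft-core, the pan-genome size is soft-core + shell + cloud, not core + soft-core + shell + cloud.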

Novel genomic islands and a new vanD-subtype in the first sporadic VanD-type vancomycin resistant enterococci in Norway

PLoS ONE, Vol 16 (7), pp. e0255187 (2021). DOI: 10.1371/journal.pone.0255187.

Author(s): Mushtaq T. S. AL Rubaye, Jessin Janice, Jørgen Vildershøj Bjørnholt, Aleksandra Jakovljev, Maria Elisabeth Hultström, ...

Keyword(s): Gene Clusters; Vancomycin Resistance; Genomic Islands; Vancomycin Resistant Enterococci; Chromosomal Site; Enterococcus Casseliflavus; Temporal Occurrence; Vancomycin Resistant; High Level

Background: Vancomycin-resistant enterococci (VRE) harbor several types of transferable vancomycin resistance gene clusters. The vanD type, associated with moderate to high level vancomycin resistance, has only sporadically been described in clinical isolates. The aim of this study was to perform a genetic characterization of the first VanD-type VRE strains detected in Norway.

Methods: The VanD-type VRE strains (n = 6) from two patient cases were examined by antimicrobial susceptibility testing and whole genome sequencing (WGS) to uncover the Van phenotype, strain phylogeny, the vanD gene clusters, and their genetic surroundings. The putative transferability of vanD was examined by circularization PCR and filter mating.

Results: The VanD-type Enterococcus faecium (n = 4) and Enterococcus casseliflavus (n = 2) strains recovered from two cases (A and B) expressed moderate to high level vancomycin resistance (MIC 64 to >256 mg/L) and various levels of teicoplanin susceptibility (MIC 2 to >256 mg/L). WGS analyses revealed phylogenetically different E. faecium strains (A1, A2, and A3 of case A and B1 of case B), as well as vanD gene clusters located on different novel genomic islands (GIs). The E. casseliflavus strains (B2 and B3 of case B) were not clonally related but harbored nearly identical novel GIs. The vanD cluster of case B strains represents a novel vanD subtype. All the vanD-GIs were integrated at the same chromosomal site and contained genes consistent with a Clostridiales origin. Circular forms of the vanD-GIs were detected in all strains except B1. Transfer of vanD to an E. faecium recipient was unsuccessful.

Conclusions: We describe the first VanD-type E. casseliflavus strains, a novel vanD subtype, and three novel vanD-GIs with a genetic content consistent with a Clostridiales order origin. Despite their temporal co-occurrence, case A and B E. faecium strains were phylogenetically diverse and harbored different vanD subtypes and vanD-GIs.

Prediction and analysis of Metagenomic operons via MetaRon: a Pipeline for Prediction of Metagenomic OpeRons

2020. DOI: 10.21203/rs.2.24239/v1.

Author(s): Syed Shujaat Ali Zaidi, Masood Ur Rehman Kayani, Xuegong Zhang, Imran Haider Shamsi

Keyword(s): Draft Genome; Gene Clusters; Transcriptional Unit; Metagenomic Data; Functional Information; Data Set; E Coli; Operon Prediction; K 12; Gut Metagenome

Background: Efficient regulation of bacterial genes in response to environmental stimuli results in unique operonic organizations. The lack of complete reference and functional information makes metagenomic operon prediction challenging; overcoming this challenge opens new perspectives on the interpretation of host-microbe interactions.

Methods: Here we present MetaRon (pipeline for the prediction of Metagenomic operons), an open-source pipeline explicitly designed for metagenomic shotgun sequencing data. It recreates the operonic structure without functional information. MetaRon identifies closely packed co-directional gene clusters with a promoter upstream and downstream of the first and last gene, respectively. Promoter prediction marks the transcriptional unit boundary (TUB) of closely packed co-directional gene clusters.

Results: Escherichia coli (E. coli) K-12 MG1655 presents a gold standard for operon prediction. Therefore, MetaRon was initially applied to two simulated Illumina datasets: (1) the E. coli MG1655 genome and (2) a mixture of the E. coli MG1655, Mycobacterium tuberculosis H37Rv, and Bacillus subtilis str. 168 genomes. Operons were predicted in the single genome and in the mixture of genomes with sensitivities of 97.8% and 93.7%, respectively. In the next phase, operons predicted from the E. coli c20 draft genome isolated from a chicken gut metagenome achieved a sensitivity of 94.1%. Lastly, applying MetaRon to 145 paired-end gut metagenome samples identified 1,232,407 unique operons.

Conclusion: MetaRon removes two notable limitations of existing methods: (1) dependency on functional information and (2) the burden of managing enormous volumes of metagenomic data. The current study demonstrated the idea of using operons as a subset to represent the whole metagenome in terms of secondary metabolites and showed its effectiveness in explaining the occurrence of a disease condition. This significantly reduces bulky whole-metagenome data to a smaller, more precise data set. Furthermore, metabolic pathways from the operonic sequences were identified in association with the occurrence of type 2 diabetes (T2D). Presumably, this is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case T2D. Applying MetaRon to metagenome data at diverse scales will be beneficial for understanding gene regulation and for therapeutic metagenomics.
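
The first step MetaRon describes, finding closely packed co-directional genes, can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not MetaRon's code: it groups adjacent genes on one contig that share a strand and whose intergenic gap is at most a hypothetical max_gap threshold, and it omits the promoter prediction that MetaRon uses to mark transcriptional unit boundaries. The Gene class and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # start coordinate on the contig (1-based, inclusive)
    end: int     # end coordinate on the contig (inclusive)
    strand: str  # "+" or "-"


def candidate_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp.

    max_gap is a hypothetical threshold; promoter prediction is not modeled.
    Only runs of two or more genes are returned as candidate operons.
    """
    runs = []
    current = []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # negative when genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)
                continue
            runs.append(current)
        current = [gene]
    if current:
        runs.append(current)
    return [[g.name for g in run] for run in runs if len(run) > 1]


# Hypothetical genes on a single contig.
contig_genes = [
    Gene("a", 1, 900, "+"),
    Gene("b", 921, 1500, "+"),
    Gene("c", 1530, 2100, "+"),
    Gene("d", 2400, 3000, "+"),
    Gene("e", 3020, 3500, "-"),
    Gene("f", 3540, 4000, "-"),
]
print(candidate_operons(contig_genes))
print(candidate_operons(contig_genes, max_gap=10))
```

With the default threshold, genes a-b-c and e-f form candidate operons; d stands alone because it lies 299 bp after c and e is on the opposite strand. Tightening max_gap to 10 bp leaves no multi-gene group, which shows how strongly the result depends on this assumed parameter.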
