Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity

Nicholas D. Youngblut; Jacobo de la Cuesta-Zuluaga; Georg H. Reischer; Silke Dauser; Nathalie Schuster; Chris Walzer; Gabrielle Stalder; Andreas H. Farnleitner; Ruth E. Ley

doi:10.1128/msystems.01045-20

Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity

mSystems ◽

10.1128/msystems.01045-20 ◽

2020 ◽

Vol 5 (6) ◽

Author(s):

Nicholas D. Youngblut ◽

Jacobo de la Cuesta-Zuluaga ◽

Georg H. Reischer ◽

Silke Dauser ◽

Nathalie Schuster ◽

...

Keyword(s):

Large Scale ◽

Animal Species ◽

Gene Clusters ◽

Genomic Diversity ◽

Data Sets ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Metagenome Assembly ◽

Gut Metagenome

ABSTRACT Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut.

IMPORTANCE Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.
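
To put the reported counts in proportion, the ratios they imply can be worked out directly. The short sketch below uses only the numbers stated in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Counts taken from the abstract above.
n_mags = 5596        # metagenome-assembled genomes (MAGs)
n_sgbs = 1522        # species-level genome bins (SGBs)
n_bgcs = 1986        # biosynthetic gene clusters identified
n_bgcs_mibig = 23    # BGCs that clustered with a MIBiG reference

# Average number of MAGs collapsed into each species-level bin.
mags_per_sgb = n_mags / n_sgbs

# Share of BGCs with any match among MIBiG references.
frac_bgcs_with_reference = n_bgcs_mibig / n_bgcs

print(f"MAGs per SGB: {mags_per_sgb:.2f}")
print(f"BGCs clustering with MIBiG: {frac_bgcs_with_reference:.1%}")
```

On these figures, each SGB groups fewer than four MAGs on average, and only about one BGC in a hundred clustered with a MIBiG reference.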


Large scale metagenome assembly reveals novel animal-associated microbial genomes, biosynthetic gene clusters, and other genetic diversity

10.1101/2020.06.05.135962 ◽

2020 ◽

Cited By ~ 2

Author(s):

Nicholas D. Youngblut ◽

Jacobo de la Cuesta-Zuluaga ◽

Georg H. Reischer ◽

Silke Dauser ◽

Nathalie Schuster ◽

...

Keyword(s):

Large Scale ◽

Animal Species ◽

Gene Clusters ◽

Wild Animal ◽

Genomic Diversity ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Metagenome Assembly ◽

Gut Metagenome

Abstract Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome datasets. The MAGs comprised 1522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family levels, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host- or environment-specific. Our MAG and gene datasets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut.

Importance Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiome of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome datasets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.


BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters

10.1101/2020.08.17.240838 ◽

2020 ◽

Cited By ~ 3

Author(s):

Satria A. Kautsar ◽

Justin J. J. van der Hooft ◽

Dick de Ridder ◽

Marnix H. Medema

Keyword(s):

Natural Product ◽

Biological Activities ◽

Genome Mining ◽

Gene Clusters ◽

Genomic Diversity ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Natural Product Discovery ◽

User Friendly

Abstract Background Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools suffer from a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs) within ten days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable interconnected network of BGCs. As more genomes get sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.


BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters

GigaScience ◽

10.1093/gigascience/giaa154 ◽

2021 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Satria A Kautsar ◽

Justin J J van der Hooft ◽

Dick de Ridder ◽

Marnix H Medema

Keyword(s):

Natural Product ◽

Biological Activities ◽

Genome Mining ◽

Gene Clusters ◽

Genomic Diversity ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Natural Product Discovery ◽

User Friendly

Abstract Background Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
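
The contrast between pairwise network clustering and grouping in Euclidean space can be illustrated with a toy example. The sketch below is a simplified stand-in and not BiG-SLiCE's actual algorithm or API: it assumes each BGC has already been turned into a fixed-length numeric vector, and the function names, the `radius` threshold, and the toy vectors are all hypothetical. Each vector is compared only to the current family centroids, so the work grows with the number of BGCs times the number of families rather than with the number of BGC pairs.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_families(vectors, radius):
    """Single-pass clustering that compares each vector to centroids only.

    `radius` is a hypothetical distance threshold for joining a family.
    Returns (labels, centroids).
    """
    centroids = []  # running mean vector of each family
    sizes = []      # number of members in each family
    labels = []
    for vec in vectors:
        best, best_dist = None, float("inf")
        for idx, centroid in enumerate(centroids):
            d = distance(vec, centroid)
            if d < best_dist:
                best, best_dist = idx, d
        if best is None or best_dist > radius:
            # No family is close enough: start a new one.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Join the nearest family and update its running mean.
            sizes[best] += 1
            n = sizes[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels, centroids


def query(vec, centroids):
    """Place a new vector into the nearest precomputed family."""
    return min(range(len(centroids)), key=lambda i: distance(vec, centroids[i]))


# Toy feature vectors, made up for illustration.
toy_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
labels, centroids = assign_families(toy_vectors, radius=1.0)
placed = query((4.8, 5.2), centroids)
```

The `query` function mirrors the idea behind the "query mode" described above: once centroids exist, a new BGC is placed by a lookup against them rather than by re-clustering everything.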


Pan-Genome of the Genus Streptomyces and Prioritization of Biosynthetic Gene Clusters With Potential to Produce Antibiotic Compounds

Frontiers in Microbiology ◽

10.3389/fmicb.2021.677558 ◽

2021 ◽

Vol 12 ◽

Author(s):

Carlos Caicedo-Montoya ◽

Monserrat Manzo-Ruiz ◽

Rigoberto Ríos-Estepa

Keyword(s):

Comparative Genomics ◽

Sequence Similarity ◽

Experimental Studies ◽

Gene Clusters ◽

Antibiotic Activity ◽

Genomic Diversity ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Genus Streptomyces ◽

Pan Genome

Species of the genus Streptomyces are known for their ability to produce multiple secondary metabolites; their genomes have been extensively explored to discover new bioactive compounds. The richness of genomic data currently available allows filtering for high-quality genomes, which in turn permits reliable comparative genomics studies and an improved prediction of biosynthetic gene clusters (BGCs) through genome mining approaches. In this work, we used 121 genome sequences of the genus Streptomyces in a comparative genomics study with the aim of estimating genomic diversity by protein domain content, sequence similarity of proteins, and conservation of intergenic regions (IGRs). We also searched for BGCs, prioritizing those with potential antibiotic activity. Our analysis revealed that the pan-genome of the genus Streptomyces is clearly open, with a large number of unique gene families across the different species, and that the IGRs are rarely conserved. We also described the phylogenetic relationships of the analyzed genomes using multiple markers, obtaining a trustworthy tree whose relationships were further validated by Average Nucleotide Identity (ANI) calculations. Finally, 33 biosynthetic gene clusters were found to have potential antibiotic activity and a predicted mode of action, which might serve as a guide for the design of related experimental studies.
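
What "open" means for a pan-genome can be made concrete with a toy presence/absence table. The sketch below is not the authors' pipeline; the strain names, gene-family IDs, and function names are hypothetical. It splits gene families into core (in every genome), accessory (in some), and unique (in exactly one), and tracks how the total number of families grows as genomes are added. In an open pan-genome this curve keeps rising instead of levelling off.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory, and unique sets.

    `genomes` maps a genome name to the set of gene-family IDs it carries.
    """
    counts = Counter(fam for families in genomes.values() for fam in set(families))
    n_genomes = len(genomes)
    core = {fam for fam, c in counts.items() if c == n_genomes}
    unique = {fam for fam, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


def pangenome_curve(genomes, order):
    """Cumulative number of distinct gene families as genomes are added."""
    seen, curve = set(), []
    for name in order:
        seen |= genomes[name]
        curve.append(len(seen))
    return curve


# Toy data, made up for illustration.
toy_genomes = {
    "strain_A": {"f1", "f2", "f3"},
    "strain_B": {"f1", "f2", "f4"},
    "strain_C": {"f1", "f5", "f6"},
}
core, accessory, unique = partition_pangenome(toy_genomes)
curve = pangenome_curve(toy_genomes, ["strain_A", "strain_B", "strain_C"])
```

In this toy set only one family is shared by all three strains, while four are strain-specific, and every added genome contributes new families.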


The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes

Nucleic Acids Research ◽

10.1093/nar/gkaa978 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D639-D643 ◽

Cited By ~ 1

Author(s):

Kai Blin ◽

Simon Shaw ◽

Satria A Kautsar ◽

Marnix H Medema ◽

Tilmann Weber

Keyword(s):

User Interface ◽

Graphical User Interface ◽

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

High Quality ◽

Microbial Genomes ◽

Fungal Genomes ◽

Interactive Graphical User Interface

Abstract Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase

Nucleic Acids Research ◽

10.1093/nar/gkz932 ◽

2019 ◽

Cited By ~ 5

Author(s):

Krishnaveni Palaniappan ◽

I-Min A Chen ◽

Ken Chu ◽

Anna Ratner ◽

Rekha Seshadri ◽

...

Keyword(s):

Secondary Metabolites ◽

Secondary Metabolism ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Initial Release ◽

Systematic Identification ◽

Biomedical Potential ◽

Biosynthetic Machinery

Abstract Microbial secondary metabolism is a reservoir of bioactive compounds of immense biotechnological and biomedical potential. The biosynthetic machinery responsible for the production of these secondary metabolites (SMs) (also called natural products) is often encoded by collocated groups of genes called biosynthetic gene clusters (BGCs). High-throughput genome sequencing of both isolates and metagenomic samples combined with the development of specialized computational workflows is enabling systematic identification of BGCs and the discovery of novel SMs. In order to advance exploration of microbial secondary metabolism and its diversity, we developed the largest publicly available database of predicted BGCs combined with experimentally verified BGCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc-public). Here we describe the first major content update of the IMG-ABC knowledgebase, since its initial release in 2015, refreshing the BGC prediction pipeline with the latest version of antiSMASH (v5) as well as presenting the data in the context of underlying environmental metadata sourced from GOLD (https://gold.jgi.doe.gov/). This update has greatly improved the quality and expanded the types of predicted BGCs compared to the previous version.


TOUCAN: a framework for fungal biosynthetic gene cluster discovery

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa098 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Hayda Almeida ◽

Sylvester Palys ◽

Adrian Tsang ◽

Abdoulaye Baniré Diallo

Keyword(s):

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Amino Acid Sequences ◽

Genomic Diversity ◽

Complex Task ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Learning Framework ◽

Limited Scope ◽

Selection Of

Abstract Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds widely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery have limited scope and can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with high-performing classifiers; and robust post-processing methods. TOUCAN's best-performing model achieves an F-measure of 0.982 on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features.
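
For readers unfamiliar with the metric, the F-measure combines precision and recall into one score. The sketch below assumes the balanced form (F1), since the abstract does not state a weighting, and the counts are made up for illustration; they are not TOUCAN results.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 regions correctly called, 10 false calls, 30 missed.
score = f_measure(true_positives=90, false_positives=10, false_negatives=30)
print(f"F-measure: {score:.3f}")
```

Because it is a harmonic mean, the score is pulled toward the weaker of precision and recall, so a value near 1 requires both to be high.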


TaxiBGC: a Taxonomy-guided Approach for the Identification of Experimentally Verified Microbial Biosynthetic Gene Clusters in Shotgun Metagenomic Data

10.1101/2021.07.30.454505 ◽

2021 ◽

Author(s):

Utpal Bakshi ◽

Vinod K Gupta ◽

Aileen R Lee ◽

John M Davis ◽

Sriram Chandrasekaran ◽

...

Keyword(s):

Large Scale ◽

Human Microbiome ◽

Gene Clusters ◽

Human Microbiome Project ◽

Metagenomic Data ◽

Biosynthetic Gene ◽

Case Control Studies ◽

Biosynthetic Gene Clusters ◽

Host Interactions ◽

Microbiome Data

Biosynthetic gene clusters (BGCs) in microbial genomes encode the production of bioactive secondary metabolites (SMs). Given the well-recognized importance of SMs in microbe-microbe and microbe-host interactions, the large-scale identification of BGCs from microbial metagenomes could offer novel functional insights into complex chemical ecology. Despite recent progress, currently available tools for predicting BGCs from shotgun metagenomes have several limitations, including the need for computationally demanding read assembly and the prediction of only a narrow breadth of BGC classes. To overcome these limitations, we developed TaxiBGC (Taxonomy-guided Identification of Biosynthetic Gene Clusters), a computational pipeline for identifying experimentally verified BGCs in shotgun metagenomes by first pinpointing the microbial species likely to produce them. We show that our species-centric approach was able to identify BGCs in simulated metagenomes more accurately than by solely detecting BGC genes. By applying TaxiBGC to 5,423 metagenomes from the Human Microbiome Project and various case-control studies, we identified distinct BGC signatures of major human body sites and candidate stool-borne biomarkers for multiple diseases, including inflammatory bowel disease, colorectal cancer, and psychiatric disorders. In all, TaxiBGC demonstrates a significant advantage over existing techniques for systematically characterizing BGCs and inferring their SMs from microbiome data.
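
The species-first idea can be sketched as a lookup followed by a confirmation step. This is only an illustration of the general shape described in the abstract, not TaxiBGC's implementation: the confirmation by gene presence, the `min_gene_fraction` threshold, the function name, and all data below are assumptions.

```python
def taxonomy_guided_bgcs(detected_species, species_to_bgcs, bgc_genes,
                         detected_genes, min_gene_fraction=0.5):
    """Return (species, BGC) pairs supported by species and gene evidence.

    Step 1: restrict candidates to BGCs known from species found in the sample.
    Step 2 (assumed here): keep a candidate only if enough of its genes
    were also detected in the sample.
    """
    calls = []
    for species in detected_species:
        for bgc in species_to_bgcs.get(species, []):
            genes = bgc_genes[bgc]
            found = sum(1 for gene in genes if gene in detected_genes)
            if genes and found / len(genes) >= min_gene_fraction:
                calls.append((species, bgc))
    return calls


# Hypothetical reference tables and sample profile.
species_to_bgcs = {"species_1": ["bgc_A"], "species_2": ["bgc_B"]}
bgc_genes = {"bgc_A": ["a1", "a2"], "bgc_B": ["b1", "b2", "b3"]}
detected_species = ["species_1", "species_2"]
detected_genes = {"a1", "a2", "b1"}

calls = taxonomy_guided_bgcs(detected_species, species_to_bgcs,
                             bgc_genes, detected_genes)
```

Restricting the search to BGCs of species already found in the sample narrows the candidate set before any gene-level evidence is considered, which is the intuition behind a species-centric approach.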


Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1507606112 ◽

2015 ◽

Vol 112 (29) ◽

pp. 8953-8958 ◽

Cited By ~ 56

Author(s):

Daniel Montiel ◽

Hahk-Soo Kang ◽

Fang-Yuan Chang ◽

Zachary Charlop-Powers ◽

Sean F. Brady

Keyword(s):

Homologous Recombination ◽

Natural Product ◽

Gene Cluster ◽

Large Scale ◽

Gene Clusters ◽

Marker Genes ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Silent Gene ◽

Promoter Engineering

Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this “dead” cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.


BiG-FAM: the biosynthetic gene cluster families database

Nucleic Acids Research ◽

10.1093/nar/gkaa812 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D490-D497 ◽

Cited By ~ 3

Author(s):

Satria A Kautsar ◽

Kai Blin ◽

Simon Shaw ◽

Tilmann Weber ◽

Marnix H Medema

Keyword(s):

Gene Cluster ◽

Computational Analysis ◽

Gene Clusters ◽

Taxonomic Diversity ◽

Biosynthetic Gene Cluster ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Metabolic Potential ◽

Microbial Genomes ◽

Public Resources

Abstract Computational analysis of biosynthetic gene clusters (BGCs) has revolutionized natural product discovery by enabling the rapid investigation of secondary metabolic potential within microbial genome sequences. Grouping homologous BGCs into Gene Cluster Families (GCFs) facilitates mapping their architectural and taxonomic diversity and provides insights into the novelty of putative BGCs, through dereplication with BGCs of known function. While multiple databases exist for exploring BGCs from publicly available data, no public resources exist that focus on GCF relationships. Here, we present BiG-FAM, a database of 29,955 GCFs capturing the global diversity of 1,225,071 BGCs predicted from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs). The database offers rich functionalities, such as multi-criterion GCF searches, direct links to BGC databases such as antiSMASH-DB, and rapid GCF annotation of user-supplied BGCs from antiSMASH results. BiG-FAM can be accessed online at https://bigfam.bioinformatics.nl.
