BugBase predicts organism-level microbiome phenotypes

2017 ◽  
Author(s):  
Tonya Ward ◽  
Jake Larson ◽  
Jeremy Meulemans ◽  
Ben Hillmann ◽  
Joshua Lynch ◽  
...  

Abstract Shotgun metagenomics and marker gene amplicon sequencing can be used to directly measure or predict the functional repertoire of the microbiota en masse, but current methods do not readily estimate the functional capability of individual microorganisms. Here we present BugBase, an algorithm that predicts organism-level coverage of functional pathways as well as biologically interpretable phenotypes such as oxygen tolerance, Gram staining and pathogenic potential, within complex microbiomes using either whole-genome shotgun or marker gene sequencing data. We find BugBase’s organism-level pathway coverage predictions to be statistically higher powered than current ‘bag-of-genes’ approaches for discerning functional changes in both host-associated and environmental microbiomes.

2020 ◽  
Vol 11 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A. Besseling ◽  
Sergio Balzano ◽  
Judith D. L. van Bleijswijk ◽  
Harry J. Witte ◽  
...  

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format, together with representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing if possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.


2019 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A Besseling ◽  
Sergio Balzano ◽  
Judith van Bleijswijk ◽  
Harry Witte ◽  
...  

ABSTRACT Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene rather than the entire genome, the number of reads needed per sample is lower than that required for metagenome sequencing, making marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline allows customizing the analyses by offering several choices for most of the steps, for example different OTU generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL and licensed under GNU GPLv3.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1612 ◽  
Author(s):  
Zachery T. Lewis ◽  
Jasmine C.C. Davis ◽  
Jennifer T. Smilowitz ◽  
J. Bruce German ◽  
Carlito B. Lebrilla ◽  
...  

Infant fecal samples are commonly studied to investigate the impacts of breastfeeding on the development of the microbiota and subsequent health effects. Comparisons of infants living in different geographic regions and environmental contexts are needed to aid our understanding of evolutionarily-selected milk adaptations. However, the preservation of fecal samples from individuals in remote locales until they can be processed can be a challenge. Freeze-drying (lyophilization) offers a cost-effective way to preserve some biological samples for transport and analysis at a later date. Currently, it is unknown what, if any, biases are introduced into various analyses by the freeze-drying process. Here, we investigated how freeze-drying affected analysis of two relevant and intertwined aspects of infant fecal samples, marker gene amplicon sequencing of the bacterial community and the fecal oligosaccharide profile (undigested human milk oligosaccharides). No differences were discovered between the fecal oligosaccharide profiles of wet and freeze-dried samples. The marker gene sequencing data showed an increase in proportional representation of Bacteroides and a decrease in detection of bifidobacteria and members of class Bacilli after freeze-drying. This sample treatment bias may be related to the cell morphology of these different taxa (Gram status). However, these effects did not overwhelm the natural variation among individuals, as the community data still strongly grouped by subject and not by freeze-drying status. We also found that compensating for sample concentration during freeze-drying, while not necessary, was also not detrimental. Freeze-drying may therefore be an acceptable method of sample preservation and mass reduction for some studies of microbial ecology and milk glycan analysis.


Author(s):  
Lauren V. Alteio ◽  
Joana Séneca ◽  
Alberto Canarini ◽  
Roey Angel ◽  
Ksenia Guseva ◽  
...  

Microbial community analysis via marker gene amplicon sequencing has become a routine method in the field of soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing studies in soil and present statistical and experimental approaches that can help address the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss effects of sample replication on the statistical power in soil community analysis. Additionally, we argue for the need for increased study reproducibility and data availability, as well as complementary techniques for generating deeper ecological insights into microbial roles and our understanding thereof in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation and availability to improve the rigor of future studies.
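
The compositionality issue mentioned in this abstract can be illustrated with a small sketch. The taxon names and counts below are invented for illustration and are not taken from the paper. The sketch assumes a toy community in which only one taxon changes in absolute abundance, and shows that the relative abundances of the unchanged taxa shift anyway.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances (illustrative values only).
before = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 100}
# Only taxon_A grows; taxon_B and taxon_C are unchanged in absolute terms.
after = {"taxon_A": 400, "taxon_B": 100, "taxon_C": 100}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for taxon in before:
    print(f"{taxon}: {rel_before[taxon]:.3f} -> {rel_after[taxon]:.3f}")
```

In this toy case taxon_B and taxon_C appear to decline in relative terms although their absolute counts are identical in both samples, which is why changes in relative abundance alone cannot be read as changes in absolute abundance.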


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Renmao Tian ◽  
Behzad Imanian

Abstract Background: Amplicon sequencing of marker genes such as 16S rDNA has been widely used to survey and characterize microbial communities. However, the complex data analyses have required many manual steps, often leading to inconsistencies in results. Results: Here, we have developed a pipeline, amplicon sequence analysis pipeline 2 (ASAP 2), to automate the processes without the usual manual inspections and user intervention, for instance, in the detection of barcode orientation, selection of high-quality regions of reads, determination of resampling depth and more. The pipeline integrates all the analytical processes such as importing data, demultiplexing, summarizing read profiles, trimming quality, denoising, removing chimeric sequences and making the feature table, among others. The pipeline accepts multiple file formats as input, including multiplexed or demultiplexed, paired-end or single-end, barcode inside or outside, and raw or intermediate data (e.g. feature table). The outputs include taxonomic classification, alpha/beta diversity, community composition, ordination analysis and statistical tests. ASAP 2 supports merging multiple sequencing runs, which helps integrate and compare data from different sources (public databases and collaborators). Conclusions: Our pipeline minimizes hands-on intervention and runs amplicon sequence variant (ASV)-based amplicon sequencing analysis automatically and consistently. Our web server assists researchers who have no access to a high performance computer (HPC) or have limited bioinformatics skills. The pipeline and web server can be accessed at https://github.com/tianrenmaogithub/asap2 and https://hts.iit.edu/asap2, respectively.


2021 ◽  
Author(s):  
Gavin Douglas ◽  
Morgan G. I. Langille

The past decade has seen an eruption of interest in profiling microbiomes through DNA sequencing. The resulting investigations have revealed myriad insights and attracted an influx of researchers to the research area. Many newcomers are in need of primers on the fundamentals of microbiome sequencing data types and the methods used to analyze them. Accordingly, here we aim to provide a detailed, but accessible, introduction to these topics. We first present the background on marker-gene and shotgun metagenomics sequencing and then discuss unique characteristics of microbiome data in general. We highlight several important caveats resulting from these characteristics that should be appreciated when analyzing these data. We then introduce the many-faceted concept of microbial functions and several controversies in this area. One controversy in particular is regarding whether metagenome prediction methods (i.e. based on marker gene sequences) are sufficiently accurate to ensure reliable biological inferences. We next highlight several underappreciated developments regarding the integration of taxonomic and functional data types. This is a highly pertinent topic because although these data types are inherently connected, they are often analyzed independently and primarily only linked anecdotally in the literature. We close by providing our perspective on this topic in addition to the issue of reproducibility in microbiome research, which are both crucial data analysis challenges facing microbiome researchers.


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Andreas N. Schneider ◽  
John Sundh ◽  
Görel Sundström ◽  
Kerstin Richau ◽  
Nicolas Delhomme ◽  
...  

ABSTRACT The health, growth, and fitness of boreal forest trees are impacted and improved by their associated microbiomes. Microbial gene expression and functional activity can be assayed with RNA sequencing (RNA-Seq) data from host samples. In contrast, phylogenetic marker gene amplicon sequencing data are used to assess taxonomic composition and community structure of the microbiome. Few studies have considered how much of this structural and taxonomic information is included in transcriptomic data from matched samples. Here, we described fungal communities using both host-derived RNA-Seq and fungal ITS1 DNA amplicon sequencing to compare the outcomes between the methods. We used a panel of root and needle samples from the coniferous tree species Picea abies (Norway spruce) growing in untreated (nutrient-deficient) and nutrient-enriched plots at the Flakaliden forest research site in boreal northern Sweden. We show that the relationship between samples and alpha and beta diversity indicated by the fungal transcriptome is in agreement with that generated by the ITS data, while also identifying a lack of taxonomic overlap due to limitations imposed by current database coverage. Furthermore, we demonstrate how metatranscriptomics data additionally provide biologically informative functional insights. At the community level, there were changes in starch and sucrose metabolism, biosynthesis of amino acids, and pentose and glucuronate interconversions, while processing of organic macromolecules, including aromatic and heterocyclic compounds, was enriched in transcripts assigned to the genus Cortinarius. IMPORTANCE A deeper understanding of microbial communities associated with plants is revealing their importance for plant health and productivity. RNA extracted from plant field samples represents the host and other organisms present. 
Typically, gene expression studies focus on the plant component or, in a limited number of studies, expression in one or more associated organisms. However, metatranscriptomic data are rarely used for taxonomic profiling, which is currently performed using amplicon approaches. We created an assembly-based, reproducible, and hardware-agnostic workflow to taxonomically and functionally annotate fungal RNA-Seq data obtained from Norway spruce roots, which we compared to matching ITS amplicon sequencing data. While we identified some limitations and caveats, we show that functional, taxonomic, and compositional insights can all be obtained from RNA-Seq data. These findings highlight the potential of metatranscriptomics to advance our understanding of interaction, response, and effect between host plants and their associated microbial communities.


2021 ◽  
Author(s):  
Andreas Schneider ◽  
John Sundh ◽  
Görel Sundström ◽  
Kerstin Richau ◽  
Nicolas Delhomme ◽  
...  

Microbial communities are major players in carbon and nitrogen cycling globally and are of particular importance for plant communities in the nutrient poor soils of boreal forests. Especially relevant are the fungal communities in the soil that interact with the plants in multiple ways, indirectly through their pivotal role in the breakdown of organic matter and, more directly, through mycorrhizal symbiosis with plant roots. Large-scale disturbances of these complex microbial communities can lead to shifts in soil carbon storage with unknown and global-scale long-term consequences. To understand the dynamics of these communities and their relationship to associated plants in response to climate change and anthropogenic influence, we need a better understanding of how modern “omics” methods can help us to understand compositional and functional shifts of these microbiomes. Microbial gene expression and functional activity can be assayed with RNA sequencing (RNA-Seq) data from environmental samples. In contrast, currently phylogenetic marker gene amplicon sequencing data is generally used to assess taxonomic composition and community structure of the microbiome. Few studies have considered how much of this structural and taxonomic information is included in RNA-Seq transcriptomic data from matched samples. Here we describe fungal communities using both RNA-Seq and fungal ITS1 DNA amplicon sequencing to compare the outcomes between the methods. We used a panel of root and needle samples from mature stands of the coniferous tree species Picea abies (Norway spruce) growing in untreated (nutrient deficient) and nutrient enriched plots at the Flakaliden forest research site in boreal northern Sweden. 
We created an assembly-based, reproducible and hardware agnostic workflow to taxonomically and functionally annotate fungal RNA-Seq data obtained from Norway spruce roots, which we compared to matching ITS amplicon sequencing data. We show that the community structure indicated by the fungal transcriptome is in agreement with that generated by the ITS data, while also identifying limitations imposed by current database coverage. Furthermore, we show examples to demonstrate how metatranscriptomics data additionally provides biologically informative functional insight at the community and individual species level. These findings highlight the potential of metatranscriptomics to advance our understanding of interaction, response and effect both between host plants and their associated microbial communities, and among the members of microbial communities in environmental samples in general.


2019 ◽  
Author(s):  
Gavin M. Douglas ◽  
Vincent J. Maffei ◽  
Jesse Zaneveld ◽  
Svetlana N. Yurgel ◽  
James R. Brown ◽  
...  

One major limitation of microbial community marker gene sequencing is that it does not provide direct information on the functional composition of sampled communities. Here, we present PICRUSt2 (https://github.com/picrust/picrust2), which expands the capabilities of the original PICRUSt method¹ to predict the functional potential of a community based on marker gene sequencing profiles. This updated method and implementation includes several improvements over the previous algorithm: an expanded database of gene families and reference genomes, a new approach now compatible with any OTU-picking or denoising algorithm, and novel phenotype predictions. Upon evaluation, PICRUSt2 was more accurate than PICRUSt1 and other current approaches overall. PICRUSt2 is also now more flexible and allows the addition of custom reference databases. We highlight these improvements and also important caveats regarding the use of predicted metagenomes, which are related to the inherent challenges of analyzing metagenome data in general.
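
The general idea of predicting functional potential from a marker gene profile can be sketched as follows. This is a minimal illustration and not PICRUSt2's implementation or API. It assumes that each sequence variant already has a predicted 16S copy number and predicted gene family copy numbers, which in practice would come from a reference-based prediction step. The function name, the variant names and all values are hypothetical.

```python
def predict_function_abundance(asv_counts, copy_number_16s, gene_content):
    """Sum gene family copies over variants, weighted by normalized abundance.

    asv_counts:      read counts per sequence variant in one sample
    copy_number_16s: assumed (predicted) 16S copies per genome for each variant
    gene_content:    assumed (predicted) gene family copies per genome
    """
    predicted = {}
    for asv, count in asv_counts.items():
        # Dividing by 16S copy number approximates the number of organisms.
        organisms = count / copy_number_16s[asv]
        for gene_family, copies in gene_content[asv].items():
            predicted[gene_family] = predicted.get(gene_family, 0.0) + organisms * copies
    return predicted


# Hypothetical inputs for a single sample (illustrative values only).
asv_counts = {"asv1": 20, "asv2": 30}
copy_number_16s = {"asv1": 2, "asv2": 5}
gene_content = {
    "asv1": {"geneFamilyX": 1, "geneFamilyY": 2},
    "asv2": {"geneFamilyX": 3},
}

prediction = predict_function_abundance(asv_counts, copy_number_16s, gene_content)
print(prediction)
```

Because every value in the output depends on predicted rather than measured genome content, the result carries the uncertainty of those predictions, which is one of the caveats the abstract alludes to.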


2020 ◽  
Vol 6 (3) ◽  
Author(s):  
Catarina I. Mendes ◽  
Erley Lizarazo ◽  
Miguel P. Machado ◽  
Diogo N. Silva ◽  
Adriana Tami ◽  
...  

Dengue virus (DENV) represents a public health threat and economic burden in affected countries. The availability of genomic data is key to understanding viral evolution and dynamics, supporting improved control strategies. Currently, the use of high-throughput sequencing (HTS) technologies, which can be applied both directly to patient samples (shotgun metagenomics) and to PCR-amplified viral sequences (amplicon sequencing), is potentially the most informative approach to monitor viral dissemination and genetic diversity by providing, in a single methodological step, identification and characterization of the whole viral genome at the nucleotide level. Despite many advantages, these technologies require bioinformatics expertise and appropriate infrastructure for the analysis and interpretation of the resulting data. In addition, the many software solutions available can hamper the reproducibility and comparison of results. Here we present DEN-IM, a one-stop, user-friendly, containerized and reproducible workflow for the analysis of DENV short-read sequencing data from both amplicon and shotgun metagenomics approaches. It is able to infer the DENV coding sequence (CDS), identify the serotype and genotype, and generate a phylogenetic tree. It can easily be run on any UNIX-like system, from local machines to high-performance computing clusters, performing a comprehensive analysis without the requirement for extensive bioinformatics expertise. Using DEN-IM, we successfully analysed two types of DENV datasets. The first comprised 25 shotgun metagenomic sequencing samples from patients with variable serotypes and genotypes, including an in vitro spiked sample containing the four known serotypes. The second consisted of 106 paired-end and 76 single-end amplicon sequences of DENV 3 genotype III and DENV 1 genotype I, respectively, where DEN-IM allowed detection of the intra-genotype diversity. 
The DEN-IM workflow, parameters and execution configuration files, and documentation are freely available at https://github.com/B-UMMI/DEN-IM.

