BugBase predicts organism-level microbiome phenotypes

Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results

Frontiers in Genetics ◽

10.3389/fgene.2020.489357 ◽

2020 ◽

Vol 11 ◽

Author(s):

Alejandro Abdala Asbun ◽

Marc A. Besseling ◽

Sergio Balzano ◽

Judith D. L. van Bleijswijk ◽

Harry J. Witte ◽

...

Keyword(s):

Data Analysis ◽

Sequence Data ◽

Single Gene ◽

Marker Gene ◽

Gene Sequencing ◽

Data Generation ◽

Clustering Methods ◽

Analysis Pipeline ◽

Data Analysis Pipeline ◽

Marker Gene Sequencing

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format and representative sequences. Cascabel is a highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any linux/unix computing environment from desktop computers to computing servers making use of parallel processing if possible. The analyses and results are fully reproducible and documented in an HTML and optional pdf report. Cascabel is freely available at Github: https://github.com/AlejandroAb/CASCABEL.


Cascabel: a flexible, scalable and easy-to-use amplicon sequence data analysis pipeline

10.1101/809384 ◽

2019 ◽

Cited By ~ 3

Author(s):

Alejandro Abdala Asbun ◽

Marc A Besseling ◽

Sergio Balzano ◽

Judith van Bleijswijk ◽

Harry Witte ◽

...

Keyword(s):

Data Analysis ◽

Sequence Data ◽

Single Gene ◽

Marker Gene ◽

Gene Sequencing ◽

Data Generation ◽

Analysis Pipeline ◽

Entire Genome ◽

Data Analysis Pipeline ◽

Marker Gene Sequencing

ABSTRACT Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene rather than the entire genome, the number of reads needed per sample is lower than that required for metagenome sequencing, making marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline allows customizing the analyses by offering several choices for most of the steps, for example different OTU generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML and optional pdf report. Cascabel is freely available at Github: https://github.com/AlejandroAb/CASCABEL and licensed under GNU GPLv3.


The impact of freeze-drying infant fecal samples on measures of their bacterial community profiles and milk-derived oligosaccharide content

PeerJ ◽

10.7717/peerj.1612 ◽

2016 ◽

Vol 4 ◽

pp. e1612 ◽

Cited By ~ 8

Author(s):

Zachery T. Lewis ◽

Jasmine C.C. Davis ◽

Jennifer T. Smilowitz ◽

J. Bruce German ◽

Carlito B. Lebrilla ◽

...

Keyword(s):

Bacterial Community ◽

Freeze Drying ◽

Marker Gene ◽

Cost Effective ◽

Amplicon Sequencing ◽

Sample Treatment ◽

Sequencing Data ◽

Fecal Samples ◽

Freeze Dried ◽

The Impact

Infant fecal samples are commonly studied to investigate the impacts of breastfeeding on the development of the microbiota and subsequent health effects. Comparisons of infants living in different geographic regions and environmental contexts are needed to aid our understanding of evolutionarily-selected milk adaptations. However, the preservation of fecal samples from individuals in remote locales until they can be processed can be a challenge. Freeze-drying (lyophilization) offers a cost-effective way to preserve some biological samples for transport and analysis at a later date. Currently, it is unknown what, if any, biases are introduced into various analyses by the freeze-drying process. Here, we investigated how freeze-drying affected analysis of two relevant and intertwined aspects of infant fecal samples, marker gene amplicon sequencing of the bacterial community and the fecal oligosaccharide profile (undigested human milk oligosaccharides). No differences were discovered between the fecal oligosaccharide profiles of wet and freeze-dried samples. The marker gene sequencing data showed an increase in proportional representation of Bacteroides and a decrease in detection of bifidobacteria and members of class Bacilli after freeze-drying. This sample treatment bias may possibly be related to the cell morphology of these different taxa (Gram status). However, these effects did not overwhelm the natural variation among individuals, as the community data still strongly grouped by subject and not by freeze-drying status. We also found that compensating for sample concentration during freeze-drying, while not necessary, was also not detrimental. Freeze-drying may therefore be an acceptable method of sample preservation and mass reduction for some studies of microbial ecology and milk glycan analysis.


A critical perspective on interpreting amplicon sequencing data in soil ecological research

10.22541/au.161919535.51886448/v1 ◽

2021 ◽

Author(s):

Lauren V. Alteio ◽

Joana Séneca ◽

Alberto Canarini ◽

Roey Angel ◽

Ksenia Guseva ◽

...

Keyword(s):

Statistical Power ◽

Marker Gene ◽

Community Analysis ◽

Amplicon Sequencing ◽

Data Availability ◽

Microbial Community Analysis ◽

Sequencing Data ◽

Spatio Temporal ◽

Complementary Techniques ◽

The Impact

Microbial community analysis via marker gene amplicon sequencing has become a routine method in the field of soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing studies in soil and present statistical and experimental approaches that can help addressing the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss effects of sample replication on the statistical power in soil community analysis. Additionally, we argue for the need of increased study reproducibility and data availability, as well as complementary techniques for generating deeper ecological insights into microbial roles and our understanding thereof in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation and availability to improve the rigor of future studies.
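The compositionality effect mentioned in this abstract can be illustrated with a small sketch. The taxon names and counts below are invented for illustration and do not come from the paper; the point is only that a taxon whose absolute abundance is unchanged can appear to decline once counts are converted to relative abundances.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances in two samples (illustrative values only).
# Only taxon_A changes; taxon_B and taxon_C stay constant in absolute terms.
before = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 100}
after = {"taxon_A": 400, "taxon_B": 100, "taxon_C": 100}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for taxon in before:
    print(f"{taxon}: {rel_before[taxon]:.3f} -> {rel_after[taxon]:.3f}")
```

Here taxon_B and taxon_C drop from one third to one sixth of the community even though their absolute counts did not change, which is why a shift in relative abundance alone cannot be read as a shift in absolute abundance.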


ASAP 2: a pipeline and web server to analyze marker gene amplicon sequencing data automatically and consistently

BMC Bioinformatics ◽

10.1186/s12859-021-04555-0 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Renmao Tian ◽

Behzad Imanian

Keyword(s):

Statistical Tests ◽

Marker Gene ◽

Web Server ◽

Amplicon Sequencing ◽

Marker Genes ◽

Sequence Variant ◽

Complex Data ◽

Sequencing Analysis ◽

Sequencing Data ◽

Link Type

Abstract. Background: Amplicon sequencing of marker genes such as 16S rDNA has been widely used to survey and characterize microbial communities. However, the complex data analyses have required many manual steps, often leading to inconsistencies in results. Results: Here, we have developed a pipeline, amplicon sequence analysis pipeline 2 (ASAP 2), to automate the processes without the usual manual inspection and user intervention, for instance in the detection of barcode orientation, selection of the high-quality region of reads, determination of resampling depth, and more. The pipeline integrates all the analytical processes, such as importing data, demultiplexing, summarizing read profiles, quality trimming, denoising, removing chimeric sequences, and making the feature table, among others. The pipeline accepts multiple file formats as input, including multiplexed or demultiplexed, paired-end or single-end, barcode inside or outside, and raw or intermediate data (e.g. feature table). The outputs include taxonomic classification, alpha/beta diversity, community composition, ordination analysis, and statistical tests. ASAP 2 supports merging multiple sequencing runs, which helps integrate and compare data from different sources (public databases and collaborators). Conclusions: Our pipeline minimizes hands-on intervention and runs amplicon sequence variant (ASV)-based amplicon sequencing analysis automatically and consistently. Our web server assists researchers who have no access to high-performance computing (HPC) or have limited bioinformatics skills. The pipeline and web server can be accessed at https://github.com/tianrenmaogithub/asap2 and https://hts.iit.edu/asap2, respectively.


A primer and discussion on DNA-based microbiome data and related bioinformatics analyses

10.31219/osf.io/3dybg ◽

2021 ◽

Author(s):

Gavin Douglas ◽

Morgan G. I. Langille

Keyword(s):

Marker Gene ◽

Research Area ◽

Sequencing Data ◽

Data Types ◽

Bioinformatics Analyses ◽

Shotgun Metagenomics ◽

The Past ◽

Microbiome Research ◽

The Many ◽

Microbiome Data

The past decade has seen an eruption of interest in profiling microbiomes through DNA sequencing. The resulting investigations have revealed myriad insights and attracted an influx of researchers to the research area. Many newcomers are in need of primers on the fundamentals of microbiome sequencing data types and the methods used to analyze them. Accordingly, here we aim to provide a detailed, but accessible, introduction to these topics. We first present the background on marker-gene and shotgun metagenomics sequencing and then discuss unique characteristics of microbiome data in general. We highlight several important caveats resulting from these characteristics that should be appreciated when analyzing these data. We then introduce the many-faceted concept of microbial functions and several controversies in this area. One controversy in particular is regarding whether metagenome prediction methods (i.e. based on marker gene sequences) are sufficiently accurate to ensure reliable biological inferences. We next highlight several underappreciated developments regarding the integration of taxonomic and functional data types. This is a highly pertinent topic because although these data types are inherently connected, they are often analyzed independently and primarily only linked anecdotally in the literature. We close by providing our perspective on this topic in addition to the issue of reproducibility in microbiome research, which are both crucial data analysis challenges facing microbiome researchers.


Comparative Fungal Community Analyses Using Metatranscriptomics and Internal Transcribed Spacer Amplicon Sequencing from Norway Spruce

mSystems ◽

10.1128/msystems.00884-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Andreas N. Schneider ◽

John Sundh ◽

Görel Sundström ◽

Kerstin Richau ◽

Nicolas Delhomme ◽

...

Keyword(s):

Gene Expression ◽

Microbial Communities ◽

Norway Spruce ◽

Marker Gene ◽

Amplicon Sequencing ◽

Rna Seq ◽

Sequencing Data ◽

Plant Component ◽

Alpha And Beta Diversity ◽

Field Samples

ABSTRACT The health, growth, and fitness of boreal forest trees are impacted and improved by their associated microbiomes. Microbial gene expression and functional activity can be assayed with RNA sequencing (RNA-Seq) data from host samples. In contrast, phylogenetic marker gene amplicon sequencing data are used to assess taxonomic composition and community structure of the microbiome. Few studies have considered how much of this structural and taxonomic information is included in transcriptomic data from matched samples. Here, we described fungal communities using both host-derived RNA-Seq and fungal ITS1 DNA amplicon sequencing to compare the outcomes between the methods. We used a panel of root and needle samples from the coniferous tree species Picea abies (Norway spruce) growing in untreated (nutrient-deficient) and nutrient-enriched plots at the Flakaliden forest research site in boreal northern Sweden. We show that the relationship between samples and alpha and beta diversity indicated by the fungal transcriptome is in agreement with that generated by the ITS data, while also identifying a lack of taxonomic overlap due to limitations imposed by current database coverage. Furthermore, we demonstrate how metatranscriptomics data additionally provide biologically informative functional insights. At the community level, there were changes in starch and sucrose metabolism, biosynthesis of amino acids, and pentose and glucuronate interconversions, while processing of organic macromolecules, including aromatic and heterocyclic compounds, was enriched in transcripts assigned to the genus Cortinarius.

IMPORTANCE A deeper understanding of microbial communities associated with plants is revealing their importance for plant health and productivity. RNA extracted from plant field samples represents the host and other organisms present. Typically, gene expression studies focus on the plant component or, in a limited number of studies, expression in one or more associated organisms. However, metatranscriptomic data are rarely used for taxonomic profiling, which is currently performed using amplicon approaches. We created an assembly-based, reproducible, and hardware-agnostic workflow to taxonomically and functionally annotate fungal RNA-Seq data obtained from Norway spruce roots, which we compared to matching ITS amplicon sequencing data. While we identified some limitations and caveats, we show that functional, taxonomic, and compositional insights can all be obtained from RNA-Seq data. These findings highlight the potential of metatranscriptomics to advance our understanding of interaction, response, and effect between host plants and their associated microbial communities.


Comparative fungal community analyses using metatranscriptomics and ITS-amplicon sequencing from Norway spruce

10.5194/egusphere-egu21-7249 ◽

2021 ◽

Author(s):

Andreas Schneider ◽

John Sundh ◽

Görel Sundström ◽

Kerstin Richau ◽

Nicolas Delhomme ◽

...

Keyword(s):

Community Structure ◽

Microbial Communities ◽

Norway Spruce ◽

Environmental Samples ◽

Marker Gene ◽

Amplicon Sequencing ◽

Fungal Communities ◽

Individual Species ◽

Rna Seq ◽

Sequencing Data

Microbial communities are major players in carbon and nitrogen cycling globally and are of particular importance for plant communities in the nutrient poor soils of boreal forests. Especially relevant are the fungal communities in the soil that interact with the plants in multiple ways, indirectly through their pivotal role in the breakdown of organic matter and, more directly, through mycorrhizal symbiosis with plant roots. Large-scale disturbances of these complex microbial communities can lead to shifts in soil carbon storage with unknown and global-scale long-term consequences. To understand the dynamics of these communities and their relationship to associated plants in response to climate change and anthropogenic influence, we need a better understanding of how modern “omics” methods can help us to understand compositional and functional shifts of these microbiomes. Microbial gene expression and functional activity can be assayed with RNA sequencing (RNA-Seq) data from environmental samples. In contrast, currently phylogenetic marker gene amplicon sequencing data is generally used to assess taxonomic composition and community structure of the microbiome. Few studies have considered how much of this structural and taxonomic information is included in RNA-Seq transcriptomic data from matched samples. Here we describe fungal communities using both RNA-Seq and fungal ITS1 DNA amplicon sequencing to compare the outcomes between the methods. We used a panel of root and needle samples from mature stands of the coniferous tree species Picea abies (Norway spruce) growing in untreated (nutrient deficient) and nutrient enriched plots at the Flakaliden forest research site in boreal northern Sweden. We created an assembly-based, reproducible and hardware agnostic workflow to taxonomically and functionally annotate fungal RNA-Seq data obtained from Norway spruce roots, which we compared to matching ITS amplicon sequencing data. We show that the community structure indicated by the fungal transcriptome is in agreement with that generated by the ITS data, while also identifying limitations imposed by current database coverage. Furthermore, we show examples to demonstrate how metatranscriptomics data additionally provides biologically informative functional insight at the community and individual species level. These findings highlight the potential of metatranscriptomics to advance our understanding of interaction, response and effect both between host plants and their associated microbial communities, and among the members of microbial communities in environmental samples in general.


PICRUSt2: An improved and customizable approach for metagenome inference

10.1101/672295 ◽

2019 ◽

Cited By ~ 89

Author(s):

Gavin M. Douglas ◽

Vincent J. Maffei ◽

Jesse Zaneveld ◽

Svetlana N. Yurgel ◽

James R. Brown ◽

...

Keyword(s):

Marker Gene ◽

Gene Sequencing ◽

Gene Families ◽

Community Based ◽

New Approach ◽

Functional Potential ◽

Reference Databases ◽

Previous Algorithm ◽

Reference Genomes ◽

Marker Gene Sequencing

One major limitation of microbial community marker gene sequencing is that it does not provide direct information on the functional composition of sampled communities. Here, we present PICRUSt2 (https://github.com/picrust/picrust2), which expands the capabilities of the original PICRUSt method¹ to predict the functional potential of a community based on marker gene sequencing profiles. This updated method and implementation includes several improvements over the previous algorithm: an expanded database of gene families and reference genomes, a new approach now compatible with any OTU-picking or denoising algorithm, and novel phenotype predictions. Upon evaluation, PICRUSt2 was more accurate than PICRUSt1 and other current approaches overall. PICRUSt2 is also now more flexible and allows the addition of custom reference databases. We highlight these improvements and also important caveats regarding the use of predicted metagenomes, which are related to the inherent challenges of analyzing metagenome data in general.


DEN-IM: dengue virus genotyping from amplicon and shotgun metagenomic sequencing

Microbial Genomics ◽

10.1099/mgen.0.000328 ◽

2020 ◽

Vol 6 (3) ◽

Author(s):

Catarina I. Mendes ◽

Erley Lizarazo ◽

Miguel P. Machado ◽

Diogo N. Silva ◽

Adriana Tami ◽

...

Keyword(s):

Dengue Virus ◽

High Performance ◽

High Throughput Sequencing ◽

Amplicon Sequencing ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Genotype I ◽

Shotgun Metagenomics ◽

Shotgun Metagenomic Sequencing

Dengue virus (DENV) represents a public health threat and economic burden in affected countries. The availability of genomic data is key to understanding viral evolution and dynamics, supporting improved control strategies. Currently, the use of high-throughput sequencing (HTS) technologies, which can be applied both directly to patient samples (shotgun metagenomics) and to PCR-amplified viral sequences (amplicon sequencing), is potentially the most informative approach to monitor viral dissemination and genetic diversity by providing, in a single methodological step, identification and characterization of the whole viral genome at the nucleotide level. Despite many advantages, these technologies require bioinformatics expertise and appropriate infrastructure for the analysis and interpretation of the resulting data. In addition, the many software solutions available can hamper the reproducibility and comparison of results. Here we present DEN-IM, a one-stop, user-friendly, containerized and reproducible workflow for the analysis of DENV short-read sequencing data from both amplicon and shotgun metagenomics approaches. It is able to infer the DENV coding sequence (CDS), identify the serotype and genotype, and generate a phylogenetic tree. It can easily be run on any UNIX-like system, from local machines to high-performance computing clusters, performing a comprehensive analysis without the requirement for extensive bioinformatics expertise. Using DEN-IM, we successfully analysed two types of DENV datasets. The first comprised 25 shotgun metagenomic sequencing samples from patients with variable serotypes and genotypes, including an in vitro spiked sample containing the four known serotypes. The second consisted of 106 paired-end and 76 single-end amplicon sequences of DENV 3 genotype III and DENV 1 genotype I, respectively, where DEN-IM allowed detection of the intra-genotype diversity. The DEN-IM workflow, parameters and execution configuration files, and documentation are freely available at https://github.com/B-UMMI/DEN-IM.
