System Biology Modeling with Compositional Microbiome Data Reveals Personalized Gut Microbial Dynamics and Keystone Species

10.1101/288803 ◽

2018 ◽

Author(s):

Chenhao Li ◽

Lisa Tucker-Kellogg ◽

Niranjan Nagarajan

Keyword(s):

Microbial Communities ◽

High Throughput Sequencing ◽

Community Dynamics ◽

Keystone Species ◽

Microbial Interactions ◽

Natural Environments ◽

Sequencing Data ◽

Predator Prey ◽

Public Datasets ◽

Microbiome Data

Abstract A growing body of literature points to the important roles that different microbial communities play in diverse natural environments and the human body. The dynamics of these communities are driven by a range of microbial interactions, from symbiosis to predator-prey relationships, the majority of which are poorly understood, making it hard to predict the response of the community to different perturbations. With the increasing availability of high-throughput sequencing-based community composition data, it is now conceivable to directly learn models that explicitly define microbial interactions and explain community dynamics. The applicability of these approaches is, however, affected by several experimental limitations, particularly the compositional nature of sequencing data. We present a new computational approach (BEEM) that addresses this key limitation in the inference of generalised Lotka-Volterra models (gLVMs) by coupling biomass estimation and model inference in an expectation-maximization-like algorithm. Surprisingly, BEEM outperforms state-of-the-art methods for inferring gLVMs, while simultaneously eliminating the need for additional experimental biomass data as input. BEEM’s application to previously inaccessible public datasets (due to the lack of biomass data) allowed us for the first time to analyse microbial communities in the human gut on a per-individual basis, revealing personalised dynamics and keystone species.
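The limitation named in this abstract can be illustrated with a minimal sketch. It simulates a three-taxon generalised Lotka-Volterra model and then converts the absolute abundances to relative abundances, which is all that sequencing reports. This is not BEEM itself: the function name `simulate_glv`, the growth rates, the interaction matrix, and the Euler step size are all hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_glv(x0, growth, interactions, dt=0.01, steps=2000):
    """Euler integration of dx_i/dt = x_i * (growth_i + sum_j interactions_ij * x_j)."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        x = x + dt * x * (growth + interactions @ x)
        x = np.clip(x, 0.0, None)  # abundances cannot become negative
        trajectory.append(x.copy())
    return np.array(trajectory)

# Hypothetical parameters, not taken from the paper.
growth = np.array([1.0, 0.8, 0.6])
interactions = np.array([
    [-1.0, -0.2,  0.0],
    [-0.1, -1.0, -0.3],
    [ 0.2,  0.0, -1.0],
])

absolute = simulate_glv([0.1, 0.1, 0.1], growth, interactions)
biomass = absolute.sum(axis=1)          # total biomass per time point
relative = absolute / biomass[:, None]  # what compositional sequencing data shows
```

Every row of `relative` sums to one, so the total biomass trajectory is lost. A gLVM is defined on absolute abundances, which is why fitting one to sequencing data alone requires biomass to be measured or, as the abstract describes, estimated jointly with the model.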

Measuring and Mitigating PCR Bias in Microbiome Data

10.1101/604025 ◽

2019 ◽

Cited By ~ 3

Author(s):

Justin D. Silverman ◽

Rachael J. Bloom ◽

Sharon Jiang ◽

Heather K. Durand ◽

Sayan Mukherjee ◽

...

Keyword(s):

Microbial Communities ◽

High Throughput ◽

High Throughput Sequencing ◽

Linear Models ◽

Pcr Amplification ◽

Computational Techniques ◽

Common Source ◽

Sequencing Data ◽

Pcr Bias ◽

Microbiome Data

Abstract PCR amplification plays a central role in the measurement of mixed microbial communities via high-throughput sequencing. Yet PCR is also known to be a common source of bias in microbiome data. Here we present a paired modeling and experimental approach to characterize and mitigate PCR bias in microbiome studies. We use experimental data from mock bacterial communities to validate our approach and human gut microbiota samples to characterize PCR bias under real-world conditions. Our results suggest that PCR can bias estimates of microbial relative abundances by a factor of 2-4 but that this bias can be mitigated using simple Bayesian multinomial logistic-normal linear models. Author summary: High-throughput sequencing is often used to profile host-associated microbial communities. Many processing steps are required to transform a community of bacteria into a pool of DNA suitable for sequencing. One important step is amplification where, to create enough DNA for sequencing, DNA from many different bacteria is repeatedly copied using a technique called Polymerase Chain Reaction (PCR). However, PCR is known to introduce bias as DNA from some bacteria is more efficiently copied than others. Here we introduce an experimental procedure that allows this bias to be measured and computational techniques that allow this bias to be mitigated in sequencing data.
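A rough sketch of how differential amplification could produce a bias of this size is given below. It assumes that the log-ratio of two taxa drifts linearly with the number of PCR cycles. It uses an ordinary least-squares fit as a stand-in and is not the authors' Bayesian multinomial logistic-normal model. The cycle numbers, the 4% per-cycle efficiency advantage, and the noiseless data are hypothetical.

```python
import numpy as np

# Hypothetical experiment: the same sample amplified for different cycle counts.
cycles = np.array([5, 10, 15, 20, 25, 30])
true_log_ratio = np.log(1.0)   # assumed: taxa A and B are equally abundant
per_cycle_bias = np.log(1.04)  # assumed: A is copied 4% more efficiently per cycle

# Noiseless observed log-ratio log(A/B) after each cycle count.
observed_log_ratio = true_log_ratio + per_cycle_bias * cycles

# Fit a line; the intercept extrapolates back to zero cycles (no PCR bias).
slope, intercept = np.polyfit(cycles, observed_log_ratio, deg=1)

fold_bias_at_30 = np.exp(slope * 30)
print(f"estimated unbiased log-ratio: {intercept:.3f}")
print(f"fold bias after 30 cycles: {fold_bias_at_30:.2f}")
```

Under these assumptions a small per-cycle advantage compounds to roughly a threefold distortion after 30 cycles, and the intercept of the fitted line recovers the unbiased ratio.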

Putative functions and co-occurrence patterns of the microbial communities in natural and engineered ecosystems

10.21203/rs.3.rs-510082/v1 ◽

2021 ◽

Author(s):

Yu Xia ◽

Na Li ◽

Yiyun Chen ◽

Weijia Li ◽

Xuwen He ◽

...

Keyword(s):

Microbial Communities ◽

High Throughput Sequencing ◽

Wastewater Treatment Plants ◽

Rrna Gene ◽

Natural Environments ◽

Anaerobic Digesters ◽

Network Analyses ◽

Wb Network ◽

Almost All ◽

Occurrence Patterns

Abstract Understanding functions and co-occurrence patterns of microbial communities in various ecosystems enriches knowledge of ecosystem characteristics and microbial ecology. However, such analyses have rarely been reported. Herein, functions and inter-taxa correlations of microbial communities in a set of natural environments (farmland (SA), forest soil (SB) and Caspian Sea sediments (CSS)) and engineered ecosystems (wastewater treatment plants (FW, WA and WB) and anaerobic digesters (AD)) were studied based on FAPROTAX and network analyses, respectively, using a collection of 115 samples from seven published 16S rRNA gene datasets generated by high-throughput sequencing. The results show that chemoheterotrophy-related populations were the most abundant in almost all the communities. Their relative abundances (RAs) in the AD systems were the highest (43.7%±4.2%), followed by those of the soil environments (40.2%±1.9% in SA and 36.4%±2.0% in SB). For each ecosystem, the indicative community and overall community differed in several function categories. For example, the SA and SB indicative communities showed higher RAs in aerobic chemoheterotrophy, the CSS indicative community showed higher RAs in sulfate respiration, the AD indicative community showed higher RAs in fermentation, and the WB indicative community included higher RAs of predatory/exoparasitic bacteria. Three molecular ecological networks of the communities from the AD, WB and SB datasets were constructed, respectively. The WB network showed the highest proportion of negative correlations (70.4%), possibly attributable to environmental pressure that aggravated microbial competition. The positively correlated taxa showed lower phylogenetic distances than the negatively correlated taxa on average in each network.

High-throughput sequencing data and antibiotic resistance mechanisms of soil microbial communities in non-irrigated and irrigated soils with raw sewage in African cities

Data in Brief ◽

10.1016/j.dib.2019.104638 ◽

2019 ◽

Vol 27 ◽

pp. 104638 ◽

Cited By ~ 1

Author(s):

B.P. Bougnom ◽

S. Thiele-Bruhn ◽

V. Ricci ◽

C. Zongo ◽

L.J.V. Piddock

Keyword(s):

Antibiotic Resistance ◽

Microbial Communities ◽

High Throughput ◽

High Throughput Sequencing ◽

Soil Microbial Communities ◽

Resistance Mechanisms ◽

Sequencing Data ◽

Soil Microbial ◽

High Throughput Sequencing Data ◽

Irrigated Soils

OrtSuite – a flexible pipeline for annotation of ecosystem processes and prediction of putative microbial interactions

10.21203/rs.3.rs-52281/v1 ◽

2020 ◽

Author(s):

João Pedro Saraiva ◽

Marta Gomes ◽

René Kallies ◽

Carsten Vogt ◽

Antonis Chatzinotas ◽

...

Keyword(s):

Functional Annotation ◽

High Throughput Sequencing ◽

Sequence Similarity ◽

Microbial Community Composition ◽

Ecosystem Processes ◽

Microbial Interactions ◽

Interspecies Interactions ◽

Sequencing Data ◽

Computer Clusters ◽

Identity Threshold

Abstract Background: The exponential increase in high-throughput sequencing data and the development of computational sciences and bioinformatics pipelines have advanced our understanding of microbial community composition and distribution in complex ecosystems. Despite these advances, the identification of microbial interactions from genomic data remains a major bottleneck. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content. Results: OrtSuite combines ortholog clustering strategies with genome annotation based on a user-defined set of functions, allowing for hypothesis-driven data analysis. OrtSuite allows users to install and run all workflow components and analyze the generated outputs using a simple pipeline consisting of 23 bash commands and one R command. Annotation is based on a two-stage process. First, only a subset of sequences from each ortholog cluster is aligned to all sequences in the Ortholog-Reaction Association database (ORAdb). Next, all sequences from clusters that meet a user-defined identity threshold are aligned to all sequence sets in ORAdb to which they had a hit. This approach results in a decrease in the time needed for functional annotation. Further, OrtSuite identifies putative interspecies interactions from the species' individual genomic content, subject to constraints given by the users. Additional control is afforded to the user at several stages of the workflow: 1) The construction of ORAdb only needs to be performed once for each specific process, which also allows manual curation; 2) The identity and sequence similarity thresholds used during the annotation stage can be adjusted; and 3) Constraints related to pathway reaction composition and known species contributions to ecosystem processes can be defined. Conclusions: OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database. Further, this novel workflow allows the identification of interspecies interactions through user-defined constraints. Due to its low computational demands, for small datasets (e.g. maximum 100 genomes) OrtSuite can run on a personal computer. For larger datasets (> 100 genomes), we suggest the use of computer clusters. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuit .

Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data

Canadian Journal of Microbiology ◽

10.1139/cjm-2015-0821 ◽

2016 ◽

Vol 62 (8) ◽

pp. 692-703 ◽

Cited By ~ 132

Author(s):

Gregory B. Gloor ◽

Gregor Reid

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Compositional Data ◽

Critical Role ◽

Compositional Data Analysis ◽

Data Sets ◽

Clear Understanding ◽

Sequencing Data ◽

Microbiome Data

A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
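As a minimal sketch of one transform commonly used in compositional data analysis, the centred log-ratio (CLR) is shown below. The abstract does not specify which methods the workshop demonstrated, so the choice of CLR, the function name `clr`, the pseudocount of 0.5 for zeros, and the example counts are all assumptions made for illustration.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples-by-taxa count matrix.

    Each value becomes the log of the count relative to the geometric
    mean of its own sample. The pseudocount (an assumption here) avoids log(0).
    """
    shifted = np.asarray(counts, dtype=float) + pseudocount
    log_counts = np.log(shifted)
    return log_counts - log_counts.mean(axis=1, keepdims=True)

# Hypothetical read counts: the second sample is the first sequenced 10x deeper.
counts = np.array([
    [120, 30, 850],
    [1200, 300, 8500],
])

transformed = clr(counts, pseudocount=0.0)  # no zeros present, so no pseudocount needed
print(transformed)
```

Both rows come out identical because CLR depends only on ratios between taxa, not on sequencing depth. This is the property that makes log-ratio methods suitable for data in which the total read count carries no biological information.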

Inferring microbial co-occurrence networks from amplicon data: a systematic evaluation

10.1101/2020.09.23.309781 ◽

2020 ◽

Author(s):

Dileep Kishore ◽

Gabriel Birzu ◽

Zhenjun Hu ◽

Charles DeLisi ◽

Kirill S. Korolev ◽

...

Keyword(s):

Microbial Communities ◽

High Throughput Sequencing ◽

Systematic Evaluation ◽

Sequencing Data ◽

Systematic Analysis ◽

16S Sequencing ◽

Multiple Data Sets ◽

And Function ◽

And Control ◽

Multiple Samples

Abstract Microbes tend to organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, have the potential to reveal which microbes co-occur, providing a glimpse into the network of associations in these communities. The inference of networks from 16S data is prone to statistical artifacts. There are many tools for performing each step of the 16S analysis workflow, but the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that can convert 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and estimate steps that contribute most significantly to the variance. We further determine the tools and parameters that generate the most accurate and robust co-occurrence networks based on comparison with mock and synthetic datasets. Ultimately, we develop a standardized pipeline (available at https://github.com/segrelab/MiCoNE) that follows these default tools and parameters, but that can also help explore the outcome of any other combination of choices. We envisage that this pipeline could be used for integrating multiple data-sets, and for generating comparative analyses and consensus networks that can help understand and control microbial community assembly in different biomes. Importance: To understand and control the mechanisms that determine the structure and function of microbial communities, it is important to map the interrelationships between their constituent microbial species. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of datasets containing information about microbial abundances. These abundances can be transformed into networks of co-occurrences across multiple samples, providing a glimpse into the structure of microbiomes. However, processing these datasets to obtain co-occurrence information relies on several complex steps, each of which involves multiple choices of tools and corresponding parameters. These multiple options pose questions about the accuracy and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools and parameters affect the final network, and of how to select those that are most appropriate for a particular dataset.

NetCoMi: Network Construction and Comparison for Microbiome Data in R

10.1101/2020.07.15.195248 ◽

2020 ◽

Cited By ~ 1

Author(s):

Stefanie Peschel ◽

Christian L. Müller ◽

Erika von Mutius ◽

Anne-Laure Boulesteix ◽

Martin Depner

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Natural Habitat ◽

Secondary Analysis ◽

Microbial Interactions ◽

Sequencing Data ◽

Sample Collection ◽

Microbial Association ◽

High Throughput Sequencing Data ◽

Microbial Associations

Abstract Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization, and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analyzing, and comparing microbial association networks from high-throughput sequencing data. Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analyzing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa, or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing users to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi’s wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children’s rooms between samples from two study centers (Ulm and Munich). Availability: A script with the R code used for producing the examples shown in this manuscript is provided as Supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi.

NetCoMi: network construction and comparison for microbiome data in R

Briefings in Bioinformatics ◽

10.1093/bib/bbaa290 ◽

2020 ◽

Author(s):

Stefanie Peschel ◽

Christian L Müller ◽

Erika von Mutius ◽

Anne-Laure Boulesteix ◽

Martin Depner

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Network Construction ◽

Microbial Association ◽

High Throughput Sequencing Data ◽

Microbial Associations ◽

Microbiome Data

Abstract Motivation Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. Results Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi’s wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children’s rooms between samples from two study centers (Ulm and Munich). 
Availability R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. Contact Tel: +49 89 3187 43258 Supplementary information Supplementary data are available at Briefings in Bioinformatics online.

Metabolic Modeling of Common Escherichia coli Strains in Human Gut Microbiome

BioMed Research International ◽

10.1155/2014/694967 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 7

Author(s):

Yue-Dong Gao ◽

Yuqi Zhao ◽

Jingfei Huang

Keyword(s):

Gut Microbiome ◽

Metabolic Networks ◽

High Throughput Sequencing ◽

Carbon Sources ◽

Microbial Interactions ◽

Dynamic Responses ◽

Sequencing Data ◽

Human Gut ◽

E Coli ◽

Human Gut Microbiome

Recent high-throughput sequencing has enabled the composition of Escherichia coli strains in the human microbial community to be profiled en masse. However, there are two challenges to address: (1) exploring the genetic differences between E. coli strains in the human gut and (2) characterizing the dynamic responses of E. coli to diverse stress conditions. To this end, we investigated the E. coli strains in the human gut microbiome using deep sequencing data and reconstructed genome-wide metabolic networks for the three most common E. coli strains, namely E. coli HS, UTI89, and CFT073. The metabolic models show obvious strain-specific characteristics, both in network contents and in behaviors. We predicted optimal biomass production for the three models on four different carbon sources (acetate, ethanol, glucose, and succinate) and found that these stress-associated genes were involved in host-microbial interactions and increased in human obesity. In addition, the growth rates are similar among the models, but the flux distributions are different, even in E. coli core reactions. The correlations between human diabetes-associated metabolic reactions in the E. coli models were also predicted. The study provides a systems perspective on E. coli strains in the human gut microbiome and will be helpful in integrating diverse data sources in future studies.

Using null models to infer microbial co-occurrence networks

10.1101/070789 ◽

2016 ◽

Author(s):

Nora Connor ◽

Albert Barberán ◽

Aaron Clauset

Keyword(s):

Microbial Communities ◽

Microbial Community Composition ◽

Keystone Species ◽

Operational Taxonomic Unit ◽

Soil Samples ◽

Null Models ◽

Soil Microbiome ◽

Data Set ◽

Microbiome Data ◽

Statistical Noise

Abstract Although microbial communities are ubiquitous in nature, relatively little is known about the structural and functional roles of their constituent organisms’ underlying interactions. A common approach to study such questions begins with extracting a network of statistically significant pairwise co-occurrences from a matrix of observed operational taxonomic unit (OTU) abundances across sites. The structure of this network is assumed to encode information about ecological interactions and processes, resistance to perturbation, and the identity of keystone species. However, common methods for identifying these pairwise interactions can contaminate the network with spurious patterns that obscure true ecological signals. Here, we describe this problem in detail and develop a solution that incorporates null models to distinguish ecological signals from statistical noise. We apply these methods to the initial OTU abundance matrix and to the extracted network. We demonstrate this approach by applying it to a large soil microbiome data set and show that many previously reported patterns for these data are statistical artifacts. In contrast, we find the frequency of three-way interactions among microbial OTUs to be highly statistically significant. These results demonstrate the importance of using appropriate null models when studying observational microbiome data, and suggest that extracting and characterizing three-way interactions among OTUs is a promising direction for unraveling the structure and function of microbial ecosystems. Author Summary: Microbes are ubiquitous in the environment. We know that microbial communities – the groups of microbes that live together, interact, and depend on one another – vary across environments. Multiple processes, ranging from competition between microbes to environmental stress, are believed to alter microbial community composition. Here, we describe a set of statistical techniques that can more accurately identify the underlying taxa relationships that structure the observed abundances of microbes across habitats. Using a large data set of soil samples collected across North and South America, we both illustrate the statistical artifacts that incorrect methods can introduce and describe proper techniques based on appropriate null models for studying how the abundances of taxa vary across soil samples. These tools improve our ability to distinguish ecologically meaningful interactions from simple statistical noise in such observational data. Our application of these tools suggests some previous claims about the network structure of microbial communities may be statistical artifacts. Furthermore, we find that three-way interactions among microbial taxa are significantly more common than we would expect at random, and thus may provide a novel means for identifying ecologically meaningful interactions.
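The general idea of testing a co-occurrence against a null model can be sketched with a simple permutation test. This is only an illustration of the concept and not the specific null models developed in the paper: the function name `cooccurrence_p_value`, the use of Pearson correlation, the shuffle-across-sites null, and the synthetic abundances are all assumptions.

```python
import numpy as np

def cooccurrence_p_value(a, b, n_permutations=999, seed=0):
    """Permutation test for the correlation between two taxa across sites.

    The null model (an assumption here) shuffles taxon `a` across sites,
    which breaks any association with `b` while keeping both
    abundance distributions unchanged.
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(a, b)[0, 1]
    null = np.array([
        np.corrcoef(rng.permutation(a), b)[0, 1]
        for _ in range(n_permutations)
    ])
    # Two-sided p-value with the standard +1 correction.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_permutations + 1)
    return observed, p_value

# Hypothetical abundances of two taxa across 30 sites.
taxon_a = np.arange(30, dtype=float)
taxon_b = 2.0 * taxon_a + 1.0  # perfectly associated with taxon_a

r_linked, p_linked = cooccurrence_p_value(taxon_a, taxon_b)
print(f"r = {r_linked:.2f}, p = {p_linked:.3f}")
```

An edge would be kept in the network only when the observed association is rare under the null. The choice of null matters: shuffling across sites ignores compositional and sampling effects, which is the kind of shortcoming that motivates the more careful null models the abstract refers to.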
