Metaviz: interactive statistical and visual analysis of metagenomic data

Mapping Intimacies

10.1101/105205

2017

Cited By ~ 2

Author(s):

Justin Wagner

Florin Chelaru

Jayaram Kancherla

Joseph N. Paulson

Victor Felix

...

Keyword(s):

Visual Analysis

Alpha Diversity

Feature Space

Metagenomic Data

Metagenomic Sequencing

Graph Database

Web Browser

Survey Techniques

Scatter Plots

Taxonomic Annotation

Abstract: Along with the survey techniques of 16S rRNA amplicon and whole-metagenome shotgun sequencing, an array of tools exists for clustering, taxonomic annotation, normalization, and statistical analysis of microbiome sequencing results. Integrative and interactive visualization that enables researchers to perform exploratory analysis of this feature-rich hierarchical data is an area of need. In this work, we present Metaviz, a web browser-based tool for interactive exploratory metagenomic data analysis. Metaviz can visualize abundance data served from an R session or from a Python web service that queries a graph database. Because metagenomic sequencing features are organized in a hierarchy, we designed a novel navigation mechanism to explore this feature space. We visualize abundance counts with heatmaps and stacked bar plots that are dynamically updated as a user selects taxonomic features to inspect. Metaviz also supports common data exploration techniques, including PCA scatter plots to interpret variability in the dataset and alpha diversity boxplots for examining ecological community composition. The Metaviz application and documentation are hosted at http://www.metaviz.org.

Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes

Genome Research

10.1101/gr.275777.121

2021

pp. gr.275777.121

Author(s):

George W Armstrong

Kalen Cantrell

Shi Huang

Daniel McDonald

Niina Haiminen

...

Keyword(s):

Carbon Footprint

Phylogenetic Diversity

Alpha Diversity

Previous Method

Metagenomic Data

Efficient Computation

Computationally Efficient

Dataset Size

Computational Resources

Older Populations

The number of publicly available microbiome samples is continually growing. As dataset size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity is a commonly used phylogenetic alpha diversity metric that has thus far failed to scale effectively to trees with millions of vertices. Stacked Faith's Phylogenetic Diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the computational resources required, resulting in more accessible software with a smaller carbon footprint than previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power to detect diversity differences between younger and older populations in the FINRISK study's metagenomic data.
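As a reminder of what is being computed: Faith's PD for a sample is the total branch length of the part of the phylogeny that connects the sample's observed taxa, conventionally including the path up to the root. The following is a minimal Python sketch of that definition using a naive walk toward the root. It is not the SFPhD algorithm described above, and the parent-pointer tree encoding, the function name `faith_pd`, and the toy tree are all hypothetical.

```python
from typing import Dict, Optional, Set


def faith_pd(parent: Dict[str, Optional[str]],
             branch_length: Dict[str, float],
             observed: Set[str]) -> float:
    """Naive Faith's PD: total length of the branches on the paths from
    each observed tip up to the root, counting each branch only once.

    Hypothetical encoding: parent[n] is n's parent (None for the root) and
    branch_length[n] is the length of the edge above n (missing means 0).
    """
    counted: Set[str] = set()
    total = 0.0
    for tip in observed:
        node: Optional[str] = tip
        # Climb toward the root; stop early once we reach a node whose
        # upward path was already added by a previously visited tip.
        while node is not None and node not in counted:
            counted.add(node)
            total += branch_length.get(node, 0.0)
            node = parent[node]
    return total


# Toy rooted tree (hypothetical):  root -> A -> {t1, t2};  root -> B -> t3
parent = {"root": None, "A": "root", "B": "root",
          "t1": "A", "t2": "A", "t3": "B"}
branch_length = {"A": 1.0, "B": 2.0, "t1": 0.5, "t2": 0.25, "t3": 1.5}

print(faith_pd(parent, branch_length, {"t1", "t2"}))  # 1.75
```

Each branch is added at most once per sample, but the walk is repeated for every sample; over many samples and a tree with millions of vertices, this per-sample traversal is the kind of workload that the abstract says earlier implementations struggled to scale.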

Evaluation of the CosmosID Bioinformatics Platform for Prosthetic Joint-Associated Sonicate Fluid Shotgun Metagenomic Data Analysis

Journal of Clinical Microbiology

10.1128/jcm.01182-18

2018

Vol 57 (2)

Cited By ~ 8

Author(s):

Qun Yan

Yu Mi Wi

Matthew J. Thoendel

Yash S. Raval

Kerryl E. Greenwood-Quaintance

...

Keyword(s):

Antibiotic Resistance

Metagenomic Data

Metagenomic Sequencing

Antibacterial Resistance

Sequencing Data

Bacterial Detection

Shotgun Metagenomic Sequencing

Prosthetic Joint

Validation Set

Fluid Culture

ABSTRACT We previously demonstrated that shotgun metagenomic sequencing can detect bacteria in sonicate fluid, providing a diagnosis of prosthetic joint infection (PJI). A limitation of the approach that we used is that data analysis was time-consuming and specialized bioinformatics expertise was required, both of which are barriers to routine clinical use. Fortunately, automated commercial analytic platforms that can interpret shotgun metagenomic data are emerging. In this study, we evaluated the CosmosID bioinformatics platform using shotgun metagenomic sequencing data derived from 408 sonicate fluid samples from our prior study with the goal of evaluating the platform vis-à-vis bacterial detection and antibiotic resistance gene detection for predicting staphylococcal antibacterial susceptibility. Samples were divided into a derivation set and a validation set, each consisting of 204 samples; results from the derivation set were used to establish cutoffs, which were then tested in the validation set for identifying pathogens and predicting staphylococcal antibacterial resistance. Metagenomic analysis detected bacteria in 94.8% (109/115) of sonicate fluid culture-positive PJIs and 37.8% (37/98) of sonicate fluid culture-negative PJIs. Metagenomic analysis showed sensitivities ranging from 65.7 to 85.0% for predicting staphylococcal antibacterial resistance. In conclusion, the CosmosID platform has the potential to provide fast, reliable bacterial detection and identification from metagenomic shotgun sequencing data derived from sonicate fluid for the diagnosis of PJI. Strategies for metagenomic detection of antibiotic resistance genes for predicting staphylococcal antibacterial resistance need further development.

MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data

F1000Research

10.12688/f1000research.18866.2

2019

Vol 8

pp. 726

Author(s):

Mike W.C. Thang

Xin-Yi Chua

Gareth Price

Dominique Gorse

Matt A. Field

Keyword(s):

Microbial Communities

Sequence Data

Metagenomic Data

Marker Genes

Metagenomic Sequencing

Differential Analysis

Biomedical Sciences

Metagenomic Sequence

Differential Abundance

Differential Abundance Analysis

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. To gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users can input complex sample-specific metadata, which can be incorporated into differential analysis and used for grouping/colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs, as well as pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform, we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community, with the hope that more tools will be added as existing methods mature.

Towards end-to-end disease prediction from raw metagenomic data

10.1101/2020.10.29.360297

2020

Author(s):

Maxence Queyrel

Edi Prifti

Jean-Daniel Zucker

Keyword(s):

DNA Sequences

Real Life

Multiple Instance Learning

Disease Classification

Metagenomic Data

Numerical Representation

Metagenomic Sequencing

Sequencing Data

End To End

Bioinformatics Workflows

Abstract: Analysis of the human microbiome using metagenomic sequencing data has demonstrated a strong ability to discriminate between various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from fragmented DNA and are stored as FASTQ files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming, and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Recent studies have demonstrated that training deep neural networks (DNNs) directly on raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings to create a meaningful numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present metagenome2vec, an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which each sequence most likely comes; and (iv) training a multiple instance learning classifier that predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning each genome a weight that reflects its influence on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reached very high performance, comparable with state-of-the-art methods applied to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that, with further development, DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
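Step (i) starts by turning each read into overlapping k-mer "words". The Python sketch below shows only that tokenization and vocabulary building, plus a naive mean-pooling of pre-computed k-mer vectors as a stand-in for step (ii). It is not metagenome2vec's learned read-embedding model; the function names, the choice to skip k-mers containing non-ACGT characters, the `min_count` threshold, and the toy vectors are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def kmers(read: str, k: int) -> List[str]:
    """Overlapping k-mers (stride 1); windows with non-ACGT bases are skipped."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)
            if set(read[i:i + k]) <= set("ACGT")]


def build_vocabulary(reads: Iterable[str], k: int, min_count: int = 1) -> Dict[str, int]:
    """Assign an integer id (in sorted order) to every k-mer seen >= min_count times."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = sorted(km for km, c in counts.items() if c >= min_count)
    return {km: i for i, km in enumerate(kept)}


def read_embedding(read: str, k: int, vocab: Dict[str, int],
                   vectors: List[List[float]]) -> List[float]:
    """Mean of the vectors of the read's in-vocabulary k-mers (zeros if none).

    Naive stand-in for a learned read embedding; vectors[i] is assumed to be
    the embedding of the k-mer whose id is i.
    """
    dim = len(vectors[0])
    ids = [vocab[km] for km in kmers(read, k) if km in vocab]
    if not ids:
        return [0.0] * dim
    return [sum(vectors[i][d] for i in ids) / len(ids) for d in range(dim)]


reads = ["ACGT", "CGTA", "ACGNT"]
vocab = build_vocabulary(reads, k=2)
print(vocab)  # {'AC': 0, 'CG': 1, 'GT': 2, 'TA': 3}

# Toy 2-dimensional k-mer vectors (hypothetical, not learned).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(read_embedding("ACG", 2, vocab, vectors))  # [0.5, 0.5]
```

Mean pooling discards k-mer order and is only a baseline; a learned sentence-style embedding, as the abstract describes, would replace `read_embedding` while keeping the same tokenization step.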

Soil microbes and oxygen influence the changes in bacterial community composition during swine carcass decomposition

10.21203/rs.3.rs-211559/v1

2021

Author(s):

Michelle Miguel

Seon Ho Kim

Sang Suk Lee

Yong Il Cho

Keyword(s):

Bacterial Community

16S rRNA

Community Composition

Soil Microbes

Close Association

Bacterial Community Composition

Alpha Diversity

Decomposition Process

Metagenomic Sequencing

Unsterilized Soil

Background: Carcass decomposition is influenced by various factors such as temperature, humidity, microorganisms, invertebrates, and scavengers. Soil microbes play a significant role in the decomposition process. In this study, we used 16S rRNA metagenomic sequencing to investigate changes in the bacterial community during carcass decomposition in soil with an intact microbial community and in sterilized soil, each with and without oxygen access. Results: Based on the 16S rRNA metagenomic sequencing, a total of 988 operational taxonomic units (OTUs) representing 16 phyla and 533 genera were detected. Bacterial diversity, as measured by alpha diversity indices, varied across the burial set-ups. The unsterilized-soil aerobic (U_A) and unsterilized-soil anaerobic (U_An) set-ups had higher alpha diversity than the other burial set-ups. Beta diversity analysis revealed a close association among samples according to burial type and decomposition day. Firmicutes was the dominant phylum across all samples regardless of burial type and decomposition day. The bacterial community composition changed throughout the decomposition process in all burial set-ups. Meanwhile, the genus Bacillus dominated the bacterial community towards the end of the decomposition period. Conclusions: Our results showed that bacterial community composition changed during carcass decomposition and was affected by the soil and oxygen access, with microorganisms belonging to the phylum Firmicutes dominating the community.

On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks

International Journal of Multimedia Data Engineering and Management

10.4018/jmdem.2012070101

2012

Vol 3 (3)

pp. 1-19

Cited By ~ 1

Author(s):

Robert Mertens

Po-Sen Huang

Luke Gottlieb

Gerald Friedland

Ajay Divakaran

...

Keyword(s):

Visual Analysis

Feature Space

Acoustic Properties

Speaker Diarization

Concept System

Depth Analysis

Audio Indexing

Highly Correlated

Definition Of

Video Concept Detection

A video’s soundtrack is usually highly correlated with its content. Hence, audio-based techniques have recently emerged as a means of video concept detection complementary to visual analysis. Most state-of-the-art approaches rely on manual definition of predefined sound concepts such as “engine sounds” or “outdoor/indoor sounds.” These approaches have three major drawbacks: manual definitions do not scale because they are highly domain-dependent; manual definitions are highly subjective with respect to annotators; and a large part of the audio content is omitted, since the predefined concepts are usually found in only a fraction of the soundtrack. This paper explores how unsupervised audio segmentation systems such as speaker diarization can be adapted to automatically identify low-level sound concepts similar to annotator-defined concepts, and how these concepts can be used for audio indexing. Speaker diarization systems are designed to answer the question “who spoke when?” by finding segments in an audio stream that exhibit similar properties in feature space, i.e., that sound similar. Using a diarization system, all the content of an audio file is analyzed and similar sounds are clustered. This article provides an in-depth analysis of the statistical properties of similar acoustic segments identified by the diarization system in a predefined document set, and of the theoretical fitness of this approach to discern one document class from another. It also discusses how diarization can be tuned to better reflect the acoustic properties of general sounds as opposed to speech, and introduces a proof-of-concept system for multimedia event classification that works with diarization-based indexing.

Improvement of eukaryotic proteins prediction from soil metagenomes

10.1101/2021.11.10.468086

2021

Author(s):

Carole Belliardo

Georgios Koutsovoulos

Corinne Rancurel

Mathilde Clement

Justine Lipuma

...

Keyword(s):

Automated Analysis

Metagenomic Data

Analysis Pipeline

Shotgun Metagenomics

Assignment Method

Eukaryotic Proteins

Public Repositories

Soil Proteins

Taxonomic Annotation

Background: During the last decades, shotgun metagenomics and metabarcoding have highlighted the diversity of microorganisms in environmental and host-associated samples. Most public repositories of assembled metagenomes use annotation pipelines tailored for prokaryotes, regardless of the taxonomic origin of contigs and metagenome-assembled genomes (MAGs). Consequently, eukaryotic contigs and MAGs, which have intrinsically different gene features, are not optimally annotated, resulting in an incorrect representation of the eukaryotic component of biodiversity despite its biological relevance. Results: Using an automated analysis pipeline, we filtered eukaryotic contigs from 6,873 soil metagenomes in the IMG/M database of the Joint Genome Institute. We re-annotated genes using eukaryote-tailored methods, yielding 5.6 million eukaryotic proteins. Our pipeline improves the completeness, contiguity and quality of eukaryotic proteins. Moreover, the better quality of eukaryotic proteins, combined with a more comprehensive assignment method, improves the taxonomic annotation as well. Conclusions: Using public soil metagenomic data, we provide a dataset of eukaryotic soil proteins with improved completeness and quality, as well as more reliable taxonomic annotation. This unique resource is of interest to any scientist aiming to study the composition, biological functions and gene flux of soil communities involving eukaryotes.

Metagenomic Signatures of Bacterial Adaptation to Life in the Phyllosphere of a Salt-Secreting Desert Tree

Applied and Environmental Microbiology

10.1128/aem.00483-16

2016

Vol 82 (9)

pp. 2854-2861

Cited By ~ 24

Author(s):

Omri M. Finkel

Tom O. Delmont

Anton F. Post

Shimshon Belkin

Keyword(s):

High Salinity

Stress Factors

Metagenomic Data

Metagenomic Sequencing

Contig Assembly

Bacterial Populations

Content Type

Light Sensing

Wide Range

Globally Distributed

ABSTRACT The leaves of Tamarix aphylla, a globally distributed, salt-secreting desert tree, are dotted with alkaline droplets of high salinity. To successfully inhabit these organic carbon-rich droplets, bacteria need to be adapted to multiple stress factors, including high salinity, high alkalinity, high UV radiation, and periodic desiccation. To identify genes that are important for survival in this harsh habitat, microbial community DNA was extracted from the leaf surfaces of 10 Tamarix aphylla trees along a 350-km longitudinal gradient. Shotgun metagenomic sequencing, contig assembly, and binning yielded 17 genome bins, six of which were >80% complete. These genomic bins, representing three phyla (Proteobacteria, Bacteroidetes, and Firmicutes), were closely related to halophilic and alkaliphilic taxa isolated from aquatic and soil environments. Comparison of these genomic bins to the genomes of their closest relatives revealed functional traits characteristic of bacterial populations inhabiting the Tamarix phyllosphere, independent of their taxonomic affiliation. These functions, most notably light-sensing genes, are postulated to represent important adaptations toward colonization of this habitat.
IMPORTANCE Plant leaves are an extensive and diverse microbial habitat, forming the main interface between solar energy and the terrestrial biosphere. There are hundreds of thousands of plant species in the world, exhibiting a wide range of morphologies, leaf surface chemistries, and ecological ranges. In order to understand the core adaptations of microorganisms to this habitat, it is important to diversify the type of leaves that are studied. This study provides an analysis of the genomic content of the most abundant bacterial inhabitants of the globally distributed, salt-secreting desert tree Tamarix aphylla. Draft genomes of these bacteria were assembled using the culture-independent technique of assembly and binning of metagenomic data. Analysis of the genomes reveals traits that are important for survival in this habitat, most notably light-sensing and light utilization genes.

Metaviz: interactive statistical and visual analysis of metagenomic data

Nucleic Acids Research

10.1093/nar/gky136

2018

Vol 46 (6)

pp. 2777-2787

Cited By ~ 13

Author(s):

Justin Wagner

Florin Chelaru

Jayaram Kancherla

Joseph N Paulson

Alexander Zhang

...

Keyword(s):

Visual Analysis

Metagenomic Data

Evaluating Metagenomic Prediction of the Metaproteome in a 4.5-Year Study of a Patient with Crohn's Disease

mSystems

10.1128/msystems.00337-18

2019

Vol 4 (1)

Cited By ~ 18

Author(s):

Robert H. Mills

Yoshiki Vázquez-Baeza

Qiyun Zhu

Lingjing Jiang

James Gaffney

...

Keyword(s):

Crohn's Disease

DNA Analysis

Gene Copy Number

Metagenomic Data

Gene Copy

Metagenomic Sequencing

Data Types

Fecal Microbiome

Disease States

ABSTRACT Although genetic approaches are the standard in microbiome analysis, proteome-level information is largely absent. This discrepancy warrants a better understanding of the relationship between gene copy number and protein abundance, as this is crucial information for inferring protein-level changes from metagenomic data. As it remains unknown how metaproteomic systems evolve during dynamic disease states, we leveraged a 4.5-year fecal time series using samples from a single patient with colonic Crohn’s disease. Utilizing multiplexed quantitative proteomics and shotgun metagenomic sequencing of eight time points in technical triplicate, we quantified over 29,000 protein groups and 110,000 genes and compared them to five protein biomarkers of disease activity. Broad-scale observations were consistent between data types, including overall clustering by principal-coordinate analysis and fluctuations in Gene Ontology terms related to Crohn’s disease. Through linear regression, we determined genes and proteins fluctuating in conjunction with inflammatory metrics. We discovered conserved taxonomic differences relevant to Crohn’s disease, including a negative association of Faecalibacterium and a positive association of Escherichia with calprotectin. Despite concordant associations of genera, the specific genes correlated with these metrics were drastically different between metagenomic and metaproteomic data sets. This resulted in the generation of unique functional interpretations dependent on the data type, with metaproteome evidence for previously investigated mechanisms of dysbiosis. An example of one such mechanism was a connection between urease enzymes, amino acid metabolism, and the local inflammation state within the patient. This proof-of-concept approach prompts further investigation of the metaproteome and its relationship with the metagenome in biologically complex systems such as the microbiome. 
IMPORTANCE A majority of current microbiome research relies heavily on DNA analysis. However, as the field moves toward understanding the microbial functions related to healthy and disease states, it is critical to evaluate how changes in DNA relate to changes in proteins, which are functional units of the genome. This study tracked the abundance of genes and proteins as they fluctuated during various inflammatory states in a 4.5-year study of a patient with colonic Crohn’s disease. Our results indicate that despite a low level of correlation, taxonomic associations were consistent in the two data types. While there was overlap of the data types, several associations were uniquely discovered by analyzing the metaproteome component. This case study provides unique and important insights into the fundamental relationship between the genes and proteins of a single individual’s fecal microbiome associated with clinical consequences.