MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

2018 ◽  
Author(s):  
Gherman V Uritskiy ◽  
Jocelyne DiRuggiero ◽  
James Taylor

Abstract Background: The study of microbiomes using whole-metagenome shotgun sequencing enables the analysis of uncultivated microbial populations that may have important roles in their environments. Extracting individual draft genomes (bins) facilitates metagenomic analysis at the single genome level. Software and pipelines for such analysis have become diverse and sophisticated, resulting in a significant burden for biologists to access and use them. Furthermore, while bin extraction algorithms are rapidly improving, there is still a lack of tools for their evaluation and visualization. Results: To address these challenges, we present metaWRAP, a modular pipeline software for shotgun metagenomic data analysis. MetaWRAP deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. MetaWRAP is flexible enough to give investigators control over the analysis, while still being easy-to-install and easy-to-use. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly. MetaWRAP’s hybrid bin extraction algorithm outperforms individual binning approaches and other bin consolidation programs in both synthetic and real datasets. Finally, metaWRAP comes with numerous modules for the analysis of metagenomic bins, including taxonomy assignment, abundance estimation, functional annotation, and visualization. Conclusions: MetaWRAP is an easy-to-use modular pipeline that automates the core tasks in metagenomic analysis, while contributing significant improvements to the extraction and interpretation of high-quality metagenomic bins. The bin refinement and reassembly modules of metaWRAP consistently outperform other binning approaches. 
Each module of metaWRAP is also a standalone component, making it a flexible and versatile tool for tackling metagenomic shotgun sequencing data. MetaWRAP is open-source software available at https://github.com/bxlab/metaWRAP.

Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Abigail L. Lind ◽  
Katherine S. Pollard

Abstract Background Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded. Results Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has a broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients. Finally, we observe eukaryotes in Arabidopsis leaf samples, many of which are not identifiable from public protein databases. Conclusions EukDetect provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes. 
We demonstrate that it enables discoveries that would be missed or clouded by false positives with standard shotgun sequence analysis. EukDetect will greatly advance our understanding of how microbial eukaryotes contribute to microbiomes.


2021 ◽  
Vol 22 (S7) ◽  
Author(s):  
Carlo Ferravante ◽  
Domenico Memoli ◽  
Domenico Palumbo ◽  
Paolo Ciaramella ◽  
Antonio Di Loria ◽  
...  

Abstract Background Next-Generation-Sequencing (NGS) enables detection of microorganisms present in biological and other matrices of various origin and nature, allowing not only the identification of known phyla and strains but also the discovery of novel ones. The large amount of metagenomic shotgun data produced by NGS requires comprehensive and user-friendly pipelines for data analysis that speed up the bioinformatics steps, relieving users of the need to manually perform complex and time-consuming tasks. Results We describe here HOME-BIO (sHOtgun MEtagenomic analysis of BIOlogical entities), an exhaustive pipeline for metagenomics data analysis, comprising three independent analytical modules designed for an inclusive analysis of large NGS datasets. Conclusions HOME-BIO is a powerful and easy-to-use tool that can also be run by users with limited computational expertise. It allows in-depth analyses by removing low-complexity/problematic reads, integrating the analytical steps that lead to a comprehensive taxonomy profile of each sample by querying different source databases, and it is customizable according to specific users’ needs.


2020 ◽  
Vol 8 (5) ◽  
pp. 684
Author(s):  
Nathanael J. Bangayan ◽  
Baochen Shi ◽  
Jerry Trinh ◽  
Emma Barnard ◽  
Gabriela Kasimatis ◽  
...  

The microbiome plays an important role in human physiology. The composition of the human microbiome has been described at the phylum, class, genus, and species levels; however, it is largely unknown at the strain level. The importance of strain-level differences in microbial communities has been increasingly recognized in understanding disease associations. Current methods for identifying strain populations often require deep metagenomic sequencing and a comprehensive set of reference genomes. In this study, we developed a method, metagenomic multi-locus sequence typing (MG-MLST), to determine strain-level composition in a microbial community by combining high-throughput sequencing with multi-locus sequence typing (MLST). We used a commensal bacterium, Propionibacterium acnes, as an example to test the ability of MG-MLST to identify the strain composition. Using simulated communities, MG-MLST accurately predicted the strain populations in all samples. We further validated the method using MLST gene amplicon libraries and metagenomic shotgun sequencing data of clinical skin samples. MG-MLST yielded strain compositions consistent with those obtained from nearly full-length 16S rRNA clone libraries and metagenomic shotgun sequencing analysis. When comparing strain-level differences between acne and healthy skin microbiomes, we demonstrated that strains of RT2/6 were highly associated with healthy skin, consistent with previous findings. In summary, MG-MLST provides a quantitative analysis of the strain populations in the microbiome with diversity and richness. It can be applied to microbiome studies to reveal strain-level differences between groups, which are critical in many microorganism-related diseases.
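
The quantities named at the end of this abstract (strain composition, richness, diversity) can be illustrated with a minimal sketch. This is not the MG-MLST implementation: it assumes reads have already been assigned to strains by some upstream typing step, the function name `strain_profile` and the strain labels are hypothetical, and Shannon entropy is used here as one common diversity index, not necessarily the one the authors report.

```python
import math
from typing import Dict, Tuple


def strain_profile(read_counts: Dict[str, int]) -> Tuple[Dict[str, float], int, float]:
    """Summarize a strain population from per-strain read counts.

    read_counts maps a (hypothetical) strain label to the number of
    reads assigned to it. Returns relative abundances, richness
    (number of strains with at least one read), and Shannon diversity.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any strain")
    # Relative abundance of each strain that was actually observed.
    abundance = {s: c / total for s, c in read_counts.items() if c > 0}
    richness = len(abundance)
    # Shannon index H = -sum(p * ln p) over observed strains.
    shannon = -sum(p * math.log(p) for p in abundance.values())
    return abundance, richness, shannon


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    counts = {"strain_A": 50, "strain_B": 50, "strain_C": 0}
    print(strain_profile(counts))
```

Comparing such profiles between groups (for example acne versus healthy skin samples) would then require a statistical test of its own, which this sketch does not attempt.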


2019 ◽  
Vol 58 (3) ◽  
Author(s):  
Matthew Thoendel ◽  
Patricio Jeraldo ◽  
Kerryl E. Greenwood-Quaintance ◽  
Janet Yao ◽  
Nicholas Chia ◽  
...  

ABSTRACT Metagenomic shotgun sequencing for the identification of pathogens is being increasingly utilized as a diagnostic method. Interpretation of large and complicated data sets is a significant challenge, for which multiple commercial tools have been developed. Three commercial metagenomic shotgun sequencing tools, CosmosID, One Codex, and IDbyDNA, were compared to determine whether they result in similar interpretations of the same sequencing data. We selected 24 diverse samples from a previously characterized data set derived from DNA extracted from biofilms dislodged from the surfaces of resected arthroplasties (sonicate fluid). Sequencing data sets were analyzed using the three commercial tools and compared to culture results and prior metagenomic analysis interpretation. Identical interpretations from all three tools occurred for 6 samples. The total number of species identified included 28 by CosmosID, 59 by One Codex, and 41 by IDbyDNA. All of the tools performed similarly in detecting those microorganisms identified by culture, including polymicrobial mixes. These data show that while all of the tools performed well overall, there were some differences, particularly in their predilection for identifying low-abundance or contaminant organisms as present.
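
The two summary figures reported above (samples with identical interpretations across tools, and total species per tool) amount to simple set comparisons. The following is a minimal sketch under stated assumptions: each tool's output is assumed to have been reduced to a set of species names per sample, the tool and sample identifiers are placeholders, and the function names are hypothetical. It does not reflect how the study's authors scored agreement.

```python
from typing import Dict, List, Set

# calls[tool][sample] -> set of species names that tool reported.
Calls = Dict[str, Dict[str, Set[str]]]


def identical_interpretations(calls: Calls) -> List[str]:
    """Return the samples for which every tool reported the same species set."""
    tools = list(calls)
    # Only samples analyzed by all tools can be compared.
    shared = set.intersection(*(set(calls[t]) for t in tools))
    agreeing = []
    for sample in sorted(shared):
        reported = [calls[t][sample] for t in tools]
        if all(r == reported[0] for r in reported):
            agreeing.append(sample)
    return agreeing


def total_species(calls: Calls) -> Dict[str, int]:
    """Count the distinct species each tool reported across all samples."""
    return {
        tool: len(set().union(*per_sample.values()))
        for tool, per_sample in calls.items()
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "tool_a": {"s1": {"Staphylococcus aureus"}, "s2": {"Cutibacterium acnes"}},
        "tool_b": {"s1": {"Staphylococcus aureus"},
                   "s2": {"Cutibacterium acnes", "Staphylococcus epidermidis"}},
        "tool_c": {"s1": {"Staphylococcus aureus"}, "s2": set()},
    }
    print(identical_interpretations(toy))
    print(total_species(toy))
```

Exact set equality is a strict criterion; a real comparison would also need to reconcile naming differences between tool databases before comparing sets.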


2021 ◽  
Author(s):  
Amit Sinha ◽  
Zhiru Li ◽  
Catherine B Poole ◽  
Laurence Ettwiller ◽  
Nathália F Lima ◽  
...  

Mansonella ozzardi and Mansonella perstans, filarial parasites infecting millions of people worldwide, harbor their unique obligate endosymbionts, the alpha-proteobacteria Wolbachia wMoz and wMpe, respectively. Currently, little is known about these Wolbachia and no genome sequences are available. In the current study, high quality draft genomes of wMoz and wMpe were assembled from complex clinical samples using a metagenome assembly and binning approach. These represent the first genomes from supergroup F Wolbachia originating from human parasites and share features characteristic of filarial as well as arthropod Wolbachia, consistent with their position in supergroup F. Metagenomic data analysis was also used to estimate Wolbachia titers, which revealed wide variation in levels across different clinical isolates, addressing the contradicting reports on presence or absence of Wolbachia in M. perstans. These findings may have implications for the use of antibiotics to treat mansonellosis. The wMoz and wMpe genome sequences provide a valuable resource for further studies on symbiosis, evolution and drug discovery.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Zeeshan Ahmed ◽  
Eduard Gibert Renart ◽  
Saman Zeeshan ◽  
XinQi Dong

Abstract Background Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. 
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Caitlin M. Singleton ◽  
Francesca Petriglieri ◽  
Jannie M. Kristensen ◽  
Rasmus H. Kirkegaard ◽  
Thomas Y. Michaelsen ◽  
...  

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.

