MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

Mapping Intimacies ◽

10.1101/277442 ◽

2018 ◽

Cited By ~ 3

Author(s):

Gherman V Uritskiy ◽

Jocelyne DiRuggiero ◽

James Taylor

Keyword(s):

Data Analysis ◽

Shotgun Sequencing ◽

Metagenomic Analysis ◽

Metagenomic Data ◽

Sequencing Data ◽

High Quality ◽

Single Genome ◽

Significant Burden ◽

Versatile Tool ◽

Genome Level

Abstract Background: The study of microbiomes using whole-metagenome shotgun sequencing enables the analysis of uncultivated microbial populations that may have important roles in their environments. Extracting individual draft genomes (bins) facilitates metagenomic analysis at the single genome level. Software and pipelines for such analysis have become diverse and sophisticated, resulting in a significant burden for biologists to access and use them. Furthermore, while bin extraction algorithms are rapidly improving, there is still a lack of tools for their evaluation and visualization. Results: To address these challenges, we present metaWRAP, a modular pipeline software for shotgun metagenomic data analysis. MetaWRAP deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. MetaWRAP is flexible enough to give investigators control over the analysis, while still being easy-to-install and easy-to-use. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly. MetaWRAP’s hybrid bin extraction algorithm outperforms individual binning approaches and other bin consolidation programs in both synthetic and real datasets. Finally, metaWRAP comes with numerous modules for the analysis of metagenomic bins, including taxonomy assignment, abundance estimation, functional annotation, and visualization. Conclusions: MetaWRAP is an easy-to-use modular pipeline that automates the core tasks in metagenomic analysis, while contributing significant improvements to the extraction and interpretation of high-quality metagenomic bins. The bin refinement and reassembly modules of metaWRAP consistently outperform other binning approaches. 
Each module of metaWRAP is also a standalone component, making it a flexible and versatile tool for tackling metagenomic shotgun sequencing data. MetaWRAP is open-source software available at https://github.com/bxlab/metaWRAP.

Method for Bisulfite Sequencing Data Analysis for Whole-Genome Level DNA Methylation Detection in Legumes

Legume Genomics - Methods in Molecular Biology ◽

10.1007/978-1-0716-0235-5_6 ◽

2020 ◽

pp. 127-145

Author(s):

Khushboo Gupta ◽

Rohini Garg

Keyword(s):

DNA Methylation ◽

Data Analysis ◽

Bisulfite Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Bisulfite Sequencing Data ◽

Genome Level ◽

Sequencing Data Analysis

Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing

Microbiome ◽

10.1186/s40168-021-01015-y ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Abigail L. Lind ◽

Katherine S. Pollard

Keyword(s):

Gene Families ◽

Shotgun Sequencing ◽

Metagenomic Data ◽

Marker Genes ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

DNA and RNA ◽

Paired Samples ◽

Microbial Eukaryotes ◽

Conserved Gene

Abstract Background Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded. Results Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has a broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients. Finally, we observe eukaryotes in Arabidopsis leaf samples, many of which are not identifiable from public protein databases. Conclusions EukDetect provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes. 
We demonstrate that it enables discoveries that would be missed or clouded by false positives with standard shotgun sequence analysis. EukDetect will greatly advance our understanding of how microbial eukaryotes contribute to microbiomes.

HOME-BIO (sHOtgun MEtagenomic analysis of BIOlogical entities): a specific and comprehensive pipeline for metagenomic shotgun sequencing data analysis

BMC Bioinformatics ◽

10.1186/s12859-021-04004-y ◽

2021 ◽

Vol 22 (S7) ◽

Author(s):

Carlo Ferravante ◽

Domenico Memoli ◽

Domenico Palumbo ◽

Paolo Ciaramella ◽

Antonio Di Loria ◽

...

Keyword(s):

Data Analysis ◽

Low Complexity ◽

Metagenomic Analysis ◽

Sequencing Data ◽

Metagenomics Data ◽

Speed Up ◽

Inclusive Analysis ◽

Biological Entities ◽

Next Generation Sequencing (NGS) ◽

User Friendly

Abstract Background Next-Generation-Sequencing (NGS) enables detection of microorganisms present in biological and other matrices of various origin and nature, allowing not only the identification of known phyla and strains but also the discovery of novel ones. The large amount of metagenomic shotgun data produced by NGS requires comprehensive and user-friendly pipelines for data analysis that speed up the bioinformatics steps, relieving users of the need to manually perform complex and time-consuming tasks. Results We describe here HOME-BIO (sHOtgun MEtagenomic analysis of BIOlogical entities), an exhaustive pipeline for metagenomics data analysis, comprising three independent analytical modules designed for an inclusive analysis of large NGS datasets. Conclusions HOME-BIO is a powerful and easy-to-use tool that can also be run by users with limited computational expertise. It allows in-depth analyses by removing low-complexity/problematic reads and integrating the analytical steps that lead to a comprehensive taxonomy profile of each sample by querying different source databases, and it is customizable according to specific users’ needs.

MG-MLST: Characterizing the Microbiome at the Strain Level in Metagenomic Data

Microorganisms ◽

10.3390/microorganisms8050684 ◽

2020 ◽

Vol 8 (5) ◽

pp. 684

Author(s):

Nathanael J. Bangayan ◽

Baochen Shi ◽

Jerry Trinh ◽

Emma Barnard ◽

Gabriela Kasimatis ◽

...

Keyword(s):

High Throughput Sequencing ◽

Human Microbiome ◽

Shotgun Sequencing ◽

Strain Level ◽

Multi Locus Sequence Typing ◽

Metagenomic Data ◽

Sequencing Analysis ◽

Metagenomic Sequencing ◽

Healthy Skin ◽

Sequencing Data

The microbiome plays an important role in human physiology. The composition of the human microbiome has been described at the phylum, class, genus, and species levels; however, it is largely unknown at the strain level. The importance of strain-level differences in microbial communities has been increasingly recognized in understanding disease associations. Current methods for identifying strain populations often require deep metagenomic sequencing and a comprehensive set of reference genomes. In this study, we developed a method, metagenomic multi-locus sequence typing (MG-MLST), to determine strain-level composition in a microbial community by combining high-throughput sequencing with multi-locus sequence typing (MLST). We used a commensal bacterium, Propionibacterium acnes, as an example to test the ability of MG-MLST to identify the strain composition. Using simulated communities, MG-MLST accurately predicted the strain populations in all samples. We further validated the method using MLST gene amplicon libraries and metagenomic shotgun sequencing data of clinical skin samples. MG-MLST yielded strain compositions consistent with those obtained from nearly full-length 16S rRNA clone libraries and metagenomic shotgun sequencing analysis. When comparing strain-level differences between acne and healthy skin microbiomes, we demonstrated that strains of RT2/6 were highly associated with healthy skin, consistent with previous findings. In summary, MG-MLST provides a quantitative analysis of the strain populations in the microbiome with diversity and richness. It can be applied to microbiome studies to reveal strain-level differences between groups, which are critical in many microorganism-related diseases.

Comparison of Three Commercial Tools for Metagenomic Shotgun Sequencing Analysis

Journal of Clinical Microbiology ◽

10.1128/jcm.00981-19 ◽

2019 ◽

Vol 58 (3) ◽

Cited By ~ 1

Author(s):

Matthew Thoendel ◽

Patricio Jeraldo ◽

Kerryl E. Greenwood-Quaintance ◽

Janet Yao ◽

Nicholas Chia ◽

...

Keyword(s):

Diagnostic Method ◽

Shotgun Sequencing ◽

Metagenomic Analysis ◽

Data Sets ◽

Sequencing Analysis ◽

Sequencing Data ◽

Data Set ◽

Significant Challenge ◽

Number Of Species

ABSTRACT Metagenomic shotgun sequencing for the identification of pathogens is being increasingly utilized as a diagnostic method. Interpretation of large and complicated data sets is a significant challenge, for which multiple commercial tools have been developed. Three commercial metagenomic shotgun sequencing tools, CosmosID, One Codex, and IDbyDNA, were compared to determine whether they result in similar interpretations of the same sequencing data. We selected 24 diverse samples from a previously characterized data set derived from DNA extracted from biofilms dislodged from the surfaces of resected arthroplasties (sonicate fluid). Sequencing data sets were analyzed using the three commercial tools and compared to culture results and prior metagenomic analysis interpretation. Identical interpretations from all three tools occurred for 6 samples. The total number of species identified included 28 by CosmosID, 59 by One Codex, and 41 by IDbyDNA. All of the tools performed similarly in detecting those microorganisms identified by culture, including polymicrobial mixes. These data show that while all of the tools performed well overall, there were some differences, particularly in their predilection for identifying low-abundance or contaminant organisms as present.
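The kind of cross-tool comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis: the tool names (`tool_a`, `tool_b`, `tool_c`), sample IDs, species calls, and both helper functions are hypothetical, and "identical interpretation" is assumed here to mean that every tool reports exactly the same set of species for a sample.

```python
from typing import Dict, List, Set

# Hypothetical structure: tool name -> sample ID -> set of species reported.
Calls = Dict[str, Dict[str, Set[str]]]


def concordant_samples(calls: Calls) -> List[str]:
    """Return samples for which every tool reports the same species set."""
    if not calls:
        return []
    tools = list(calls)
    # Only samples analyzed by all tools can be compared.
    shared = set.intersection(*(set(calls[t]) for t in tools))
    reference = calls[tools[0]]
    return sorted(
        s for s in shared if all(calls[t][s] == reference[s] for t in tools)
    )


def species_totals(calls: Calls) -> Dict[str, int]:
    """Count the distinct species each tool reports across all samples."""
    return {
        tool: len(set().union(*per_sample.values()))
        for tool, per_sample in calls.items()
    }


# Made-up example data, for illustration only.
example_calls: Calls = {
    "tool_a": {"S1": {"Staphylococcus aureus"}, "S2": {"Cutibacterium acnes"}},
    "tool_b": {
        "S1": {"Staphylococcus aureus"},
        "S2": {"Cutibacterium acnes", "Ralstonia pickettii"},
    },
    "tool_c": {"S1": {"Staphylococcus aureus"}, "S2": {"Cutibacterium acnes"}},
}

print(concordant_samples(example_calls))
print(species_totals(example_calls))
```

Using exact set equality is a strict definition of agreement; a real comparison might instead tolerate differences in low-abundance or likely contaminant calls, which is where the abstract reports the tools diverging.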

Genomes of Wolbachia endosymbionts from the human filarial parasites Mansonella perstans and Mansonella ozzardi

10.1101/2021.03.23.436630 ◽

2021 ◽

Author(s):

Amit Sinha ◽

Zhiru Li ◽

Catherine B Poole ◽

Laurence Ettwiller ◽

Nathália F Lima ◽

...

Keyword(s):

Data Analysis ◽

Metagenomic Data ◽

Clinical Samples ◽

Valuable Resource ◽

Genome Sequences ◽

High Quality ◽

Metagenome Assembly ◽

Mansonella Perstans ◽

Symbiosis Evolution ◽

Alpha Proteobacteria

Mansonella ozzardi and Mansonella perstans, filarial parasites infecting millions of people worldwide, harbor their unique obligate endosymbionts, the alpha-proteobacteria Wolbachia wMoz and wMpe, respectively. Currently, little is known about these Wolbachia and no genome sequences are available. In the current study, high-quality draft genomes of wMoz and wMpe were assembled from complex clinical samples using a metagenome assembly and binning approach. These represent the first genomes from supergroup F Wolbachia originating from human parasites and share features characteristic of filarial as well as arthropod Wolbachia, consistent with their position in supergroup F. Metagenomic data analysis was also used to estimate Wolbachia titers, which revealed wide variation in levels across different clinical isolates, addressing the contradicting reports on presence or absence of Wolbachia in M. perstans. These findings may have implications for the use of antibiotics to treat mansonellosis. The wMoz and wMpe genome sequences provide a valuable resource for further studies on symbiosis, evolution and drug discovery.

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

RNA-Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Abstract Background Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. 
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Nature Communications ◽

10.1038/s41467-021-22203-2 ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 2

Author(s):

Caitlin M. Singleton ◽

Francesca Petriglieri ◽

Jannie M. Kristensen ◽

Rasmus H. Kirkegaard ◽

Thomas Y. Michaelsen ◽

...

Keyword(s):

16S rRNA ◽

Wastewater Treatment Plants ◽

In Situ Hybridisation ◽

Amplicon Sequencing ◽

rRNA Genes ◽

Fluorescence In Situ Hybridisation ◽

Sequencing Data ◽

High Quality ◽

16S rRNA Amplicon Sequencing ◽

Long Read

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.
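The MIMAG high-quality draft requirements mentioned in this abstract can be written as a simple filter. The Python sketch below assumes the thresholds from the published MIMAG standard (more than 90% completeness, less than 5% contamination, 5S, 16S and 23S rRNA genes present, and at least 18 tRNAs). The `Mag` record, its field names, and the example values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical summary record for one metagenome-assembled genome."""
    name: str
    completeness: float   # estimated completeness, percent
    contamination: float  # estimated contamination, percent
    has_5s: bool          # 5S rRNA gene detected
    has_16s: bool         # 16S rRNA gene detected
    has_23s: bool         # 23S rRNA gene detected
    trna_count: int       # number of distinct tRNAs detected


def is_mimag_high_quality(mag: Mag) -> bool:
    """Check a MAG against the assumed MIMAG high-quality draft thresholds."""
    return (
        mag.completeness > 90.0
        and mag.contamination < 5.0
        and mag.has_5s
        and mag.has_16s
        and mag.has_23s
        and mag.trna_count >= 18
    )


# Made-up example records, for illustration only.
good = Mag("mag_001", 96.3, 1.2, True, True, True, 20)
no_16s = Mag("mag_002", 97.0, 0.8, True, False, True, 20)

print(is_mimag_high_quality(good))
print(is_mimag_high_quality(no_16s))
```

The second record shows why the rRNA gene requirement matters: a bin can be nearly complete and clean yet still miss the high-quality tier when a 16S gene is absent, which is a common outcome for short-read assemblies and one motivation for adding long reads.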

LncGSEA: a versatile tool to infer lncRNA associated pathways from large-scale cancer transcriptome sequencing data

BMC Genomics ◽

10.1186/s12864-021-07900-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yanan Ren ◽

Ting-You Wang ◽

Leah C. Anderton ◽

Qi Cao ◽

Rendong Yang

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Clinical Samples ◽

Sequencing Data ◽

Multiple Cancer ◽

Regulatory Pathways ◽

Cancer Transcriptome ◽

Versatile Tool

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.

Parallel-META: efficient metagenomic data analysis based on high-performance computation

BMC Systems Biology ◽

10.1186/1752-0509-6-s1-s16 ◽

2012 ◽

Vol 6 (Suppl 1) ◽

pp. S16 ◽

Cited By ~ 21

Author(s):

Xiaoquan Su ◽

Jian Xu ◽

Kang Ning

Keyword(s):

Data Analysis ◽

High Performance ◽

Metagenomic Data ◽

High Performance Computation
