Parallel-META: A high-performance computational pipeline for metagenomic data analysis

Author(s):  
Xiaoquan Su ◽  
Jian Xu ◽  
Kang Ning
2021 ◽  
Author(s):  
Ziye Wang ◽  
Pingqin Huang ◽  
Ronghui You ◽  
Fengzhu Sun ◽  
Shanfeng Zhu

Binning is an essential procedure during metagenomic data analysis. However, individual binning methods usually do not make full use of different features and biological information at the same time. Furthermore, it is challenging to integrate multiple binning results efficiently and effectively. Therefore, we developed an ensemble binner, MetaBinner, which generates component results with multiple types of features and utilizes single-copy gene (SCG) information for k-means initialization. It then utilizes a two-step ensemble strategy based on SCGs to integrate the component results. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner outperforms other state-of-the-art individual binners and ensemble binners. MetaBinner is freely available at https://github.com/ziyewang/MetaBinner.
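As an illustration of what SCG-informed k-means initialization can look like, the following is a minimal sketch and not MetaBinner's actual code. It assumes that contigs carrying the same single-copy gene come from different genomes, so they can serve as initial centroids. The function names (`pick_seed_contigs`, `kmeans_with_seeds`), the toy two-dimensional features, and the gene labels are all hypothetical.

```python
from math import dist


def pick_seed_contigs(scg_hits):
    """Return contig indices for the single-copy gene seen on the most contigs.

    Assumption: each genome carries one copy of the gene, so these contigs
    are likely to belong to different genomes.
    """
    return sorted(max(scg_hits.values(), key=len))


def kmeans_with_seeds(points, seed_indices, iterations=20):
    """Plain Lloyd's k-means whose initial centroids are the seed contigs."""
    centroids = [list(points[i]) for i in seed_indices]
    labels = None
    for _ in range(iterations):
        # Assignment step: each contig joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy contig features (hypothetical values, e.g. two coverage dimensions).
points = [(0.1, 0.2), (0.0, 0.1), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1), (10.1, 0.0)]
# Hypothetical mapping from single-copy gene to the contigs that contain it.
scg_hits = {"geneA": [0, 2, 4], "geneB": [1, 3]}

seeds = pick_seed_contigs(scg_hits)
labels, centroids = kmeans_with_seeds(points, seeds)
print(seeds, labels)
```

In this sketch the number of clusters is simply the number of seed contigs, so the marker gene also supplies an estimate of the bin count.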


2020 ◽  
Vol 15 ◽  
Author(s):  
Akshatha Prasanna ◽  
Vidya Niranjan

Background: Since bacteria are the earliest known organisms, there has been significant interest in their variety and biology, particularly with respect to human health. Recent advances in metagenomic next-generation sequencing (mNGS), a culture-independent sequencing technology, have accelerated the development of clinical microbiology and our understanding of pathogens. Objective: For the implementation of mNGS in routine clinical practice to become feasible, a practical and scalable strategy for the analysis of mNGS data is essential. This study presents a robust automated pipeline to analyze clinical metagenomic data for pathogen identification and classification. Method: The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework scripted using the Snakemake workflow management software. The implementation avoids the hassle of manually installing and configuring multiple command-line tools and dependencies. The approach directly screens pathogens from clinical raw reads and generates consolidated reports for each sample. Results: The pipeline is demonstrated using publicly available data and is tested on a desktop Linux system and a high-performance cluster. The study compares variability in results from different tools and versions. The versions of the tools are user-modifiable. The pipeline performs quality checks, read filtering, and host subtraction, and reports assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmids, and virulence factors. The results obtained from the pipeline are evaluated based on sensitivity and positive predictive value. Conclusion: Clin-mNGS is an automated Snakemake pipeline validated for the analysis of microbial clinical metagenomics reads to perform taxonomic classification and antimicrobial resistance prediction.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture where we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.


2017 ◽  
Vol 33 (2) ◽  
pp. 119-130
Author(s):  
Vinh Van Le ◽  
Hoai Van Tran ◽  
Hieu Ngoc Duong ◽  
Giang Xuan Bui ◽  
Lang Van Tran

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, metagenomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.

