Parallel-META: A high-performance computational pipeline for metagenomic data analysis

Parallel-META: efficient metagenomic data analysis based on high-performance computation

BMC Systems Biology, 2012, Vol 6 (Suppl 1), pp. S16. DOI: 10.1186/1752-0509-6-s1-s16

Cited By ~ 21

Author(s): Xiaoquan Su, Jian Xu, Kang Ning

Keyword(s): Data Analysis; High Performance; Metagenomic Data; High Performance Computation

Parallel-META 2.0: Enhanced Metagenomic Data Analysis with Functional Annotation, High Performance Computing and Advanced Visualization

PLoS ONE, 2014, Vol 9 (3), pp. e89323. DOI: 10.1371/journal.pone.0089323

Cited By ~ 44

Author(s): Xiaoquan Su, Weihua Pan, Baoxing Song, Jian Xu, Kang Ning

Keyword(s): Data Analysis; High Performance Computing; Functional Annotation; High Performance; Metagenomic Data; Performance Computing; Advanced Visualization

MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities

2021. DOI: 10.1101/2021.07.25.453671

Author(s): Ziye Wang, Pingqin Huang, Ronghui You, Fengzhu Sun, Shanfeng Zhu

Keyword(s): Data Analysis; High Performance; Large Scale; State Of The Art; Single Copy; Biological Information; Metagenomic Data; Single Copy Gene; Ensemble Strategy; Copy Gene

Binning is an essential procedure in metagenomic data analysis. However, available individual binning methods usually do not make full use of different features or biological information simultaneously. Furthermore, it is challenging to integrate multiple binning results efficiently and effectively. Therefore, we developed an ensemble binner, MetaBinner, which generates component results with multiple types of features and utilizes single-copy gene (SCG) information for k-means initialization. It then utilizes a two-step ensemble strategy based on SCGs to integrate the component results. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner outperforms other state-of-the-art individual binners and ensemble binners. MetaBinner is freely available at https://github.com/ziyewang/MetaBinner.
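The abstract mentions using single-copy gene (SCG) information to initialize k-means. The Python sketch below illustrates one way that idea can work; it is not MetaBinner's code (the repository above has the real implementation). It assumes, hypothetically, that contigs are already described by numeric feature vectors, that the seed marker is the SCG detected on the most contigs, and that each genome carries that marker once, so each contig carrying it likely comes from a distinct genome and can serve as one initial center. All function and argument names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def scg_seeded_kmeans(
    features: Dict[str, Vector],
    scg_hits: Dict[str, Set[str]],
    max_iter: int = 100,
) -> Dict[str, int]:
    """Hypothetical sketch: k-means over contigs, seeded from single-copy-gene hits.

    features maps contig id -> feature vector; scg_hits maps contig id -> set of
    single-copy marker genes detected on that contig (both are assumed inputs).
    """
    # Group contigs by the marker gene they carry.
    gene_to_contigs: Dict[str, List[str]] = {}
    for contig, genes in scg_hits.items():
        for gene in genes:
            gene_to_contigs.setdefault(gene, []).append(contig)
    if not gene_to_contigs:
        raise ValueError("no single-copy gene hits to seed from")

    # Assumption: the most frequently hit marker occurs once per genome, so the
    # contigs carrying it give k and one starting center per genome.
    seed_gene = max(gene_to_contigs, key=lambda g: len(gene_to_contigs[g]))
    centers = [features[c] for c in sorted(gene_to_contigs[seed_gene])]

    contigs = sorted(features)
    labels: Dict[str, int] = {}
    for _ in range(max_iter):
        # Assignment step: each contig joins its nearest center.
        new_labels = {
            c: min(range(len(centers)), key=lambda k: sq_dist(features[c], centers[k]))
            for c in contigs
        }
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [features[c] for c in contigs if labels[c] == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = centroid(members)
    return labels
```

For example, with four contigs forming two well-separated groups and a hit for the same marker (say `rpoB`, used here only as an example) on one contig in each group, the function starts with k = 2 and one center inside each group, so it separates the groups without a random restart. The SCG-based ensemble step that MetaBinner applies to the component results is not shown.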

High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Transport Protocol (Transmission Control Protocol/User Datagram Protocol [TCP/UDP]) Analysis

2015. DOI: 10.21236/ada621268

Author(s): Kenneth D. Renard, James R. Adametz

Keyword(s): Data Analysis; High Performance Computing; High Performance; Transmission Control Protocol; Network Data; Transport Protocol; Transmission Control; Control Protocol; High Bandwidth; Performance Computing

High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Device Status Data

2015. DOI: 10.21236/ada626790

Author(s): Brian Panneton, Brendan Tauras, Christopher Wancowicz, Sean Coyne

Keyword(s): Data Analysis; High Performance Computing; High Performance; Network Data; High Bandwidth; Status Data; Performance Computing

Clin-mNGS: Automated Pipeline for Pathogen Detection from Clinical Metagenomic Data

Current Bioinformatics, 2020, Vol 15. DOI: 10.2174/1574893615999200608130029

Author(s): Akshatha Prasanna, Vidya Niranjan

Keyword(s): Antimicrobial Resistance; High Performance; Pathogen Detection; Bacterial Species; Workflow Management; Metagenomic Data; Antimicrobial Resistance Genes; Culture Independent; Automated Pipeline; User Friendly

Background: Bacteria are the earliest known organisms, and there has been significant interest in their diversity and biology, especially as they relate to human health. Recent advances in metagenomic sequencing (mNGS), a culture-independent sequencing technology, have accelerated development in clinical microbiology and in our understanding of pathogens.

Objective: For mNGS to become feasible in routine clinical practice, a practical and scalable strategy for analyzing mNGS data is essential. This study presents a robust automated pipeline that analyzes clinical metagenomic data for pathogen identification and classification.

Method: The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework written with the Snakemake workflow management software. It avoids the hassle of manually installing and configuring multiple command-line tools and their dependencies. The approach screens for pathogens directly from clinical raw reads and generates a consolidated report for each sample.

Results: The pipeline is demonstrated on publicly available data and was tested on both a desktop Linux system and a high-performance cluster. The study compares the variability of results across different tools and tool versions; the tool versions are user-modifiable. Pipeline outputs include quality checks, filtered reads, host subtraction, assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmid detection, and virulence factor identification. The results are evaluated by sensitivity and positive predictive value.

Conclusion: Clin-mNGS is an automated Snakemake pipeline validated for analyzing clinical microbial metagenomic reads to perform taxonomic classification and antimicrobial resistance prediction.
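The abstract states that the pipeline's results are evaluated by sensitivity and positive predictive value (PPV). As a reminder of what those two measures compute, here is a small Python sketch. It assumes, for illustration only, that a run is scored by comparing the set of species known to be in a sample with the set the pipeline reports; the abstract does not say how matches were counted, and the function name is hypothetical.

```python
from typing import Set, Tuple


def sensitivity_and_ppv(expected: Set[str], detected: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for one sample, comparing taxa as sets (assumed scoring).

    TP = expected and detected, FN = expected but missed, FP = detected but not expected.
    sensitivity = TP / (TP + FN); PPV = TP / (TP + FP).
    """
    tp = len(expected & detected)
    fn = len(expected - detected)
    fp = len(detected - expected)
    # Guard against empty denominators (no expected or no detected taxa).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv
```

For instance, if three species are expected and the pipeline reports two of them plus one species that is not present, TP = 2, FN = 1, and FP = 1, so both sensitivity and PPV are 2/3. Sensitivity penalizes missed pathogens, while PPV penalizes false calls, which is why the two are usually reported together.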

Automatic data analysis workflow for ultra-high performance liquid chromatography-high resolution mass spectrometry-based metabolomics

Journal of Chromatography A, 2019, Vol 1585, pp. 172-181. DOI: 10.1016/j.chroma.2018.11.070

Cited By ~ 3

Author(s): Yong-Jie Yu, Qing-Xia Zheng, Yue-Ming Zhang, Qian Zhang, Yu-Ying Zhang, ...

Keyword(s): Mass Spectrometry; High Performance Liquid Chromatography; Liquid Chromatography; Data Analysis; High Performance; High Resolution Mass Spectrometry; Automatic Data; Analysis Workflow; Automatic Data Analysis; Resolution Mass

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. DOI: 10.1145/3219819.3219927

Cited By ~ 2

Author(s): Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Lisa Gerhardt, ...

Keyword(s): Data Analysis; High Performance Computing; High Performance; Large Scale; Large Scale Data; Performance Computing; Scale Data

META-pipe cloud setup and execution

F1000Research, 2017, Vol 6, pp. 2060. DOI: 10.12688/f1000research.13204.1

Author(s): Aleksandr Agafonov, Kimmo Mattila, Cuong Duong Tuan, Lars Tiede, Inge Alexander Raknes, ...

Keyword(s): Functional Annotation; High Performance; Sequence Data; Metagenomic Data; Taxonomic Profiling; Geographically Distributed; Computationally Intensive; High Performance Computing Cluster; And Storage; Performance Computing

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which central servers are combined with multiple distributed backends that execute the computationally intensive jobs. We believe our experience developing and operating META-pipe provides a useful model for others who plan to offer a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.
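The distributed design described above, with central servers handing computationally intensive jobs to several backends, can be sketched in-process with Python's standard `queue` and `threading` modules. This is an illustration under stated assumptions, not META-pipe's implementation: the job format, the backend names, and the pull-based dispatch are hypothetical, and real backends would be remote clusters or cloud instances rather than threads.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Job = Tuple[int, str]  # hypothetical job format: (job id, input dataset name)


def run_central_server(
    jobs: List[Job],
    backends: Dict[str, Callable[[str], str]],
) -> Dict[int, Tuple[str, str]]:
    """Dispatch jobs from one central queue to several backends (illustrative sketch).

    Each backend pulls the next job whenever it is free, so faster backends
    naturally take more work. Returns {job_id: (backend_name, result)}.
    """
    pending: "queue.Queue[Job]" = queue.Queue()
    for job in jobs:
        pending.put(job)
    results: Dict[int, Tuple[str, str]] = {}
    lock = threading.Lock()  # protects the shared results dictionary

    def backend_loop(name: str, execute: Callable[[str], str]) -> None:
        while True:
            try:
                job_id, dataset = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this backend
            output = execute(dataset)  # stands in for the remote, compute-heavy step
            with lock:
                results[job_id] = (name, output)

    threads = [
        threading.Thread(target=backend_loop, args=(name, fn))
        for name, fn in backends.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a pull-based queue, an idle backend simply takes the next job, so adding a cloud backend adds capacity without changing the central server's logic. This is one plausible reading of the architecture in the abstract, not a description of META-pipe's actual protocol.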

Taxonomic assignment for large-scale metagenomic data on high-performance systems

Journal of Computer Science and Cybernetics, 2017, Vol 33 (2), pp. 119-130. DOI: 10.15625/1813-9663/33/2/10753

Author(s): Vinh Van Le, Hoai Van Tran, Hieu Ngoc Duong, Giang Xuan Bui, Lang Van Tran

Keyword(s): High Performance Computing; Assignment Problem; High Performance; Large Scale; Computing System; Metagenomic Data; Taxonomic Assignment; High Performance Computing System; Powerful Approach; Performance Computing

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is identifying the origin of each read, referred to as taxonomic assignment. Because each metagenomic project must analyze large-scale datasets, metagenomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated on both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the system's resources efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.
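The abstract does not describe how SeMetaPL scores reads or splits the work. The Python sketch below shows the general pattern of data-parallel taxonomic assignment under simple, explicitly swapped-in assumptions: each read is assigned to the reference taxon with which it shares the most k-mers, and the reads are split into chunks that are scored concurrently. The function names and the default k are hypothetical. Threads are used only to keep the example portable; in CPython they do not speed up CPU-bound work, so a real HPC implementation would distribute chunks across processes or nodes (for example with `ProcessPoolExecutor` or MPI).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read: str, ref_kmers: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Assign a read to the taxon sharing the most k-mers; None if nothing is shared."""
    read_kmers = kmers(read, k)
    best_taxon, best_hits = None, 0
    for taxon, ks in ref_kmers.items():
        hits = len(read_kmers & ks)
        if hits > best_hits:
            best_taxon, best_hits = taxon, hits
    return best_taxon


def parallel_assign(
    reads: List[str], references: Dict[str, str], k: int = 4, workers: int = 4
) -> List[Optional[str]]:
    """Hypothetical sketch: split reads into chunks and assign the chunks concurrently."""
    # Index every reference once; the index is shared read-only by all workers.
    ref_kmers = {taxon: kmers(seq, k) for taxon, seq in references.items()}
    chunk = max(1, -(-len(reads) // workers))  # ceiling division
    chunks = [reads[i:i + chunk] for i in range(0, len(reads), chunk)]

    def assign_chunk(block: List[str]) -> List[Optional[str]]:
        return [assign_read(r, ref_kmers, k) for r in block]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(assign_chunk, chunks))  # map keeps chunk order
    # Flatten back into one label per input read, in the original order.
    return [label for block in per_chunk for label in block]
```

For example, with references `{"TaxonA": "ACGTACGTACGT", "TaxonB": "TTGGCCAATTGG"}`, the reads `"ACGTACG"`, `"GGCCAATT"`, and `"AAAAAAA"` are labeled `TaxonA`, `TaxonB`, and `None`. Because `map` returns results in submission order, the output lines up with the input reads regardless of which worker finishes first.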
