Interoperable and scalable data analysis with microservices: Applications in Metabolomics

bioRxiv ◽

10.1101/213603 ◽

2017 ◽

Cited By ~ 2

Author(s):

Payam Emami Khoonsari ◽

Pablo Moreno ◽

Sven Bergmann ◽

Joachim Burman ◽

Marco Capuccini ◽

...

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Large Scale ◽

Metabolite Identification ◽

Access Point ◽

Scientific Discipline ◽

Resonance Spectroscopy ◽

Magnetic Resonance Spectroscopy Study ◽

Analysis Workflow ◽

Computational Resources

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites, resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, multivariate statistics, and metabolite identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science.


Interoperable and scalable data analysis with microservices: applications in metabolomics

Bioinformatics ◽

10.1093/bioinformatics/btz160 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3752-3760 ◽

Cited By ~ 10

Author(s):

Payam Emami Khoonsari ◽

Pablo Moreno ◽

Sven Bergmann ◽

Joachim Burman ◽

Marco Capuccini ◽

...

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Large Scale ◽

Scientific Discipline ◽

Supplementary Information ◽

Resonance Spectroscopy ◽

Research Environment ◽

Metabolomics Data ◽

Analysis Workflow ◽

Virtual Research Environment

Abstract. Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science. Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. Supplementary information: Supplementary data are available at Bioinformatics online.
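
The idea of running one containerized tool per sample under Kubernetes can be illustrated with a minimal sketch. It only builds Kubernetes Job manifests as Python dictionaries and does not submit them to a cluster. The step name, image name, command and file paths are hypothetical placeholders, not taken from the PhenoMeNal project, and this is not the authors' implementation.

```python
def make_job(step_name, image, command, sample_id):
    """Build a Kubernetes Job manifest (batch/v1) for one workflow step on one sample."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"{step_name}-{sample_id}"},
        "spec": {
            "template": {
                "spec": {
                    # One container per job: the tool is packaged as a container image.
                    "containers": [
                        {"name": step_name, "image": image, "command": command}
                    ],
                    "restartPolicy": "Never",
                }
            }
        },
    }


# Hypothetical sample identifiers; each one becomes an independent job,
# so the orchestrator is free to schedule them on different nodes in parallel.
samples = ["s1", "s2", "s3"]
jobs = [
    make_job(
        "preprocess",
        "example.org/preprocess:latest",  # hypothetical image name
        ["preprocess", "--input", f"/data/{s}.mzML"],  # hypothetical command
        s,
    )
    for s in samples
]

for job in jobs:
    print(job["metadata"]["name"])
```

Because each sample is described by its own job, adding compute nodes lets more jobs run at once without changing the workflow definition, which is the scaling behaviour the abstract describes.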


Cloud-based DIA data analysis module for signal refinement improves accuracy and throughput of large datasets

10.1101/2021.07.14.452243 ◽

2021 ◽

Author(s):

Karen E. Christianson ◽

Jacob D. Jaffe ◽

Steven A. Carr ◽

Alvaro Sebastian Vaca Jacome

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Large Scale ◽

Mass Spectrometry Data ◽

Avant Garde ◽

Data Independent Acquisition ◽

Large Scale Data ◽

Biological Insight ◽

Computational Resources ◽

User Friendly

Abstract: Data-independent acquisition (DIA) is a powerful mass spectrometry method that promises higher coverage, reproducibility, and throughput than traditional quantitative proteomics approaches. However, the complexity of DIA data caused by fragmentation of co-isolating peptides presents significant challenges for confident assignment of identity and quantity, information that is essential for deriving meaningful biological insight from the data. To overcome this problem, we previously developed Avant-garde (AvG), a tool for automated signal refinement of DIA and other targeted mass spectrometry data. AvG is designed to work alongside existing tools for peptide detection to address the reliability and quantitative suitability of signals extracted for the identified peptides. While its use is straightforward and offers efficient refinement for small datasets, the execution of AvG for large DIA datasets is time-consuming, especially if run with limited computational resources. To overcome these limitations, we present here an improved, cloud-based implementation of the AvG algorithm deployed on Terra, a user-friendly cloud-based platform for large-scale data analysis and sharing, as an accessible and standardized resource to the wider community.


Automatic data analysis workflow for ultra-high performance liquid chromatography-high resolution mass spectrometry-based metabolomics

Journal of Chromatography A ◽

10.1016/j.chroma.2018.11.070 ◽

2019 ◽

Vol 1585 ◽

pp. 172-181 ◽

Cited By ~ 3

Author(s):

Yong-Jie Yu ◽

Qing-Xia Zheng ◽

Yue-Ming Zhang ◽

Qian Zhang ◽

Yu-Ying Zhang ◽

...

Keyword(s):

Mass Spectrometry ◽

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Data Analysis ◽

High Performance ◽

High Resolution Mass Spectrometry ◽

Automatic Data ◽

Analysis Workflow ◽

Automatic Data Analysis ◽

Resolution Mass


Integrating a generalized data analysis workflow with the Single-probe mass spectrometry experiment for single cell metabolomics

Analytica Chimica Acta ◽

10.1016/j.aca.2019.03.006 ◽

2019 ◽

Vol 1064 ◽

pp. 71-79 ◽

Cited By ~ 3

Author(s):

Renmeng Liu ◽

Genwei Zhang ◽

Mei Sun ◽

Xiaoliang Pan ◽

Zhibo Yang

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Single Cell ◽

Mass Spectrometry Experiment ◽

Single Probe ◽

Analysis Workflow ◽

Cell Metabolomics


An automated proteomic data analysis workflow for mass spectrometry

BMC Bioinformatics ◽

10.1186/1471-2105-10-s11-s17 ◽

2009 ◽

Vol 10 (Suppl 11) ◽

pp. S17 ◽

Cited By ~ 14

Author(s):

Ken Pendarvis ◽

Ranjit Kumar ◽

Shane C Burgess ◽

Bindu Nanduri

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Proteomic Data ◽

Analysis Workflow


Neuroscience Cloud Analysis As a Service

10.1101/2020.06.11.146746 ◽

2020 ◽

Cited By ~ 2

Author(s):

Taiga Abe ◽

Ian Kinsella ◽

Shreya Saxena ◽

Liam Paninski ◽

John P. Cunningham

Keyword(s):

Data Analysis ◽

Open Source ◽

Large Scale ◽

Ease Of Use ◽

Cutting Edge ◽

Analysis Tools ◽

Large Scale Computing ◽

Cloud Computation ◽

Computational Resources ◽

Cloud Analysis

Abstract: A major goal of computational neuroscience is to develop powerful analysis tools that operate on large datasets. These methods provide an essential toolset to unlock scientific insights from new experiments. Unfortunately, a major obstacle currently impedes progress: while existing analysis methods are frequently shared as open source software, the infrastructure needed to deploy these methods – at scale, reproducibly, cheaply, and quickly – remains totally inaccessible to all but a minority of expert users. As a result, many users cannot fully exploit these tools, due to constrained computational resources (limited or costly compute hardware) and/or mismatches in expertise (experimentalists vs. large-scale computing experts). In this work we develop Neuroscience Cloud Analysis As a Service (NeuroCAAS): a fully-managed infrastructure platform, based on modern large-scale computing advances, that makes state-of-the-art data analysis tools accessible to the neuroscience community. We offer NeuroCAAS as an open source service with a drag-and-drop interface, entirely removing the burden of infrastructure expertise, purchasing, maintenance, and deployment. NeuroCAAS is enabled by three key contributions. First, NeuroCAAS cleanly separates tool implementation from usage, allowing cutting-edge methods to be served directly to the end user with no need to read or install any analysis software. Second, NeuroCAAS automatically scales as needed, providing reliable, highly elastic computational resources that are more efficient than personal or lab-supported hardware, without management overhead. Finally, we show that many popular data analysis tools offered through NeuroCAAS outperform typical analysis solutions (in terms of speed and cost) while improving ease of use and maintenance, dispelling the myth that cloud compute is prohibitively expensive and technically inaccessible. By removing barriers to fast, efficient cloud computation, NeuroCAAS can dramatically accelerate both the dissemination and the effective use of cutting-edge analysis tools for neuroscientific discovery.


Database supported candidate search for Metabolite identification

Journal of Integrative Bioinformatics ◽

10.1515/jib-2011-157 ◽

2011 ◽

Vol 8 (2) ◽

pp. 23-38 ◽

Cited By ~ 7

Author(s):

Christian Hildebrandt ◽

Sebastian Wolf ◽

Steffen Neumann

Keyword(s):

Mass Spectrometry ◽

Cross Validation ◽

Metabolite Identification ◽

Training Data ◽

Exact Mass ◽

Large Numbers ◽

In Silico Fragmentation ◽

The Masses ◽

Computational Resources ◽

Analytical Technology

Summary: Mass spectrometry is an important analytical technology for the identification of metabolites and small compounds by their exact mass. But dozens or hundreds of different compounds may have a similar mass or even the same molecular formula. Further elucidation requires tandem mass spectrometry, which provides the masses of compound fragments, but in silico fragmentation programs require substantial computational resources if applied to large numbers of candidate structures. We present and evaluate an approach to obtain candidates from a relational database which contains 28 million compounds from PubChem. A training phase associates tandem-MS peaks with corresponding fragment structures. For the candidate search, the peaks in a query spectrum are translated to fragment structures, and the candidates are retrieved and sorted by the number of matching fragment structures. In the cross validation, the evaluation of the relative ranking positions (RRP) using different sizes of training sets confirms that a larger coverage of training data improves the average RRP from 0.65 to 0.72. Our approach allows downstream algorithms to process candidates in order of importance.
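
The candidate search described in this summary (translate query peaks to fragment structures, then sort candidates by the number of matching fragments) can be sketched as follows. This is a minimal illustration and not the authors' implementation: in-memory dictionaries stand in for the relational database, and the function name, the mass tolerance, and all peak, fragment and compound entries are assumptions made up for the example.

```python
from collections import Counter


def rank_candidates(query_peaks, peak_to_fragments, fragment_to_candidates, tolerance=0.01):
    """Rank candidate compounds by how many fragments from the query spectrum they contain."""
    # Step 1: translate each query peak into the fragment structures learned in training.
    matched_fragments = set()
    for peak in query_peaks:
        for trained_mz, fragments in peak_to_fragments.items():
            if abs(peak - trained_mz) <= tolerance:
                matched_fragments.update(fragments)

    # Step 2: count, per candidate, how many of the matched fragments it contains.
    counts = Counter()
    for fragment in matched_fragments:
        for candidate in fragment_to_candidates.get(fragment, ()):
            counts[candidate] += 1

    # Step 3: sort by descending match count, breaking ties by name for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy "training" data (hypothetical): peak m/z -> fragment labels, fragment -> compounds.
peak_to_fragments = {77.04: ["C6H5"], 105.03: ["C7H5O"], 43.02: ["C2H3O"]}
fragment_to_candidates = {
    "C6H5": ["benzoic acid", "acetophenone"],
    "C7H5O": ["benzoic acid", "acetophenone"],
    "C2H3O": ["acetophenone", "acetone"],
}

ranking = rank_candidates([77.041, 105.034, 43.018], peak_to_fragments, fragment_to_candidates)
for name, matches in ranking:
    print(name, matches)
```

Returning the full sorted list rather than only the top hit mirrors the summary's point that downstream algorithms can process candidates in order of importance.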


Cloud Computing for BioLabs

Cloud Technology ◽

10.4018/978-1-4666-6539-2.ch058 ◽

2015 ◽

pp. 1272-1293

Author(s):

Abraham Pouliakis ◽

Aris Spathis ◽

Christine Kottaridi ◽

Antonia Mourtzikou ◽

Marilena Stamouli ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Drug Design ◽

Large Scale ◽

Future Research ◽

New Paradigm ◽

Computing Power ◽

The Everyday ◽

Potential Applications ◽

Computational Resources

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and the sudden need for hardware and computational resources make large-scale bio-data analysis an ideal use case for cloud computing. Cloud computing is already applied in the fields of biology and biochemistry, via numerous paradigms providing novel ideas stimulating future research. The concept of BioCloud has rapidly emerged with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous applications related to biology and biochemistry. In this chapter, the authors present research results related to biology-related laboratories (BioLabs) as well as potential applications for the everyday clinical routine.


Workflows for automated downstream data analysis and visualization in large‐scale computational mass spectrometry

PROTEOMICS ◽

10.1002/pmic.201400391 ◽

2015 ◽

Vol 15 (8) ◽

pp. 1443-1447 ◽

Cited By ~ 23

Author(s):

Stephan Aiche ◽

Timo Sachsenberg ◽

Erhan Kenar ◽

Mathias Walzer ◽

Bernd Wiswedel ◽

...

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Large Scale


Use of directly coupled ion-exchange liquid chromatography–mass spectrometry and liquid chromatography–nuclear magnetic resonance spectroscopy as a strategy for polar metabolite identification

Journal of Chromatography B: Biomedical Sciences and Applications ◽

10.1016/s0378-4347(00)00401-1 ◽

2000 ◽

Vol 748 (1) ◽

pp. 295-309 ◽

Cited By ~ 23

Author(s):

G.J. Dear ◽

R.S. Plumb ◽

B.C. Sweatman ◽

P.S. Parry ◽

A.D. Roberts ◽

...

Keyword(s):

Mass Spectrometry ◽

Nuclear Magnetic Resonance ◽

Liquid Chromatography ◽

Magnetic Resonance ◽

Ion Exchange ◽

Magnetic Resonance Spectroscopy ◽

Metabolite Identification ◽

Resonance Spectroscopy ◽

Liquid Chromatography Mass Spectrometry ◽

Polar Metabolite
