Interoperable and scalable data analysis with microservices: Applications in Metabolomics

2017 ◽  
Author(s):  
Payam Emami Khoonsari ◽  
Pablo Moreno ◽  
Sven Bergmann ◽  
Joachim Burman ◽  
Marco Capuccini ◽  
...  

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, multivariate statistics, and metabolite identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.

2019 ◽  
Vol 35 (19) ◽  
pp. 3752-3760 ◽  
Author(s):  
Payam Emami Khoonsari ◽  
Pablo Moreno ◽  
Sven Bergmann ◽  
Joachim Burman ◽  
Marco Capuccini ◽  
...  

Abstract. Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability by integrating the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science. Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Karen E. Christianson ◽  
Jacob D. Jaffe ◽  
Steven A. Carr ◽  
Alvaro Sebastian Vaca Jacome

Abstract Data-independent acquisition (DIA) is a powerful mass spectrometry method that promises higher coverage, reproducibility, and throughput than traditional quantitative proteomics approaches. However, the complexity of DIA data caused by fragmentation of co-isolating peptides presents significant challenges for confident assignment of identity and quantity, information that is essential for deriving meaningful biological insight from the data. To overcome this problem, we previously developed Avant-garde (AvG), a tool for automated signal refinement of DIA and other targeted mass spectrometry data. AvG is designed to work alongside existing tools for peptide detection to address the reliability and quantitative suitability of signals extracted for the identified peptides. While its use is straightforward and offers efficient refinement for small datasets, the execution of AvG for large DIA datasets is time-consuming, especially if run with limited computational resources. To overcome these limitations, we present here an improved, cloud-based implementation of the AvG algorithm deployed on Terra, a user-friendly cloud-based platform for large-scale data analysis and sharing, as an accessible and standardized resource to the wider community.


2009 ◽  
Vol 10 (Suppl 11) ◽  
pp. S17 ◽  
Author(s):  
Ken Pendarvis ◽  
Ranjit Kumar ◽  
Shane C Burgess ◽  
Bindu Nanduri

Author(s):  
Taiga Abe ◽  
Ian Kinsella ◽  
Shreya Saxena ◽  
Liam Paninski ◽  
John P. Cunningham

Abstract A major goal of computational neuroscience is to develop powerful analysis tools that operate on large datasets. These methods provide an essential toolset to unlock scientific insights from new experiments. Unfortunately, a major obstacle currently impedes progress: while existing analysis methods are frequently shared as open source software, the infrastructure needed to deploy these methods (at scale, reproducibly, cheaply, and quickly) remains totally inaccessible to all but a minority of expert users. As a result, many users cannot fully exploit these tools, due to constrained computational resources (limited or costly compute hardware) and/or mismatches in expertise (experimentalists vs. large-scale computing experts). In this work we develop Neuroscience Cloud Analysis As a Service (NeuroCAAS): a fully-managed infrastructure platform, based on modern large-scale computing advances, that makes state-of-the-art data analysis tools accessible to the neuroscience community. We offer NeuroCAAS as an open source service with a drag-and-drop interface, entirely removing the burden of infrastructure expertise, purchasing, maintenance, and deployment. NeuroCAAS is enabled by three key contributions. First, NeuroCAAS cleanly separates tool implementation from usage, allowing cutting-edge methods to be served directly to the end user with no need to read or install any analysis software. Second, NeuroCAAS automatically scales as needed, providing reliable, highly elastic computational resources that are more efficient than personal or lab-supported hardware, without management overhead. Finally, we show that many popular data analysis tools offered through NeuroCAAS outperform typical analysis solutions (in terms of speed and cost) while improving ease of use and maintenance, dispelling the myth that cloud compute is prohibitively expensive and technically inaccessible. By removing barriers to fast, efficient cloud computation, NeuroCAAS can dramatically accelerate both the dissemination and the effective use of cutting-edge analysis tools for neuroscientific discovery.


2011 ◽  
Vol 8 (2) ◽  
pp. 23-38 ◽  
Author(s):  
Christian Hildebrandt ◽  
Sebastian Wolf ◽  
Steffen Neumann

Summary Mass spectrometry is an important analytical technology for the identification of metabolites and small compounds by their exact mass. But dozens or hundreds of different compounds may have a similar mass or even the same molecular formula. Further elucidation requires tandem mass spectrometry, which provides the masses of compound fragments, but in silico fragmentation programs require substantial computational resources if applied to large numbers of candidate structures. We present and evaluate an approach to obtain candidates from a relational database which contains 28 million compounds from PubChem. A training phase associates tandem-MS peaks with corresponding fragment structures. For the candidate search, the peaks in a query spectrum are translated to fragment structures, and the candidates are retrieved and sorted by the number of matching fragment structures. In the cross validation the evaluation of the relative ranking positions (RRP) using different sizes of training sets confirms that a larger coverage of training data improves the average RRP from 0.65 to 0.72. Our approach allows downstream algorithms to process candidates in order of importance.
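
The candidate search summarized above (translate query peaks to fragment structures, then sort candidates by the number of matching fragments) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory dictionaries stand in for the relational database, the function name `rank_candidates` and all peak values, fragment labels, and candidate labels are hypothetical, and peaks are matched by exact key lookup where a real system would presumably use a mass tolerance.

```python
from collections import Counter


def rank_candidates(query_peaks, peak_to_fragments, fragment_to_candidates):
    """Rank candidates by how many fragment structures they share with the query.

    peak_to_fragments: hypothetical stand-in for the associations learned in
        the training phase (peak -> set of fragment structure labels).
    fragment_to_candidates: hypothetical stand-in for the compound database
        (fragment structure label -> set of candidate compound labels).
    """
    # Step 1: translate the query peaks into fragment structures.
    fragments = set()
    for peak in query_peaks:
        fragments.update(peak_to_fragments.get(peak, ()))

    # Step 2: count, per candidate, how many of those fragments it contains.
    counts = Counter()
    for fragment in fragments:
        for candidate in fragment_to_candidates.get(fragment, ()):
            counts[candidate] += 1

    # Step 3: sort by descending match count; break ties by label so the
    # order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented purely for illustration.
peak_to_fragments = {
    91.05: {"frag_1"},
    77.04: {"frag_2"},
    43.02: {"frag_3"},
}
fragment_to_candidates = {
    "frag_1": {"cand_A", "cand_B"},
    "frag_2": {"cand_A"},
    "frag_3": {"cand_A", "cand_B", "cand_C"},
}

# The peak at 120.0 has no known fragment and is simply ignored.
ranking = rank_candidates([91.05, 43.02, 120.0], peak_to_fragments, fragment_to_candidates)
print(ranking)
```

A downstream algorithm could then process the returned list from the front, which matches the summary's point about handling candidates in order of importance.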


2015 ◽  
pp. 1272-1293
Author(s):  
Abraham Pouliakis ◽  
Aris Spathis ◽  
Christine Kottaridi ◽  
Antonia Mourtzikou ◽  
Marilena Stamouli ◽  
...  

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and the sudden need for hardware and computational resources make large-scale bio-data analysis an ideal fit for cloud computing. Cloud computing is already applied in the fields of biology and biochemistry, via numerous paradigms providing novel ideas stimulating future research. The concept of BioCloud has rapidly emerged with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous applications related to biology and biochemistry. In this chapter, the authors present research results related to biology-related laboratories (BioLabs) as well as potential applications for the everyday clinical routine.


PROTEOMICS ◽  
2015 ◽  
Vol 15 (8) ◽  
pp. 1443-1447 ◽  
Author(s):  
Stephan Aiche ◽  
Timo Sachsenberg ◽  
Erhan Kenar ◽  
Mathias Walzer ◽  
Bernd Wiswedel ◽  
...  
