Workflows for automated downstream data analysis and visualization in large‐scale computational mass spectrometry

Stephan Aiche; Timo Sachsenberg; Erhan Kenar; Mathias Walzer; Bernd Wiswedel; Theresa Kristl; Matthew Boyles; Albert Duschl; Christian G. Huber; Michael R. Berthold; Knut Reinert; Oliver Kohlbacher

doi:10.1002/pmic.201400391

Interoperable and scalable data analysis with microservices: Applications in Metabolomics

doi:10.1101/213603; 2017; Cited By ~ 2

Author(s): Payam Emami Khoonsari; Pablo Moreno; Sven Bergmann; Joachim Burman; Marco Capuccini; ...

Keyword(s): Mass Spectrometry; Data Analysis; Large Scale; Metabolite Identification; Access Point; Scientific Discipline; Resonance Spectroscopy; Magnetic Resonance Spectroscopy Study; Analysis Workflow; Computational Resources

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, multivariate statistics, and metabolite identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.

Cloud-based DIA data analysis module for signal refinement improves accuracy and throughput of large datasets

doi:10.1101/2021.07.14.452243; 2021

Author(s): Karen E. Christianson; Jacob D. Jaffe; Steven A. Carr; Alvaro Sebastian Vaca Jacome

Keyword(s): Mass Spectrometry; Data Analysis; Large Scale; Mass Spectrometry Data; Avant Garde; Data Independent Acquisition; Large Scale Data; Biological Insight; Computational Resources; User Friendly

Data-independent acquisition (DIA) is a powerful mass spectrometry method that promises higher coverage, reproducibility, and throughput than traditional quantitative proteomics approaches. However, the complexity of DIA data caused by fragmentation of co-isolating peptides presents significant challenges for confident assignment of identity and quantity, information that is essential for deriving meaningful biological insight from the data. To overcome this problem, we previously developed Avant-garde (AvG), a tool for automated signal refinement of DIA and other targeted mass spectrometry data. AvG is designed to work alongside existing tools for peptide detection to address the reliability and quantitative suitability of signals extracted for the identified peptides. While its use is straightforward and offers efficient refinement for small datasets, the execution of AvG for large DIA datasets is time-consuming, especially if run with limited computational resources. To overcome these limitations, we present here an improved, cloud-based implementation of the AvG algorithm deployed on Terra, a user-friendly cloud-based platform for large-scale data analysis and sharing, as an accessible and standardized resource to the wider community.

Interoperable and scalable data analysis with microservices: applications in metabolomics

Bioinformatics; doi:10.1093/bioinformatics/btz160; 2019; Vol 35 (19); pp. 3752-3760; Cited By ~ 10

Author(s): Payam Emami Khoonsari; Pablo Moreno; Sven Bergmann; Joachim Burman; Marco Capuccini; ...

Keyword(s): Mass Spectrometry; Data Analysis; Large Scale; Scientific Discipline; Supplementary Information; Resonance Spectroscopy; Research Environment; Metabolomics Data; Analysis Workflow; Virtual Research Environment

Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.

Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.

Supplementary information: Supplementary data are available at Bioinformatics online.

Algorithm of combining chromatography mass spectrometry-untargeted profiling and multivariate analysis for identification of marker-substances in samples of complex composition

Industrial laboratory. Diagnostics of materials; doi:10.26896/1028-6861-2020-86-7-12-19; 2020; Vol 86 (7); pp. 12-19

Author(s): I. V. Plyushchenko; D. G. Shakhmatov; I. A. Rodin

Keyword(s): Mass Spectrometry; Multivariate Analysis; Large Scale; Complex Composition; Unified Protocol; Chromatography Mass Spectrometry; Marker Substances; Selection Testing; Untargeted Profiling

The rapid development of statistical data processing, computing capabilities, chromatography-mass spectrometry, and omics technologies (technologies based on the achievements of genomics, transcriptomics, proteomics, and metabolomics) in recent decades has not led to the formation of a unified protocol for untargeted profiling. Systematic errors reduce the reproducibility and reliability of the results obtained and at the same time hinder the consolidation and analysis of data gathered in large-scale multi-day experiments. We propose an algorithm for conducting omics profiling to identify potential markers in samples of complex composition and present a case study of urine samples obtained from different clinical groups of patients. Profiling was carried out by liquid chromatography-mass spectrometry. The markers were selected using methods of multivariate analysis, including machine learning and feature selection. The approach was tested on an independent dataset by clustering and projection onto principal components.

Integrative Data Analysis from a Unifying Research Synthesis Perspective

doi:10.1093/oso/9780190676001.003.0020; 2018

Author(s): Eun-Young Mun; Anne E. Ray

Keyword(s): Data Analysis; Large Scale; Research Synthesis; Alcohol Intervention; Data Set; Integrative Data Analysis; Level Data; Model Complex; Wide Range; Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

Cyberstalking Victimization Model Using Criminological Theory: A Systematic Literature Review, Taxonomies, Applications, Tools, and Validations

Electronics; doi:10.3390/electronics10141670; 2021; Vol 10 (14); pp. 1670

Author(s): Waheeb Abu-Ulbeh; Maryam Altalhi; Laith Abualigah; Abdulwahab Ali Almazroi; Putra Sumari; ...

Keyword(s): Data Analysis; Structural Equation; Large Scale; Review Paper; Essential Element; Routine Activities; Criminological Theory; Equation Modeling; Future Research; Proposed Model

Cyberstalking is a growing anti-social problem that is spreading on a large scale and in various forms. Cyberstalking detection has become increasingly popular in recent years and has been investigated technically by many researchers. However, cyberstalking victimization, an essential part of cyberstalking, has received less empirical attention from the research community. This paper attempts to address this gap and develops a model to understand and estimate the prevalence of cyberstalking victimization. The model is built on routine activities and lifestyle exposure theories and includes eight hypotheses. The data were collected from 757 respondents in Jordanian universities. This review paper takes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that is more dependent on the cyberstalking type. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influence cyberstalking victimization. The outcome of moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization among societies and provides a valuable understanding of its prevalence. This can assist researchers and practitioners in future research on cyberstalking victimization.

Microcomputers in Political Science

News for Teachers of Political Science; doi:10.1017/s0197901900005079; 1983; Vol 38; pp. 1-9

Author(s): Herbert F. Weisberg

Keyword(s): Data Analysis; Political Science; Large Scale; Turnaround Time; General Purpose; Batch Mode; New Era; Large Scale Data; The Social; Frequency Counts

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.

When didactics meet data science: process data analysis in large-scale mathematics assessment in France

Large-scale Assessments in Education; doi:10.1186/s40536-020-00085-y; 2020; Vol 8 (1)

Author(s): Franck Salles; Reinaldo Dos Santos; Saskia Keskpaik

Keyword(s): Data Analysis; Large Scale; Data Science; Mathematics Assessment; Process Data; Meet Data

Automated 16-Plex Plasma Proteomics with Real-Time Search and Ion Mobility Mass Spectrometry Enables Large-Scale Profiling in Naked Mole-Rats and Mice

Journal of Proteome Research; doi:10.1021/acs.jproteome.0c00681; 2021; Vol 20 (2); pp. 1280-1295

Author(s): Aleksandr Gaun; Kaitlyn N. Lewis Hardell; Niclas Olsson; Jonathon J. O’Brien; Sudha Gollapudi; ...

Keyword(s): Mass Spectrometry; Real Time; Ion Mobility; Large Scale; Ion Mobility Mass Spectrometry; Plasma Proteomics; Rats And Mice

The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young

mSphere; doi:10.1128/msphere.00327-17; 2017; Vol 2 (5); Cited By ~ 65

Author(s): Gaorui Bian; Gregory B. Gloor; Aihua Gong; Changsheng Jia; Wei Zhang; ...

Keyword(s): Data Analysis; Gut Microbiota; Large Scale; Compositional Data; Healthy Lifestyle; Compositional Data Analysis; Surprising Result; Microbiota Composition; Cross Sectional; Age Cohorts

ABSTRACT We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations. The microbiota of the aged is variously described as being more or less diverse than that of younger cohorts, but the comparison groups used and the definitions of the aged population differ between experiments. The differences are often described by null hypothesis statistical tests, which are notoriously irreproducible when dealing with large multivariate samples. We collected and examined the gut microbiota of a cross-sectional cohort of more than 1,000 very healthy Chinese individuals who spanned ages from 3 to over 100 years. The analysis of 16S rRNA gene sequencing results used a compositional data analysis paradigm coupled with measures of effect size, where ordination, differential abundance, and correlation can be explored and analyzed in a unified and reproducible framework. Our analysis showed several surprising results compared to other cohorts. First, the overall microbiota composition of the healthy aged group was similar to that of people decades younger. 
Second, the major differences between groups in the gut microbiota profiles were found before age 20. Third, the gut microbiota differed little between individuals from the ages of 30 to >100. Fourth, the gut microbiota of males appeared to be more variable than that of females. Taken together, the present findings suggest that the microbiota of the healthy aged in this cross-sectional study differ little from that of the healthy young in the same population, although the minor variations that do exist depend upon the comparison cohort.
