Container-based bioinformatics with Pachyderm

Mapping Intimacies ◽

10.1101/299032 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jon Ander Novella ◽

Payam Emami Khoonsari ◽

Stephanie Herman ◽

Daniel Whitenack ◽

Marco Capuccini ◽

...

Keyword(s):

Data Management ◽

Data Science ◽

Management Framework ◽

Processing Pipeline ◽

Link Type ◽

Computational Environment ◽

Workflow System ◽

Block Storage ◽

Container Orchestration ◽

Multiple Scenarios

Abstract

Motivation: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages, and (iii) a data management layer that tracks data as it moves through the processing pipeline.

Results: Pachyderm is an open-source workflow system and data management framework that fulfills these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.

Availability: Pachyderm is available from https://github.com/pachyderm/pachyderm. The Pachyderm Helm Chart is available from https://github.com/kubernetes/charts/tree/master/stable/pachyderm. Pachyderm is available out-of-the-box from the PhenoMeNal VRE (https://github.com/phnmnl/KubeNow-plugin) and general Kubernetes environments instantiated via KubeNow. The code of the workflow used for the analysis is available on GitHub (https://github.com/pharmbio/LC-MS-Pachyderm).

Contact: [email protected]


Data Management for an Integrated Computational Environment.

10.21236/ada329965 ◽

1997 ◽

Author(s):

Gerard T. Capraro

Keyword(s):

Data Management ◽

Computational Environment


Service-Driven Data Management Framework for Electric Power Sensor Networks

2021 International Conference on Communications, Information System and Computer Engineering (CISCE) ◽

10.1109/cisce52179.2021.9445888 ◽

2021 ◽

Author(s):

Ziqi Wang ◽

Ying Liu ◽

Yudong Wang ◽

Kexuan Song

Keyword(s):

Sensor Networks ◽

Data Management ◽

Electric Power ◽

Management Framework ◽

Power Sensor


An Integrated Data Management Framework for Drug Discovery – From Data Capturing to Decision Support

Current Topics in Medicinal Chemistry ◽

10.2174/156802612800672862 ◽

2012 ◽

Vol 12 (11) ◽

pp. 1237-1242 ◽

Cited By ~ 4

Author(s):

Walter Cedeno ◽

Simson Alex ◽

Edward P. Jaeger ◽

Dimitris K. Agrafiotis ◽

Victor S. Lobanov

Keyword(s):

Decision Support ◽

Drug Discovery ◽

Data Management ◽

Management Framework ◽

Data Capturing


Bringing the World to the Classroom: Teaching Statistics and Programming in a Project-Based Setting

Political Science and Politics ◽

10.1017/s1049096521001104 ◽

2021 ◽

pp. 1-5

Author(s):

Cosima Meyer

Keyword(s):

Data Management ◽

Data Science ◽

Classroom Teaching ◽

Statistical Software ◽

Inverted Classroom ◽

Typical Data ◽

The World ◽

Virtual Classes ◽

Teaching Statistics ◽

Introductory Class

Abstract: This article introduces how to teach an interactive, one-semester-long statistics and programming class. The setting can also be applied to shorter and longer classes as well as introductory and advanced courses. I propose a project-based seminar that also encompasses elements of an inverted classroom. As a result of this combination, the seminar supports students’ learning progress and also creates engaging virtual classes. To demonstrate how to apply a project-based seminar setting to teaching statistics and programming classes, I use an introductory class on data wrangling and management with the statistical software program R. Students are guided through a typical data science workflow that requires data management and data wrangling and concludes with visualizing and presenting first research results during a simulated mini-conference.


The Need for an Enterprise Risk Management Framework for Big Data Science Projects

Proceedings of the 9th International Conference on Data Science, Technology and Applications ◽

10.5220/0009874502680274 ◽

2020 ◽

Author(s):

Jeffrey Saltz ◽

Sucheta Lahiri

Keyword(s):

Risk Management ◽

Big Data ◽

Data Science ◽

Enterprise Risk Management ◽

Management Framework ◽

Risk Management Framework ◽

Science Projects ◽

Enterprise Risk


DIRAC Data Management Framework

10.22323/1.270.0035 ◽

2017 ◽

Author(s):

Andrei Tsaregorodtsev ◽

Keyword(s):

Data Management ◽

Management Framework


NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany

Research Ideas and Outcomes ◽

10.3897/rio.6.e55852 ◽

2020 ◽

Vol 6 ◽

Cited By ~ 3

Author(s):

Christoph Steinbeck ◽

Oliver Koepler ◽

Felix Bach ◽

Sonja Herres-Pawlis ◽

Nicole Jung ◽

...

Keyword(s):

Data Management ◽

Data Science ◽

Open Data ◽

Research Data ◽

Data Standards ◽

Data Repositories ◽

Data Infrastructure ◽

Research Data Management ◽

Wide Range ◽

Chemistry Community

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem to serve the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy to use services and novel scientific approaches based on re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions including data for their experimental and theoretical characterisation. This overarching goal is achieved by working towards a number of key objectives:

Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.

Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry, in order to support the FAIR principles for research data. Finally, develop standards, if there is a lack.

Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate studies curricula.

Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness for and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers.

Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.

Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
