scholarly journals Container-based bioinformatics with Pachyderm

2018 ◽  
Author(s):  
Jon Ander Novella ◽  
Payam Emami Khoonsari ◽  
Stephanie Herman ◽  
Daniel Whitenack ◽  
Marco Capuccini ◽  
...  

AbstractMotivation:Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages, and (iii) a data management layer that tracks data as it moves through the processing pipeline.Results:Pachyderm is an open-source workflow system and data management framework that fulfills these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.Availability:Pachyderm is available from https://github.com/pachyderm/pachyderm. The Pachyderm Helm Chart is available from https://github.com/kubernetes/charts/tree/master/stable/pachyderm. Pachyderm is available out-of-the-box from the PhenoMeNal VRE (https://github.com/phnmnl/KubeNow-plugin) and general Kubernetes environments instantiated via KubeNow. The code of the workflow used for the analysis is available on GitHub (https://github.com/pharmbio/LC-MS-Pachyderm).Contact:[email protected]

2012 ◽  
Vol 12 (11) ◽  
pp. 1237-1242 ◽  
Author(s):  
Walter Cedeno ◽  
Simson Alex ◽  
Edward P. Jaeger ◽  
Dimitris K. Agrafiotis ◽  
Victor S. Lobanov

2021 ◽  
pp. 1-5
Author(s):  
Cosima Meyer

ABSTRACT This article introduces how to teach an interactive, one-semester-long statistics and programming class. The setting also can be applied to shorter and longer classes as well as introductory and advanced courses. I propose a project-based seminar that also encompasses elements of an inverted classroom. As a result of this combination, the seminar supports students’ learning progress and also creates engaging virtual classes. To demonstrate how to apply a project-based seminar setting to teaching statistics and programming classes, I use an introductory class to data wrangling and management with the statistical software program R. Students are guided through a typical data science workflow that requires data management and data wrangling and concludes with visualizing and presenting first research results during a simulated mini-conference.


2017 ◽  
Author(s):  
Andrei Tsaregorodstsev ◽  

2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem to serve the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy to use services and novel scientific approaches based on re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions including data for their experimental and theoretical characterisation. This overarching goal is achieved by working towards a number of key objectives: Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories. Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry, in order to support the FAIR principles for research data. Finally, develop standards, if there is a lack. Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate studies curricula. Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness for and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers. Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI. Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.


Sign in / Sign up

Export Citation Format

Share Document