A Scientific Workflow Framework Integrated with Object Deputy Model for Data Provenance

Data Provenance in a Scientific Workflow Service Framework Integrated with Object Deputy Database

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2008.00721 ◽

2009 ◽

Vol 31 (5) ◽

pp. 721-732 ◽

Cited By ~ 1

Author(s):

Li-Wei WANG ◽

Ze-Qian HUANG ◽

Min LUO ◽

Zhi-Yong PENG

Keyword(s):

Scientific Workflow ◽

Data Provenance ◽

Service Framework

Data Provenance in Scientific Workflows

Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare ◽

10.4018/978-1-60566-374-6.ch003 ◽

2011 ◽

pp. 46-59

Author(s):

Khalid Belhajjame ◽

Paolo Missier ◽

Carole Goble

Keyword(s):

Real World ◽

Scientific Workflow ◽

Scientific Workflows ◽

Data Provenance ◽

Workflow Systems ◽

Scientific Experiments

Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data provenance in scientific workflows using illustrative examples taken from real-world workflows. The characterisation takes the form of a taxonomy that is used for comparing and analysing provenance capabilities supplied by existing scientific workflow systems.

Enriching Agronomic Experiments with Data Provenance

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2017070102 ◽

2017 ◽

Vol 8 (3) ◽

pp. 21-38

Author(s):

Sergio Manuel Serra da Cruz ◽

Jose Antonio Pires do Nascimento

Keyword(s):

Systematic Error ◽

Statistical Data ◽

Scientific Workflow ◽

Computational Experiments ◽

Data Provenance ◽

Computational Approaches ◽

Integration Platform ◽

Workflow System ◽

Scientific Experiments ◽

Different Types

Reproducibility is a major feature of science. Even agronomic research of exemplary quality may have irreproducible empirical findings because of random or systematic error. The ability to reproduce agronomic experiments based on statistical data and legacy scripts is not easily achieved. We propose RFlow, a tool that helps researchers manage, share, and enact scientific experiments that encapsulate legacy R scripts. RFlow transparently captures the provenance of scripts and makes experiments reproducible. Unlike existing computational approaches, RFlow is non-intrusive: it does not require users to change the way they work, and it wraps agronomic experiments in a scientific workflow system. Our computational experiments show that the tool can collect different types of provenance metadata from real experiments and enrich agronomic data with that metadata. This study shows the potential of RFlow to serve as the primary integration platform for legacy R scripts, with implications for other data- and compute-intensive agronomic projects.

Project Histories: Managing Data Provenance Across Collection-Oriented Scientific Workflow Runs

Lecture Notes in Computer Science - Data Integration in the Life Sciences ◽

10.1007/978-3-540-73255-6_12 ◽

2007 ◽

pp. 122-138 ◽

Cited By ~ 10

Author(s):

Shawn Bowers ◽

Timothy McPhillips ◽

Martin Wu ◽

Bertram Ludäscher

Keyword(s):

Scientific Workflow ◽

Data Provenance

Reconstructing Unsound Data Provenance View in Scientific Workflow

Web Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-29426-6_25 ◽

2012 ◽

pp. 212-220 ◽

Cited By ~ 1

Author(s):

Hua Hu ◽

Zhanchen Liu ◽

Haiyang Hu

Keyword(s):

Scientific Workflow ◽

Data Provenance

Opening Up Climate Research: A Linked Data Approach to Publishing Data Provenance

International Journal of Digital Curation ◽

10.2218/ijdc.v7i1.223 ◽

2012 ◽

Vol 7 (1) ◽

pp. 163-173 ◽

Cited By ~ 7

Author(s):

Arif Shaon ◽

Sarah Callaghan ◽

Bryan Lawrence ◽

Brian Matthews ◽

Timothy Osborn ◽

...

Keyword(s):

Linked Data ◽

Climate Science ◽

Scientific Output ◽

Scientific Workflow ◽

Digital Object Identifier ◽

Data Provenance ◽

Detailed Knowledge ◽

Academic Journal ◽

Science Data ◽

Climate Research

Traditionally, the formal scientific output in most fields of natural science has been limited to peer-reviewed academic journal publications, with less attention paid to the chain of intermediate data results and their associated metadata, including provenance. In effect, this has constrained the representation and verification of the data provenance to the confines of the related publications. Detailed knowledge of a dataset’s provenance is essential to establish the pedigree of the data for its effective re-use, and to avoid redundant re-enactment of the experiment or computation involved. For open-access data, it is increasingly important to determine authenticity and quality, especially considering the growing volumes of datasets appearing in the public domain. To address these issues, we present an approach that combines the Digital Object Identifier (DOI) – a widely adopted citation technique – with existing, widely adopted climate science data standards to formally publish detailed provenance of a climate research dataset as an associated scientific workflow. This is integrated with linked-data compliant data re-use standards (e.g. OAI-ORE) to enable a seamless link between a publication and the complete trail of lineage of the corresponding dataset, including the dataset itself.
