Raw data queries during data-intensive parallel workflow execution

2017 ◽  
Vol 75 ◽  
pp. 402-422 ◽  
Author(s):  
Vítor Silva ◽  
José Leite ◽  
José J. Camata ◽  
Daniel de Oliveira ◽  
Alvaro L.G.A. Coutinho ◽  
...  
2016 ◽  
Vol 15 (1) ◽  
pp. 19-27
Author(s):  
Sucha SMANCHAT ◽  
Kanchana VIRIYAPANT

Scientific workflows have been employed to automate large-scale scientific experiments by leveraging computational power provided on demand by cloud computing platforms. Among these workflows, a parallel loop workflow is used to study the effects of different input values on a scientific experiment. Because its loop iterations are independent, a parallel loop workflow can be dynamically executed as parallel workflow instances to accelerate execution. Such dynamic execution invalidates the workflow traversal used in existing works to estimate execution time and cost during scheduling, which is needed to maintain time and cost constraints. In this paper, we propose a novel scheduling technique that handles dynamic parallel loop workflow execution through a new method for evaluating execution progress, together with a workflow instance arrival control and a cloud resource adjustment mechanism. The proposed technique, which aims to meet a workflow deadline while reducing cost, is tested using three existing task-scheduling heuristics as its task mapping strategies. The simulation results show that the proposed technique is practical and performs better when the time constraint is more relaxed. It also favors task-scheduling heuristics that allow more accurate progress evaluation.
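The progress evaluation and arrival control described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' algorithm; all class and function names are hypothetical, and the admission rule (progress proportional to elapsed time) is a simplifying assumption.

```python
# Hypothetical sketch: progress evaluation across dynamically spawned
# parallel workflow instances, plus a simple deadline-driven arrival control.
from dataclasses import dataclass

@dataclass
class WorkflowInstance:
    total_tasks: int
    completed_tasks: int = 0

def overall_progress(instances):
    # Aggregate progress across all live instances. Because instances
    # arrive dynamically, progress is re-evaluated at each scheduling
    # step instead of being derived from a one-time workflow traversal.
    done = sum(i.completed_tasks for i in instances)
    total = sum(i.total_tasks for i in instances)
    return done / total if total else 0.0

def should_admit_new_instance(instances, elapsed, deadline):
    # Arrival control (assumed rule): admit a new workflow instance only
    # if current progress keeps pace with the fraction of time consumed.
    return overall_progress(instances) >= elapsed / deadline
```

In this sketch, a resource adjustment mechanism would use the same progress signal, e.g. acquiring extra cloud resources when progress falls behind the elapsed-time fraction.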


2016 ◽  
Vol 3 (4) ◽  
pp. 163-175 ◽  
Author(s):  
Bakinam T. Essawy ◽  
Jonathan L. Goodall ◽  
Hao Xu ◽  
Arcot Rajasekar ◽  
James D. Myers ◽  
...  

2021 ◽  
Vol 7 ◽  
pp. e527
Author(s):  
Renan Souza ◽  
Vitor Silva ◽  
Alexandre A. B. Lima ◽  
Daniel de Oliveira ◽  
Patrick Valduriez ◽  
...  

Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the design of parallel execution control is to manage workflow data efficiently while enabling user steering support. Data access for high scalability is typically transaction-oriented, whereas data analysis is online analytical-oriented, so managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
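The hybrid workload the abstract describes — short transactional updates from the execution engine alongside analytical steering queries over the same live data — can be illustrated with a minimal sketch. Note the assumptions: an in-memory SQLite database stands in for the parallel, distributed in-memory DBMS that SchalaDB assumes, and the table and column names are hypothetical, not d-Chiron's actual schema.

```python
# Illustrative sketch: transactional task-state updates and analytical
# steering queries sharing one live in-memory store (SQLite stands in
# for a distributed in-memory DBMS; schema is hypothetical).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    status TEXT,
    runtime_s REAL
)""")

def complete_task(task_id, runtime_s):
    # Transactional side: one short transaction per task state change,
    # as workers report completions back to the engine.
    with conn:
        conn.execute(
            "UPDATE task SET status = 'done', runtime_s = ? WHERE id = ?",
            (runtime_s, task_id))

def steering_summary():
    # Analytical side: a user steering query runs over the same live
    # data, without waiting for the workflow to finish.
    row = conn.execute(
        "SELECT COUNT(*), AVG(runtime_s) FROM task WHERE status = 'done'"
    ).fetchone()
    return {"done": row[0], "avg_runtime_s": row[1]}

conn.executemany(
    "INSERT INTO task (id, status, runtime_s) VALUES (?, 'ready', NULL)",
    [(i,) for i in range(5)])
complete_task(1, 2.5)
complete_task(2, 3.5)
```

In a distributed in-memory DBMS, both access paths would additionally benefit from data partitioning across nodes, which is what makes the negligible-overhead result for concurrent tasks on shared data plausible at scale.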


1962 ◽  
Vol 17 (9) ◽  
pp. 657-658 ◽  
Author(s):  
Leroy Wolins
