Raw data queries during data-intensive parallel workflow execution

2017 ◽  
Vol 75 ◽  
pp. 402-422 ◽  
Author(s):  
Vítor Silva ◽  
José Leite ◽  
José J. Camata ◽  
Daniel de Oliveira ◽  
Alvaro L.G.A. Coutinho ◽  
...  
2016 ◽  
Vol 15 (1) ◽  
pp. 19-27
Author(s):  
Sucha SMANCHAT ◽  
Kanchana VIRIYAPANT

Scientific workflows have been employed to automate large-scale scientific experiments by leveraging computational power provided on demand by cloud computing platforms. Among these workflows, a parallel loop workflow is used to study the effects of different input values on a scientific experiment. Because its loop iterations are independent, a parallel loop workflow can be dynamically executed as parallel workflow instances to accelerate execution. Such dynamic execution invalidates the workflow traversal used in existing works to estimate execution time and cost during scheduling, which is needed to maintain time and cost constraints. In this paper, we propose a novel scheduling technique that handles dynamic parallel loop workflow execution through a new method for evaluating execution progress, together with a workflow instance arrival control and a cloud resource adjustment mechanism. The proposed technique, which aims to meet a workflow deadline while reducing cost, is tested using three existing task-scheduling heuristics as its task mapping strategies. The simulation results show that the proposed technique is practical and performs better when the time constraint is more relaxed. It also favors task-scheduling heuristics that allow more accurate progress evaluation.
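The progress evaluation and arrival control described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' algorithm; all class and function names are hypothetical, and the admission rule (progress proportional to elapsed time) is a simplifying assumption.

```python
# Hypothetical sketch: progress evaluation across dynamically spawned
# parallel workflow instances, plus a simple deadline-driven arrival control.
from dataclasses import dataclass

@dataclass
class WorkflowInstance:
    total_tasks: int
    completed_tasks: int = 0

def overall_progress(instances):
    # Aggregate progress across all live instances. Because instances
    # arrive dynamically, progress is re-evaluated at each scheduling
    # step instead of being derived from a one-time workflow traversal.
    done = sum(i.completed_tasks for i in instances)
    total = sum(i.total_tasks for i in instances)
    return done / total if total else 0.0

def should_admit_new_instance(instances, elapsed, deadline):
    # Arrival control (assumed rule): admit a new workflow instance only
    # if current progress keeps pace with the fraction of time consumed.
    return overall_progress(instances) >= elapsed / deadline
```

In this sketch, a resource adjustment mechanism would use the same progress signal, e.g. acquiring extra cloud resources when progress falls behind the elapsed-time fraction.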


2016 ◽  
Vol 3 (4) ◽  
pp. 163-175 ◽  
Author(s):  
Bakinam T. Essawy ◽  
Jonathan L. Goodall ◽  
Hao Xu ◽  
Arcot Rajasekar ◽  
James D. Myers ◽  
...  

2021 ◽  
Vol 7 ◽  
pp. e527
Author(s):  
Renan Souza ◽  
Vitor Silva ◽  
Alexandre A. B. Lima ◽  
Daniel de Oliveira ◽  
Patrick Valduriez ◽  
...  

Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the design of parallel execution control is to manage workflow data efficiently while enabling user steering support. Data access for high scalability is typically transaction-oriented, whereas data analysis is online analytical-oriented, so managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
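The hybrid workload the abstract describes — short transactional updates from the execution engine alongside analytical steering queries over the same live data — can be illustrated with a minimal sketch. Note the assumptions: an in-memory SQLite database stands in for the parallel, distributed in-memory DBMS that SchalaDB assumes, and the table and column names are hypothetical, not d-Chiron's actual schema.

```python
# Illustrative sketch: transactional task-state updates and analytical
# steering queries sharing one live in-memory store (SQLite stands in
# for a distributed in-memory DBMS; schema is hypothetical).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    status TEXT,
    runtime_s REAL
)""")

def complete_task(task_id, runtime_s):
    # Transactional side: one short transaction per task state change,
    # as workers report completions back to the engine.
    with conn:
        conn.execute(
            "UPDATE task SET status = 'done', runtime_s = ? WHERE id = ?",
            (runtime_s, task_id))

def steering_summary():
    # Analytical side: a user steering query runs over the same live
    # data, without waiting for the workflow to finish.
    row = conn.execute(
        "SELECT COUNT(*), AVG(runtime_s) FROM task WHERE status = 'done'"
    ).fetchone()
    return {"done": row[0], "avg_runtime_s": row[1]}

conn.executemany(
    "INSERT INTO task (id, status, runtime_s) VALUES (?, 'ready', NULL)",
    [(i,) for i in range(5)])
complete_task(1, 2.5)
complete_task(2, 3.5)
```

In a distributed in-memory DBMS, both access paths would additionally benefit from data partitioning across nodes, which is what makes the negligible-overhead result for concurrent tasks on shared data plausible at scale.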


1962 ◽  
Vol 17 (9) ◽  
pp. 657-658 ◽  
Author(s):  
Leroy Wolins
