Data Placement and Task Scheduling Optimization for Data Intensive Scientific Workflow in Multiple Data Centers Environment

Author(s):  
Mingjun Wang ◽  
Jinghui Zhang ◽  
Fang Dong ◽  
Junzhou Luo
2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Zheyi Chen ◽  
Xu Zhao ◽  
Bing Lin

In hybrid cloud environments, reasonable data placement strategies are critical to the efficient execution of scientific workflows. Because of varying loads, bandwidth fluctuations, and network congestion between data centers, as well as the dynamics of hybrid cloud environments, data transmission time is uncertain, which poses a major challenge to efficient data placement for scientific workflows. However, most traditional data placement solutions assume deterministic cloud environments, which leads to excessive data transmission time for scientific workflows. To address this problem, we propose an adaptive discrete particle swarm optimization algorithm based on fuzzy theory and genetic algorithm operators (DPSO-FGA) to minimize the fuzzy data transmission time of scientific workflows. DPSO-FGA places scientific workflow data while meeting data privacy requirements and the capacity limits of data centers. Simulation results show that DPSO-FGA can effectively reduce the fuzzy data transmission time of scientific workflows in hybrid cloud environments.
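The abstract does not give the fitness function, so the following is only a minimal sketch of how a candidate placement might be scored, not the authors' DPSO-FGA. It assumes that uncertain bandwidth is modeled as a triangular fuzzy number (low, mode, high), that private datasets are pinned to a fixed data center, and that the centroid is used to rank fuzzy times. All names (`is_feasible`, `fuzzy_transfer_time`, `centroid`) and the sample data are hypothetical.

```python
def is_feasible(placement, sizes, capacity, private_home):
    """Check privacy pinning and per-center capacity for a placement."""
    used = {}
    for dataset, center in placement.items():
        # Assumption: a private dataset may only live in its home center.
        if dataset in private_home and private_home[dataset] != center:
            return False
        used[center] = used.get(center, 0) + sizes[dataset]
    return all(used[c] <= capacity[c] for c in used)


def fuzzy_transfer_time(placement, sizes, task_center, task_inputs, bandwidth):
    """Sum triangular fuzzy transfer times (low, mode, high) over all tasks."""
    total = (0.0, 0.0, 0.0)
    for task, inputs in task_inputs.items():
        dst = task_center[task]
        for dataset in inputs:
            src = placement[dataset]
            if src == dst:
                continue  # local data needs no transfer
            low_bw, mode_bw, high_bw = bandwidth[(src, dst)]
            size = sizes[dataset]
            # The highest bandwidth yields the shortest time, so the order flips.
            t = (size / high_bw, size / mode_bw, size / low_bw)
            total = tuple(a + b for a, b in zip(total, t))
    return total


def centroid(fuzzy_time):
    """Centroid of a triangular fuzzy number, used here to rank placements."""
    return sum(fuzzy_time) / 3


# Hypothetical two-center example: one private and one public data center.
sizes = {"d1": 100, "d2": 50}
capacity = {"private": 120, "public": 1000}
private_home = {"d1": "private"}
task_center = {"t1": "private"}
task_inputs = {"t1": ["d1", "d2"]}
bandwidth = {("public", "private"): (5, 10, 25)}

placement = {"d1": "private", "d2": "public"}
feasible = is_feasible(placement, sizes, capacity, private_home)
total_time = fuzzy_transfer_time(placement, sizes, task_center, task_inputs, bandwidth)
```

In a swarm-based search, each particle would encode one such placement, infeasible particles would be penalized or repaired, and the crisp value from `centroid` would serve as the quantity to minimize.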


2018 ◽  
Vol 2 (3) ◽  
Author(s):  
Satwinder Kaur ◽  
Mehak Aggarwal

Cloud computing is an advanced computing model through which applications, data, and many IT services are provided over the Internet. Task scheduling plays a crucial role in cloud computing systems. The task scheduling problem can be viewed as the search for an optimal mapping of the subtasks of different tasks onto the available set of resources so that the desired goals for the tasks are achieved. As the number of cloud users grows, more tasks need to be scheduled, and the cloud's performance depends on the task scheduling algorithms used. Numerous algorithms have been proposed to solve the task scheduling problem for heterogeneous networks of computers. Existing research proposes energy- and deadline-aware task scheduling methods for data-intensive applications. A scientific workflow is a combination of fine-grained and coarse-grained tasks, and every task scheduled to a VM incurs system overhead. If many fine-grained tasks execute in a scientific workflow, the scheduling overhead increases. To reduce this overhead, multiple small tasks are combined into a larger task, which decreases the scheduling overhead and improves the execution time of the workflow. Horizontal clustering is used to cluster the fine-grained tasks, and it is combined with a replication technique. The proposed scheduling algorithm improves performance metrics such as execution time and cost. This research can be extended with improved clustering techniques and replication methods.
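The horizontal clustering step can be illustrated with a minimal sketch. It assumes the common definition in which tasks at the same depth of the workflow DAG are merged into clusters of a fixed size; the abstract does not specify the clustering criterion or the replication step, so the function names, the `cluster_size` parameter, and the sample workflow are hypothetical.

```python
from collections import defaultdict


def task_levels(parents):
    """Depth of each task: 0 for entry tasks, else 1 + the deepest parent."""
    levels = {}

    def depth(task):
        if task not in levels:
            deps = parents[task]
            levels[task] = 0 if not deps else 1 + max(depth(p) for p in deps)
        return levels[task]

    for task in parents:
        depth(task)
    return levels


def horizontal_clusters(parents, cluster_size):
    """Merge tasks on the same level into clusters of at most cluster_size."""
    by_level = defaultdict(list)
    for task, level in task_levels(parents).items():
        by_level[level].append(task)

    clusters = []
    for level in sorted(by_level):
        tasks = sorted(by_level[level])
        for i in range(0, len(tasks), cluster_size):
            clusters.append(tasks[i:i + cluster_size])
    return clusters


# Hypothetical fork-join workflow: task name -> list of parent tasks.
workflow = {
    "split": [],
    "map1": ["split"],
    "map2": ["split"],
    "map3": ["split"],
    "map4": ["split"],
    "merge": ["map1", "map2", "map3", "map4"],
}
clusters = horizontal_clusters(workflow, cluster_size=2)
```

Here six tasks become four scheduled jobs, so the per-job scheduling overhead is paid four times instead of six. A larger `cluster_size` cuts overhead further but reduces the parallelism available within a level.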

