An incremental reinforcement learning scheduling strategy for data‐intensive scientific workflows in the cloud

Ensemble Learning of Run-Time Prediction Models for Data-Intensive Scientific Workflows

Communications in Computer and Information Science - High Performance Computing

10.1007/978-3-662-45483-1_7

2014

pp. 83-97

Cited By ~ 2

Author(s):

David A. Monge

Matěj Holec

Filip Železný

Carlos García Garino

Keyword(s):

Ensemble Learning

Prediction Models

Scientific Workflows

Data Intensive

Time Prediction

Run Time

Wind and Storage Cooperative Scheduling Strategy Based on Deep Reinforcement Learning Algorithm

Journal of Physics: Conference Series

10.1088/1742-6596/1213/3/032002

2019

Vol 1213

pp. 032002

Author(s):

Jingtao Qin

Xueshan Han

Guojing Liu

Shang Wang

Wenbo Li

...

Keyword(s):

Reinforcement Learning

Learning Algorithm

Scheduling Strategy

And Storage

Cooperative Scheduling

Reinforcement Learning Algorithm

Multi-objective scheduling strategy for scientific workflows in cloud environment: A Firefly-based approach

Applied Soft Computing

10.1016/j.asoc.2020.106411

2020

Vol 93

pp. 106411

Author(s):

Mainak Adhikari

Tarachand Amgoth

Satish Narayana Srirama

Keyword(s):

Scientific Workflows

Cloud Environment

Scheduling Strategy

Multi Objective

A hybrid evolutionary algorithm for task scheduling and data assignment of data-intensive scientific workflows on clouds

Future Generation Computer Systems

10.1016/j.future.2017.05.017

2017

Vol 76

pp. 1-17

Cited By ~ 18

Author(s):

Luan Teylo

Ubiratam de Paula

Yuri Frota

Daniel de Oliveira

Lúcia M.A. Drummond

Keyword(s):

Evolutionary Algorithm

Task Scheduling

Scientific Workflows

Hybrid Evolutionary Algorithm

Data Intensive

Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows

Journal of Grid Computing

10.1007/s10723-013-9282-3

2013

Vol 12 (2)

pp. 245-264

Cited By ~ 41

Author(s):

Claudia Szabo

Quan Z. Sheng

Trent Kroeger

Yihong Zhang

Jian Yu

Keyword(s):

Scientific Workflows

Data Intensive

Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows

IEEE Transactions on Parallel and Distributed Systems

10.1109/tpds.2017.2764897

2018

Vol 29 (2)

pp. 338-350

Cited By ~ 7

Author(s):

Nicholas Hazekamp

Nathaniel Kremer-Herman

Benjamin Tovar

Haiyan Meng

Olivia Choudhury

...

Keyword(s):

Storage Management

Scientific Workflows

Data Intensive

Dynamic Storage Management

Dynamic Storage

The PBase Scientific Workflow Provenance Repository

International Journal of Digital Curation

10.2218/ijdc.v9i2.332

2014

Vol 9 (2)

pp. 28-38

Cited By ~ 16

Author(s):

Víctor Cuevas-Vicenttín

Parisa Kianmajd

Bertram Ludäscher

Paolo Missier

Fernando Chirigati

...

Keyword(s):

Scientific Workflow

Scientific Workflows

Data Reuse

Data Intensive

Research Collaborations

Provenance Data

Scientific Experiments

History Of

Scientific Results

User Friendly

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. Their advantages include rapid and easy workflow design, software and data reuse, scalable execution, and sharing and collaboration, among others; together these facilitate “reproducible science”. In this context, provenance – information about the origin, context, derivation, ownership, or history of some artifact – plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, performing such analyses on scientific results as part of extended research collaborations requires an adequate environment and tools. Concretely, the need arises for a repository that facilitates the sharing of scientific workflows and their associated execution traces in an interoperable manner, while also enabling querying and visualization. Furthermore, this functionality should be supported with performance and scalability in mind. With this purpose, we introduce PBase: a scientific workflow provenance repository implementing the ProvONE proposed standard, which extends the emerging W3C PROV standard for provenance data with workflow-specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user-friendly interface tailored for the visualization of scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.
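PBase answers provenance questions with declarative queries over a Neo4j graph. As a rough, stand-alone illustration of the kind of lineage query meant here (not PBase's schema, ProvONE's vocabulary, or any Neo4j API), the sketch below walks invented records that use two W3C PROV relations, `used` and `wasGeneratedBy`, to collect everything upstream of an artifact. The triple layout, the file and run names, and the `lineage` function are all hypothetical.

```python
from collections import deque

# Hypothetical PROV-style records: (subject, relation, object).
#   "wasGeneratedBy": entity   -> activity that produced it
#   "used":           activity -> entity it consumed
TRIPLES = [
    ("clean.dat", "wasGeneratedBy", "run:clean"),
    ("run:clean", "used", "raw.dat"),
    ("result.dat", "wasGeneratedBy", "run:merge"),
    ("run:merge", "used", "clean.dat"),
    ("run:merge", "used", "params.cfg"),
]

# Both relations point "upstream", so following them yields lineage.
LINEAGE_RELATIONS = {"wasGeneratedBy", "used"}


def lineage(artifact, triples):
    """Return every entity and activity upstream of `artifact` (breadth-first)."""
    # Index upstream edges once so each neighbour lookup is a dict access.
    upstream = {}
    for subj, rel, obj in triples:
        if rel in LINEAGE_RELATIONS:
            upstream.setdefault(subj, []).append(obj)

    seen = set()
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:  # visit each node once, even with shared inputs
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(lineage("result.dat", TRIPLES)))
# ['clean.dat', 'params.cfg', 'raw.dat', 'run:clean', 'run:merge']
```

In a graph database such a traversal could instead be written as a single variable-length path query; the hand-written breadth-first search only makes the idea explicit.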

Fault-Tolerant and Data-Intensive Resource Scheduling and Management for Scientific Applications in Cloud Computing

Sensors

10.3390/s21217238

2021

Vol 21 (21)

pp. 7238

Author(s):

Zulfiqar Ahmad

Ali Imran Jehangiri

Mohammed Alaa Ala’anzy

Mohamed Othman

Arif Iqbal Umar

Keyword(s):

Cloud Computing

Fault Tolerant

Research Work

Resource Scheduling

Scientific Workflow

Scientific Workflows

Scientific Applications

Data Intensive

Computing Paradigm

Cost Constraints

Cloud computing is a fully fledged, mature and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data-intensive and compute-intensive tasks, and they have some special characteristics: their tasks are executed in the form of integration, disintegration, pipeline, and parallelism, which requires special attention to task management and to data-oriented resource scheduling and management. Tasks executed in a pipeline are bottlenecks whose failure renders the whole execution futile, so they require fault-tolerance-aware execution. Tasks executed in parallel require similar cloud resource instances, so cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of scientific workflow tasks with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used as the simulation case, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT (Minimum Completion Time), (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with these three policies. Similarly, CFD reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas the existing policies violate it numerous times.
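The abstract benchmarks CFD against three classic heuristics. For readers unfamiliar with them, here is a minimal sketch of their textbook forms for independent tasks, assuming an expected-time-to-compute (ETC) matrix `etc[t][m]` and machines that all start idle. It deliberately ignores workflow dependencies, data transfer, failures, VM cost, and clustering, so it is neither the CFD strategy nor the authors' simulation setup; the function names and the example matrix are hypothetical.

```python
from typing import List, Sequence, Tuple

# etc[t][m] = expected time to compute task t on machine (VM) m.
ETC = Sequence[Sequence[float]]
Schedule = List[Tuple[int, int]]  # (task, machine) pairs in assignment order


def best_machine(task: int, etc: ETC, ready: List[float]) -> Tuple[float, int]:
    """(completion time, machine) minimising ready[m] + etc[task][m]; ties go to the lower index."""
    return min((ready[m] + etc[task][m], m) for m in range(len(ready)))


def mct(etc: ETC) -> Tuple[Schedule, float]:
    """Minimum Completion Time: tasks in the given order, each sent to its best machine."""
    ready = [0.0] * len(etc[0])
    schedule: Schedule = []
    for t in range(len(etc)):
        finish, m = best_machine(t, etc, ready)
        ready[m] = finish
        schedule.append((t, m))
    return schedule, max(ready)


def _batch(etc: ETC, pick_largest: bool) -> Tuple[Schedule, float]:
    """Shared loop for Min-min (pick_largest=False) and Max-min (pick_largest=True)."""
    ready = [0.0] * len(etc[0])
    unassigned = set(range(len(etc)))
    schedule: Schedule = []
    while unassigned:
        # Each unassigned task's best completion time under the current machine loads.
        options = [(*best_machine(t, etc, ready), t) for t in unassigned]
        finish, m, t = max(options) if pick_largest else min(options)
        ready[m] = finish
        unassigned.remove(t)
        schedule.append((t, m))
    return schedule, max(ready)


def min_min(etc: ETC) -> Tuple[Schedule, float]:
    """Assign first the task whose best completion time is smallest."""
    return _batch(etc, pick_largest=False)


def max_min(etc: ETC) -> Tuple[Schedule, float]:
    """Assign first the task whose best completion time is largest."""
    return _batch(etc, pick_largest=True)


if __name__ == "__main__":
    etc = [[2, 4], [3, 1], [8, 6]]  # 3 tasks x 2 VMs, invented numbers
    for name, heuristic in [("MCT", mct), ("Min-min", min_min), ("Max-min", max_min)]:
        schedule, makespan = heuristic(etc)
        print(f"{name:8s} makespan={makespan}  schedule={schedule}")
```

With the invented 3 × 2 matrix, MCT and Min-min both reach a make-span of 7, while Max-min reaches 6 because it places the longest task first and fits the two short tasks on the other machine. Which rule wins depends on the task mix; the example only shows how the selection rules differ.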
