An incremental reinforcement learning scheduling strategy for data‐intensive scientific workflows in the cloud

Author(s):  
André Nascimento ◽  
Vítor Silva ◽  
Aline Paes ◽  
Daniel Oliveira
2013 ◽  
Vol 12 (2) ◽  
pp. 245-264 ◽  
Author(s):  
Claudia Szabo ◽  
Quan Z. Sheng ◽  
Trent Kroeger ◽  
Yihong Zhang ◽  
Jian Yu

2018 ◽  
Vol 29 (2) ◽  
pp. 338-350 ◽  
Author(s):  
Nicholas Hazekamp ◽  
Nathaniel Kremer-Herman ◽  
Benjamin Tovar ◽  
Haiyan Meng ◽  
Olivia Choudhury ◽  
...  

2014 ◽  
Vol 9 (2) ◽  
pp. 28-38 ◽  
Author(s):  
Víctor Cuevas-Vicenttín ◽  
Parisa Kianmajd ◽  
Bertram Ludäscher ◽  
Paolo Missier ◽  
Fernando Chirigati ◽  
...  

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. The advantages scientific workflows offer include rapid and easy workflow design, software and data reuse, scalable execution, and sharing and collaboration, which together facilitate “reproducible science”. In this context, provenance (information about the origin, context, derivation, ownership, or history of some artifact) plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, performing such analyses on scientific results as part of extended research collaborations requires an adequate environment and tools. Concretely, the need arises for a repository that facilitates the sharing of scientific workflows and their associated execution traces in an interoperable manner, while also enabling querying and visualization. Furthermore, such functionality should be supported while taking performance and scalability into account. With this purpose in mind, we introduce PBase: a scientific workflow provenance repository implementing the ProvONE proposed standard, which extends the emerging W3C PROV standard for provenance data with workflow-specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user-friendly interface tailored for the visualization of scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.
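
To make the idea of a provenance query concrete, the following is a minimal sketch of a lineage lookup over a tiny in-memory trace. It is not PBase's interface and uses neither Neo4j nor ProvONE; it only borrows the two W3C PROV relation names `used` and `wasGeneratedBy`. The function name `lineage`, the edge-tuple layout, and the file and activity names are all hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One edge is (subject, relation, object), following the W3C PROV direction:
#   (entity, "wasGeneratedBy", activity) and (activity, "used", entity).
Edge = Tuple[str, str, str]


def lineage(edges: Iterable[Edge], start: str) -> Set[str]:
    """Return every activity and entity that `start` transitively depends on."""
    upstream: Dict[str, List[str]] = defaultdict(list)
    for subject, relation, obj in edges:
        if relation in ("wasGeneratedBy", "used"):
            upstream[subject].append(obj)

    seen: Set[str] = set()
    stack = [start]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        for parent in upstream[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical execution trace of a two-step workflow plus an unrelated run.
trace: List[Edge] = [
    ("plot.png", "wasGeneratedBy", "render"),
    ("render", "used", "table.csv"),
    ("table.csv", "wasGeneratedBy", "aggregate"),
    ("aggregate", "used", "raw.dat"),
    ("other.txt", "wasGeneratedBy", "unrelated"),
]

plot_lineage = lineage(trace, "plot.png")
```

A graph database would express the same question declaratively as a variable-length path query; the explicit traversal here only illustrates what such a query computes.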


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7238
Author(s):  
Zulfiqar Ahmad ◽  
Ali Imran Jehangiri ◽  
Mohammed Alaa Ala’anzy ◽  
Mohamed Othman ◽  
Arif Iqbal Umar

Cloud computing is a fully fledged, mature, and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data-intensive and compute-intensive tasks, and they have some special characteristics. In particular, the tasks of scientific workflows are executed in patterns of integration, disintegration, pipeline, and parallelism, and thus require special attention to task management and data-oriented resource scheduling and management. Tasks executed in a pipeline are considered bottleneck executions: the failure of such a task renders the whole execution futile, which calls for fault-tolerance-aware execution. Tasks executed in parallel require similar instances of cloud resources, and thus cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of scientific workflow tasks with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used in simulation, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with these three policies. Similarly, the CFD strategy reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA was not violated with regard to time and cost constraints, whereas it was violated numerous times by the existing policies.
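
For readers unfamiliar with the baseline policies named in this abstract, the following is a minimal sketch of the textbook Min-min and Max-min heuristics. It is not the CFD strategy and not the authors' simulation setup. It assumes independent tasks (no workflow dependencies) and a known execution-time matrix; the function name `min_min_or_max_min`, its parameters, and the example numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def min_min_or_max_min(
    exec_time: List[List[float]], pick_largest: bool = False
) -> Tuple[Dict[int, int], float]:
    """Schedule independent tasks; exec_time[t][m] is the run time of task t on machine m.

    pick_largest=False gives Min-min, pick_largest=True gives Max-min.
    Returns (task -> machine assignment, make-span).
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unscheduled = set(range(len(exec_time)))
    assignment: Dict[int, int] = {}
    choose = max if pick_largest else min

    while unscheduled:
        # Step 1: for each task, find its minimum completion time and that machine.
        best: Dict[int, Tuple[float, int]] = {}
        for t in unscheduled:
            m = min(range(n_machines), key=lambda j: ready[j] + exec_time[t][j])
            best[t] = (ready[m] + exec_time[t][m], m)
        # Step 2: Min-min takes the task with the smallest such time,
        # Max-min the task with the largest (ties go to the lowest task id).
        task = choose(sorted(best), key=lambda t: best[t][0])
        finish, machine = best[task]
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)

    return assignment, max(ready)


# Hypothetical run times: 3 tasks on 2 machines.
times = [[3.0, 5.0], [2.0, 4.0], [6.0, 1.0]]
min_min_plan, min_min_span = min_min_or_max_min(times)
max_min_plan, max_min_span = min_min_or_max_min(times, pick_largest=True)
```

MCT (minimum completion time) differs in that it takes tasks in arrival order and applies only step 1 to each, without the second selection step.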

