Resource Provisioning Options for Large-Scale Scientific Workflows

Beyond HEP: Photon and accelerator science computing infrastructure at DESY

EPJ Web of Conferences ◽

10.1051/epjconf/202024507036 ◽

2020 ◽

Vol 245 ◽

pp. 07036

Author(s):

Christoph Beyer ◽

Stefan Bujack ◽

Stefan Dietrich ◽

Thomas Finnern ◽

Martin Flemming ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

High Energy Physics ◽

High Energy ◽

Resource Provisioning ◽

Small Scale ◽

Online Processing ◽

Offline Processing ◽

National Analysis ◽

Energy Physics

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state of the art accelerators for fundamental science in the areas of high energy physics, photon science and accelerator development. While for decades high energy physics (HEP) has been the most prominent user of the DESY compute, storage and network infrastructure, various scientific areas as science with photons and accelerator development have caught up and are now dominating the demands on the DESY infrastructure resources, with significant consequences for the IT resource provisioning. In this contribution, we will present an overview of the computational, storage and network resources covering the various physics communities on site. Ranging from high-throughput computing (HTC) batch-like offline processing in the Grid and the interactive user analyses resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development or of photon sciences such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, their requirements include fast low-latency online processing for data taking and calibration as well as offline processing, thus high-performance computing (HPC) workloads, that are run on the dedicated Maxwell HPC cluster. As all communities face significant challenges due to changing environments and increasing data rates in the following years, we will discuss how this will reflect in necessary changes to the computing and storage infrastructures. We will present DESY compute cloud and container orchestration plans as a basis for infrastructure and platform services. We will show examples of Jupyter notebooks for small scale interactive analysis, as well as its integration into large scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for all scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.

Download Full-text

Data Management in Scientific Workflows

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Data Intensive Distributed Computing ◽

10.4018/978-1-61520-971-2.ch008 ◽

2012 ◽

pp. 177-187

Author(s):

Ewa Deelman ◽

Ann Chervenak

Keyword(s):

Data Management ◽

Gravitational Wave ◽

Large Scale ◽

Real Life ◽

Scientific Workflows ◽

Workflow Systems ◽

Automated Generation ◽

Workflow Execution ◽

Data Products ◽

Workflow Planning

Scientific applications such as those in astronomy, earthquake science, gravitational-wave physics, and others have embraced workflow technologies to do large-scale science. Workflows enable researchers to collaboratively design, manage, and obtain results that involve hundreds of thousands of steps, access terabytes of data, and generate similar amounts of intermediate and final data products. Although workflow systems are able to facilitate the automated generation of data products, many issues still remain to be addressed. These issues exist in different forms in the workflow lifecycle. This chapter describes a workflow lifecycle as consisting of a workflow generation phase where the analysis is defined, the workflow planning phase where resources needed for execution are selected, the workflow execution part, where the actual computations take place, and the result, metadata, and provenance storing phase. The authors discuss the issues related to data management at each step of the workflow cycle. They describe challenge problems and illustrate them in the context of real-life applications. They discuss the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure. They particularly emphasize the issues related to the management of data throughout the workflow lifecycle.

Download Full-text

Resource Provisioning Analysis in Virtual Machines Using Statistical Performance of Workloads for Business Applications

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9037 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4156-4161

Author(s):

Jeny Varghese ◽

S. Jagannatha

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Virtual Machines ◽

Resource Provisioning ◽

Business Applications ◽

Cloud Federation ◽

Cloud Datacenter

Cloud Federation is the interconnection of two or more cloud computing settings in order to share configurable processing components such as networks, servers, apps that can be dynamically delivered to customers. Virtualization has been an integral part of cloud computing which provides manageability and utilization of resources. This paper analyses on how jobs of business applications demand and efficiently use the capacity of the resources that are provisioned by the VMs, thereby managing the performance of the applications. The in-depth assessment is based on two large-scale and constant performance traces gathered in a cloud datacenter that host company tools for running distinct apps with regard to requested and used resources.

Download Full-text

A Declarative Optimization Engine for Resource Provisioning of Scientific Workflows in Geo-Distributed Clouds

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2016.2599529 ◽

2017 ◽

Vol 28 (3) ◽

pp. 647-661 ◽

Cited By ~ 9

Author(s):

Amelie Chi Zhou ◽

Bingsheng He ◽

Xuntao Cheng ◽

Chiew Tong Lau

Keyword(s):

Resource Provisioning ◽

Scientific Workflows

Download Full-text

Defining Similarity Spaces for Large-Scale Image Retrieval Through Scientific Workflows

Proceedings of the 21st International Database Engineering & Applications Symposium on - IDEAS 2017 ◽

10.1145/3105831.3105863 ◽

2017 ◽

Cited By ~ 1

Author(s):

Luis Fernando Milano Oliveira ◽

Daniel dos Santos Kaster

Keyword(s):

Image Retrieval ◽

Large Scale ◽

Scientific Workflows ◽

Large Scale Image Retrieval

Download Full-text

Electric power resource provisioning for large scale public EV charging facilities

2013 IEEE International Conference on Smart Grid Communications (SmartGridComm) ◽

10.1109/smartgridcomm.2013.6687946 ◽

2013 ◽

Cited By ~ 12

Author(s):

I. Safak Bayram ◽

George Michailidis ◽

Michael Devetsikiotis

Keyword(s):

Electric Power ◽

Large Scale ◽

Resource Provisioning ◽

Power Resource ◽

Ev Charging

Download Full-text

Online Task Resource Consumption Prediction for Scientific Workflows

Parallel Processing Letters ◽

10.1142/s0129626415410030 ◽

2015 ◽

Vol 25 (03) ◽

pp. 1541003 ◽

Cited By ~ 23

Author(s):

Rafael Ferreira da Silva ◽

Gideon Juve ◽

Mats Rynge ◽

Ewa Deelman ◽

Miron Livny

Keyword(s):

Input Data ◽

Resource Provisioning ◽

Scientific Workflows ◽

Memory Consumption ◽

Online Estimation ◽

Workflow Execution ◽

Disk Space ◽

Estimation Process ◽

Fine Grained ◽

Task Requirements

Estimates of task runtime, disk space usage, and memory consumption, are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks’ input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated based on the ratio parameter/input data size if they are correlated, or based on the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.

Download Full-text

Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows

2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications ◽

10.1109/trustcom.2014.97 ◽

2014 ◽

Cited By ~ 1

Author(s):

Nabeel Mohamed ◽

Nabanita Maji ◽

Jing Zhang ◽

Nataliya Timoshevskaya ◽

Wu-Chun Feng

Keyword(s):

Large Scale ◽

Scientific Workflows ◽

Workflow Manager

Download Full-text

Scheduling and Resource Provisioning Algorithms for ScientificWorkflows on Commercial Clouds

10.26686/wgtn.17071976 ◽

2021 ◽

Author(s):

◽

Vahid Arabnejad

Keyword(s):

High Performance ◽

Large Scale ◽

Cost Model ◽

Universal Access ◽

Scientific Workflow ◽

Resource Provisioning ◽

Scheduling Problem ◽

Computing Paradigm ◽

Computationally Intensive ◽

Workflow Tasks

<p>Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, be they within a High-Performance Computer cluster, or more recently, within the cloud. Commercial clouds have increasingly become a viable platform for hosting scientific analyses and computation due to their elasticity, recent introduction of specialist hardware, and pay-as-you-go cost model. This computing paradigm therefore presents a low capital and low barrier alternative to operating dedicated eScience infrastructure. Indeed, commercial clouds now enable universal access to capabilities previously available to only large well funded research groups. While the potential benefits of cloud computing are clear, there are still significant technical hurdles associated with obtaining the best execution efficiency whilst trading off cost. In most cases, large scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular, the elasticity, the pricing models (both static and dynamic), the non-homogeneous resource types and the vast array of services. This mapping of workflow tasks onto a set of provisioned instances is an example of the general scheduling problem and is NP-complete. In addition, certain runtime constraints, the most typical being the cost of the computation and the time which that computation requires to complete, must be met. This thesis addresses 'the scientific workflow scheduling problem in cloud', which is to schedule workflow tasks on cloud resources in a way that users meet their defined constraints such as budget and deadline, and providers maximize profits and resource utilization. Moreover, it explores different mechanisms and strategies for distributing defined constraints over a workflow and investigate its impact on the overall cost of the resulting schedule.</p>

Download Full-text

An Exotic IWD - SVR Based Approach for Failure Prognostication in Cloud-Based Scientific Workflows

10.21203/rs.3.rs-716843/v1 ◽

2021 ◽

Author(s):

Sridevi S ◽

Jeevaa Katiravan Jeevaa Katiravan

Keyword(s):

Large Scale ◽

Performance Metrics ◽

Prediction Models ◽

Fault Tolerant ◽

Scientific Workflow ◽

Scientific Workflows ◽

Support Vector ◽

Learning Approaches ◽

Task Failure ◽

Proactive Measures

Abstract Scientific workflows deserve the emerging attention in sophisticated large-scale scientific problem-solving environments. Though a single task failure occurs in workflow based applications, due to its task dependency nature the reliability of the overall system will be affected drastically. Hence rather than reactive fault tolerant approaches, proactive measures are vital in scientific workflows. This work puts forth an attempt to concentrate on the exploration issue of structuring an Exotic Intelligent Water Drops - Support Vector Regression-based approach for task failure prognostication which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study have been implemented through SVR-based machine learning approaches and its precision accuracy is optimized by IWDA and various performance metrics were evaluated. The experimental results prove that the proposed approach performs better compared with the other existing techniques.

Download Full-text