Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters

Job scheduling for optimizing data locality in Hadoop clusters

Proceedings of the 20th European MPI Users' Group Meeting (EuroMPI '13)

DOI: 10.1145/2488551.2488591

2013

Cited by ~7

Author(s): Aprigio Bezerra, Porfídio Hernández, Antonio Espinosa, Juan Carlos Moure

Keyword(s): Job Scheduling, Data Locality, Hadoop Clusters

Application and Storage-Aware Data Placement and Job Scheduling for Hadoop Clusters

Journal of Circuits, Systems and Computers

DOI: 10.1142/s0218126620502540

2020

Vol 29 (16), pp. 2050254

Author(s): Tao Li, Shuibing He, Ping Chen, Siling Yang, Yanlong Yin, ...

Keyword(s): Completion Time, Large Scale, Job Scheduling, Data Placement, Storage Device, Storage Devices, Average Completion Time, Hadoop Clusters, And Storage, Existing Data

As one of the most popular frameworks for large-scale analytics processing, Hadoop faces two challenges: both applications and storage devices are becoming heterogeneous. However, existing data placement and job scheduling schemes pay little attention to the heterogeneity of either application I/O requirements or I/O device capabilities, and can therefore greatly degrade system efficiency. In this paper, we propose ASPS, an Application and Storage-aware data Placement and job Scheduling approach for Hadoop clusters. The idea is to place application data and schedule application tasks by considering both application I/O requirements and storage device characteristics. Specifically, ASPS first introduces novel metrics to quantify the I/O requirements of applications. Then, based on this quantification, ASPS places the data of different applications on their preferred storage devices. Finally, ASPS tries to launch jobs with high I/O requirements on nodes that have the same type of faster device, to improve system efficiency. We have implemented ASPS in the Hadoop framework. Experimental results show that, compared to existing data placement policies and job scheduling approaches, ASPS can reduce the completion time of a single application by up to 36% and the average completion time of six concurrent applications by 27%.
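The quantify, place, then schedule sequence can be illustrated with a small greedy sketch. This is not ASPS itself. The abstract does not give the paper's metrics, so several pieces are hypothetical stand-ins chosen for illustration: the I/O-intensity score (I/O bytes per CPU-second), the `App` and `DeviceTier` types, and the fastest-tier-first greedy policy.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    io_bytes: float     # bytes read + written in a profiling run (hypothetical input)
    cpu_seconds: float  # CPU time in the same run (hypothetical input)
    data_size: float    # size of the app's input data to be placed

    @property
    def io_intensity(self) -> float:
        # Hypothetical stand-in metric: I/O bytes per CPU-second.
        return self.io_bytes / self.cpu_seconds


@dataclass
class DeviceTier:
    kind: str         # device type, e.g. "ssd" or "hdd"
    speed: float      # relative throughput; higher is faster
    capacity: float   # free space on this device type across the cluster
    nodes: list[str] = field(default_factory=list)  # nodes that have this device type


def place_data(apps: list[App], tiers: list[DeviceTier]) -> dict[str, str]:
    """Greedy placement: the most I/O-intensive apps claim the fastest tier with room."""
    free = {t.kind: t.capacity for t in tiers}
    fastest_first = sorted(tiers, key=lambda t: t.speed, reverse=True)
    placement: dict[str, str] = {}
    for app in sorted(apps, key=lambda a: a.io_intensity, reverse=True):
        for tier in fastest_first:
            if free[tier.kind] >= app.data_size:
                free[tier.kind] -= app.data_size
                placement[app.name] = tier.kind
                break
        else:  # no tier had enough free space
            raise RuntimeError(f"no device tier has room for {app.name}")
    return placement


def preferred_nodes(app: App, placement: dict[str, str],
                    tiers: list[DeviceTier]) -> list[str]:
    """Scheduling hint: run the app's tasks on nodes with the device type holding its data."""
    kind = placement[app.name]
    return next(t.nodes for t in tiers if t.kind == kind)


apps = [App("sort", io_bytes=800, cpu_seconds=10, data_size=50),
        App("wordcount", io_bytes=100, cpu_seconds=20, data_size=50)]
tiers = [DeviceTier("hdd", speed=1, capacity=500, nodes=["n3", "n4"]),
         DeviceTier("ssd", speed=5, capacity=60, nodes=["n1", "n2"])]
placement = place_data(apps, tiers)
print(placement)                                   # {'sort': 'ssd', 'wordcount': 'hdd'}
print(preferred_nodes(apps[0], placement, tiers))  # ['n1', 'n2']
```

The SSD tier has only 60 units free. The more I/O-intensive `sort` therefore takes the SSD slot, and `wordcount` falls back to HDD. A single greedy pass keeps the example short. It ignores replication, rebalancing, and data that outgrows its tier over time.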

A Survey on Energy Aware Job Scheduling Algorithms in Cloud Environment

i-manager’s Journal on Cloud Computing

DOI: 10.26634/jcc.3.1.8077

2016

Vol 3 (1), pp. 30

Cited by ~1

Author(s): Naseera Shaik, Jyotheeswai P

Keyword(s): Job Scheduling, Scheduling Algorithms, Cloud Environment, Energy Aware

Space sharing job scheduling policies for parallel computers

DOI: 10.31274/rtd-180813-10085

1995

Author(s): Ismail Mohamed Ismail

Keyword(s): Job Scheduling, Parallel Computers, Scheduling Policies

Dependable grid job scheduling mechanism

Journal of Computer Applications

DOI: 10.3724/sp.j.1087.2010.02066

2010

Vol 30 (8), pp. 2066-2069

Author(s): Yong-cai Tao, Lei Shi

Keyword(s): Job Scheduling, Grid Job Scheduling

Multi-level Parallelization of Genotype Imputation on Supercomputers

Current Bioinformatics

DOI: 10.2174/1574893615999200420071307

2020

Vol 15

Author(s): Weiwen Zhang, Long Wang, Theint Theint Aye, Juniarto Samsudin, Yongqing Zhu

Keyword(s): Association Study, Message Passing, High Performance, Message Passing Interface, Genome Wide Association Study, Job Scheduling, Genotype Imputation, Job Level, Multi Level, High Performance Requirement

Background: Genotype imputation as a service has been developed to enable researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, so satisfying the high-performance requirements of genome-wide association studies (GWAS) remains a challenge.

Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to improve its execution performance.

Method: We design and implement a multi-level parallelization that includes job-level, process-level, and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI), and OpenMP, respectively. It involves job distribution, chunk partitioning and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit multi-machine/multi-core architectures to improve the performance of genotype imputation.

Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. We also conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten execution time, thus improving the performance of genotype imputation.

Conclusion: The proposed multi-level parallelization, when deployed as an imputation-as-a-service offering, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
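The process-level and thread-level steps (chunk partitioning, distribution, execution, and concatenation) can be pictured with a small Python sketch. It is a stand-in built on several assumptions, not the authors' code:

- The region bounds and the round-robin assignment are hypothetical.
- `impute_chunk` is a placeholder for the real imputation work.
- The "ranks" are simulated one after another rather than run as MPI processes.
- A `ThreadPoolExecutor` stands in for OpenMP threads.
- Job-level distribution by the cluster scheduler is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

Chunk = tuple[int, int]  # half-open genomic interval [start, end)


def partition(start: int, end: int, chunk_size: int) -> list[Chunk]:
    """Split the region [start, end) into consecutive chunks of at most chunk_size."""
    return [(s, min(s + chunk_size, end)) for s in range(start, end, chunk_size)]


def assign_to_ranks(chunks: list[Chunk], n_ranks: int) -> dict[int, list[Chunk]]:
    """Round-robin chunk distribution across (simulated) MPI ranks."""
    return {rank: chunks[rank::n_ranks] for rank in range(n_ranks)}


def impute_chunk(chunk: Chunk) -> list[str]:
    """Hypothetical placeholder for imputing one chunk; returns one row per position."""
    start, end = chunk
    return [f"pos{p}" for p in range(start, end)]


def run_rank(rank_chunks: list[Chunk], n_threads: int) -> dict[Chunk, list[str]]:
    """Thread-level parallelism inside one rank (stand-in for OpenMP)."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return dict(zip(rank_chunks, pool.map(impute_chunk, rank_chunks)))


def impute_region(start: int, end: int, chunk_size: int,
                  n_ranks: int, n_threads: int) -> list[str]:
    chunks = partition(start, end, chunk_size)
    results: dict[Chunk, list[str]] = {}
    # Ranks are executed one after another here; under MPI they would run
    # concurrently and send their partial results to a root rank.
    for rank_chunks in assign_to_ranks(chunks, n_ranks).values():
        results.update(run_rank(rank_chunks, n_threads))
    # Data concatenation: stitch chunk outputs back together in genomic order.
    return [row for chunk in chunks for row in results[chunk]]


rows = impute_region(0, 10, chunk_size=4, n_ranks=2, n_threads=2)
print(len(rows), rows[0], rows[-1])  # prints: 10 pos0 pos9
```

Keying results by chunk and concatenating in the original chunk order keeps the output ordered, however the chunks were distributed or completed.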

Power and Performance Evaluation of Memory-Intensive Applications

Energies

DOI: 10.3390/en14144089

2021

Vol 14 (14), pp. 4089

Author(s): Kaiqiang Zhang, Dongyang Ou, Congfeng Jiang, Yeliang Qiu, Longchuan Yan

Keyword(s): Energy Efficiency, Energy Consumption, Power Consumption, Job Scheduling, Memory System, Processor Core, Memory Efficiency, And Performance, Reasonable Use, Server System

In terms of power and energy consumption, DRAM plays a key role in modern server systems, as do processors. Power-aware scheduling is based on the share of energy consumed by DRAM relative to other components. However, when memory-intensive applications are running, the energy consumption of the whole server system is significantly affected by DRAM's lack of energy proportionality. Furthermore, modern servers usually adopt a NUMA architecture in place of the earlier SMP architecture to increase memory bandwidth, so it is important to study the energy efficiency of these two memory architectures. Therefore, to explore the power consumption characteristics of servers under memory-intensive workloads, this paper evaluates the power consumption and performance of memory-intensive applications on different generations of real rack servers. Our analysis finds that: (1) Workload intensity and the number of concurrently executing threads affect server power consumption, but a fully utilized memory system does not necessarily yield good energy efficiency. (2) Even if the memory system is not fully utilized, the memory capacity per processor core has a significant impact on application performance and server power consumption. (3) When running memory-intensive applications, memory utilization is not always a good indicator of server power consumption. (4) Reasonable use of the NUMA architecture significantly improves memory energy efficiency. The experimental results show that reasonable use of the NUMA architecture can improve memory efficiency by 16% compared with the SMP architecture, while unreasonable use of NUMA reduces memory efficiency by 13%. These findings provide useful insights and guidance for system designers and data center operators in energy-efficiency-aware job scheduling and energy conservation.

Job scheduling for large-scale machine learning clusters

Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies

DOI: 10.1145/3386367.3432588

2020

Author(s): Haoyu Wang, Zetian Liu, Haiying Shen

Keyword(s): Machine Learning, Large Scale, Job Scheduling

A Job Scheduling Algorithm for Edge Computing Based on Modified Monte Carlo Tree Search

2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP)

DOI: 10.1109/icsp51882.2021.9408690

2021

Author(s): Haobin Wang, Chen Li, Tianyuan Wang, Fenfen Huang

Keyword(s): Monte Carlo, Job Scheduling, Scheduling Algorithm, Edge Computing, Tree Search, Monte Carlo Tree Search, Job Scheduling Algorithm

An Agent-based Adaptive Mechanism for Efficient Job Scheduling in Open and Large-scale Environments

Journal of Systems Science and Systems Engineering

DOI: 10.1007/s11518-021-5494-4

2021

Author(s): Yikun Yang, Fenghui Ren, Minjie Zhang

Keyword(s): Large Scale, Job Scheduling, Adaptive Mechanism, Agent Based