CHAMELEON: Reactive Load Balancing for Hybrid MPI+OpenMP Task-Parallel Applications

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

49th International Conference on Parallel Processing - ICPP : Workshops ◽

10.1145/3409390.3409408 ◽

2020 ◽

Author(s):

Jing Chen ◽

Pirah Noor Soomro ◽

Mustafa Abduljabbar ◽

Madhavan Manivannan ◽

Miquel Pericas

Keyword(s):

Parallel Applications ◽

Task Parallel


Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

Scientific Programming ◽

10.1155/2013/898597 ◽

2013 ◽

Vol 21 (3-4) ◽

pp. 123-136 ◽

Cited By ~ 1

Author(s):

Stephen L. Olivier ◽

Bronis R. de Supinski ◽

Martin Schulz ◽

Jan F. Prins

Keyword(s):

Poor Performance ◽

Data Access ◽

Parallel Applications ◽

Task Parallelism ◽

Improve Performance ◽

Additional Time ◽

Work Time ◽

Task Parallel ◽

Time Required ◽

Data Access Latency

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation, the additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance by up to 3X compared to the Intel OpenMP task scheduler.
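The definition of work time inflation given in the abstract above can be illustrated with a small calculation. This is a minimal sketch, not the authors' measurement method: the function name `work_time_inflation` and the timing values are hypothetical, and it assumes the per-thread times cover only time spent executing task bodies (idle time and scheduling overhead excluded).

```python
def work_time_inflation(parallel_work_times, sequential_time):
    """Return (absolute inflation, inflation ratio).

    parallel_work_times: seconds each thread spent doing actual work
        (assumed to exclude idle time and scheduling overhead).
    sequential_time: seconds needed to do the same work sequentially.
    """
    total_parallel_work = sum(parallel_work_times)
    inflation = total_parallel_work - sequential_time
    return inflation, total_parallel_work / sequential_time


# Hypothetical measurements for a 4-thread run (illustrative values only).
per_thread_work = [3.0, 3.5, 3.25, 3.25]
sequential = 10.0

inflation, ratio = work_time_inflation(per_thread_work, sequential)
print(f"inflation = {inflation:.2f} s, ratio = {ratio:.2f}")
```

A ratio above 1 means the threads collectively spent more time on the work than the sequential run did, which on NUMA systems the abstract attributes in part to increased data access latency.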


Periodic hierarchical load balancing for large supercomputers

The International Journal of High Performance Computing Applications ◽

10.1177/1094342010394383 ◽

2011 ◽

Vol 25 (4) ◽

pp. 371-385 ◽

Cited By ~ 34

Author(s):

Gengbin Zheng ◽

Abhinav Bhatelé ◽

Esteban Meneses ◽

Laxmikant V. Kalé

Keyword(s):

Load Balancing ◽

Large Scale ◽

Parallel Machines ◽

National Laboratory ◽

Argonne National Laboratory ◽

Parallel Applications ◽

Scientific Application ◽

Computing Center ◽

Blue Gene ◽

Advanced Computing

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
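The idea of load balancing domains arranged in a tree can be sketched with a two-level toy model. This is an illustrative sketch under stated assumptions, not the Charm++ implementation described in the abstract: the function names, the donation rule at the root, and the greedy placement inside each group are all choices made here for the example.

```python
import heapq


def greedy_assign(tasks, n_bins):
    """Place each task, largest first, on the currently least-loaded bin."""
    heap = [(0.0, i) for i in range(n_bins)]
    bins = [[] for _ in range(n_bins)]
    for cost in sorted(tasks, reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(cost)
        heapq.heappush(heap, (load + cost, i))
    return bins


def hierarchical_balance(groups):
    """groups[g][p] is the list of task costs on processor p of group g."""
    # Flatten each group: the root only reasons about per-group load.
    group_tasks = [[c for proc in g for c in proc] for g in groups]
    target = sum(sum(t) for t in group_tasks) / len(groups)

    # Root level: overloaded groups donate their smallest tasks as long
    # as their remaining load stays at or above the average group load.
    surplus = []
    for tasks in group_tasks:
        tasks.sort()
        while tasks and sum(tasks) - tasks[0] >= target:
            surplus.append(tasks.pop(0))
    # Donated tasks go, largest first, to the least-loaded group.
    for cost in sorted(surplus, reverse=True):
        min(group_tasks, key=sum).append(cost)

    # Leaf level: each group balances its own processors independently.
    return [greedy_assign(t, len(g)) for t, g in zip(group_tasks, groups)]


# Hypothetical input: two groups of two processors each.
groups = [[[4, 4, 4, 4], [2, 2]], [[1], [1]]]
balanced = hierarchical_balance(groups)
for g, procs in enumerate(balanced):
    print(g, [sum(p) for p in procs])
```

Each leaf-level step touches only the tasks of its own group, which is the property that lets a hierarchical scheme avoid gathering all load information in one place.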


Extreme-scale scripting: Opportunities for large task-parallel applications on petascale computers

Journal of Physics Conference Series ◽

10.1088/1742-6596/180/1/012046 ◽

2009 ◽

Vol 180 ◽

pp. 012046 ◽

Cited By ~ 13

Author(s):

Michael Wilde ◽

Ioan Raicu ◽

Allan Espinosa ◽

Zhao Zhang ◽

Ben Clifford ◽

...

Keyword(s):

Parallel Applications ◽

Task Parallel ◽

Extreme Scale


Formalizing Data Locality in Task Parallel Applications

Algorithms and Architectures for Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-49956-7_4 ◽

2016 ◽

pp. 43-61 ◽

Cited By ~ 3

Author(s):

Germán Ceballos ◽

Erik Hagersten ◽

David Black-Schaffer

Keyword(s):

Data Locality ◽

Parallel Applications ◽

Task Parallel


Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems

Journal of Zhejiang University SCIENCE C ◽

10.1631/jzus.c1100198 ◽

2012 ◽

Vol 13 (6) ◽

pp. 413-427 ◽

Cited By ~ 1

Author(s):

Eunsung Kim ◽

Hyeonsang Eom ◽

Heon Y. Yeom

Keyword(s):

Load Balancing ◽

Parallel Applications


Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ◽

10.1109/ispass.2016.7482102 ◽

2016 ◽

Cited By ~ 5

Author(s):

Andi Drebes ◽

Antoniu Pop ◽

Karine Heydemann ◽

Albert Cohen

Keyword(s):

Interactive Visualization ◽

Parallel Applications ◽

Cross Layer ◽

Dynamic Task ◽

Task Parallel


Branch and Bound Based Load Balancing for Parallel Applications

Computing in Object-Oriented Parallel Environments - Lecture Notes in Computer Science ◽

10.1007/10704054_20 ◽

1999 ◽

pp. 194-199 ◽

Cited By ~ 2

Author(s):

Shobana Radhakrishnan ◽

Robert K. Brunner ◽

Laxmikant V. Kalé

Keyword(s):

Load Balancing ◽

Branch And Bound ◽

Parallel Applications


Resource-Aware Load Balancing of Parallel Applications

Handbook of Research on Grid Technologies and Utility Computing ◽

10.4018/978-1-60566-184-1.ch002 ◽

2009 ◽

pp. 12-21 ◽

Cited By ~ 5

Author(s):

Eric Aubanel

Keyword(s):

Load Balancing ◽

Dynamic Network ◽

Computational Grid ◽

Parallel Applications ◽

Computational Grids ◽

Concurrent Execution ◽

Processor Performance ◽

Wide Range ◽

Tightly Coupled ◽

Resource Aware

The problem of load balancing parallel applications is particularly challenging on computational grids, since the characteristics of both the application and the platform must be taken into account. This chapter reviews the wide range of solutions that have been proposed. It considers tightly coupled parallel applications that can be described by an undirected graph representing concurrent execution of tasks and communication between tasks, executing on computational grids with static and dynamic network and processor performance. While a rich set of solution techniques has been proposed, there have not yet been any performance comparisons between them. Such comparisons will require parallel benchmarks and computational grid emulators and simulators.


Dynamic Load Balancing in Adaptive Parallel Applications

Software for Parallel Computation ◽

10.1007/978-3-642-58049-9_24 ◽

1993 ◽

pp. 333-347

Author(s):

Wouter Joosen ◽

Pierre Verbaeten

Keyword(s):

Load Balancing ◽

Dynamic Load ◽

Dynamic Load Balancing ◽

Parallel Applications
