Spill-free parallel scheduling of basic blocks

Author(s): B. Natarajan, M. Schlansker
Author(s): S. Blom, S. Darabi, M. Huisman, M. Safari

Abstract: A commonly used approach to developing deterministic parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may be executed in parallel. This paper develops a verification technique to reason about such compiler directives, in particular to show that they do not change the behaviour of the program. Moreover, the verification technique is tool-supported and can be combined with proofs of functional correctness of the program. To develop our verification technique, we propose a simple intermediate representation (syntax and semantics) that captures the main forms of deterministic parallel programs. This language distinguishes three kinds of basic blocks (parallel, vectorised and sequential blocks), which can be composed using three composition operators: sequential, parallel and fusion composition. We show how a widely used subset of OpenMP can be encoded into this intermediate representation. Our verification technique builds on the notion of an iteration contract to specify the behaviour of basic blocks. We show that manually specifying iteration contracts for single blocks is sufficient to reason automatically about data race freedom of the composed program. We also show that establishing functional correctness on a linearised version of the original program suffices to conclude functional correctness of the parallel program. Finally, we illustrate our approach on an example OpenMP program and discuss how tool support is provided.
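The data-race side of this idea can be illustrated with a deliberately simplified model. The Python sketch below is not the paper's technique. It assumes that an iteration contract can be reduced to the sets of memory locations one iteration reads and writes. The names `IterationContract`, `race_free` and `contracts_for` are hypothetical. The sketch checks the standard condition that a parallel block is race-free when no iteration writes a location that a different iteration reads or writes.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class IterationContract:
    """Hypothetical footprint of one loop iteration: locations read and written."""
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def race_free(contracts):
    """Return True if no two distinct iterations conflict.

    Two iterations conflict when one writes a location that the other
    reads or writes (the classic data-race condition).
    """
    for a, b in combinations(contracts, 2):
        if a.writes & (b.reads | b.writes) or b.writes & a.reads:
            return False
    return True


def contracts_for(n, reads_of, writes_of):
    """Build one contract per iteration i in range(n) from footprint functions."""
    return [IterationContract(frozenset(reads_of(i)), frozenset(writes_of(i)))
            for i in range(n)]


N = 8
# for i: a[i] = b[i] + 1      -> each iteration touches only its own index
independent = contracts_for(N, lambda i: {("b", i)}, lambda i: {("a", i)})
# for i: a[i] = a[i - 1] + 1  -> iteration i reads what iteration i - 1 writes
carried = contracts_for(N, lambda i: {("a", i - 1)} if i > 0 else set(),
                        lambda i: {("a", i)})

print(race_free(independent))  # True
print(race_free(carried))      # False
```

Comparing every pair of iterations is quadratic in the iteration count, which is acceptable for illustration only. A verification tool would typically reason symbolically about the contract of an arbitrary iteration i instead of enumerating iterations.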


Author(s): Guoqi Xie, Xiongren Xiao, Hao Peng, Renfa Li, Keqin Li

2014, Vol 519-520, pp. 108-113
Author(s): Jun Chen, Bo Li, Er Fei Wang

This paper studies resource reservation mechanisms in strict parallel computing grids and proposes a scheduling model and algorithms that support strict parallel resource reservation requests. Based on an analysis of two important parallel scheduling algorithms, FCFS and EASY backfilling, it presents four parallel scheduling algorithms that support resource reservation. The four algorithms are compared in simulation on resource utilization, bounded slowdown of jobs, and the success rate of Advance Reservation (AR) jobs. The results show that the EASY backfill + first-fit algorithm can guarantee the QoS of AR jobs while maintaining good performance for non-AR jobs.
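EASY backfilling is one of the two baseline algorithms the abstract analyses, and a single scheduling decision under it can be sketched as follows. The Python below is a minimal illustration, not the paper's algorithms. It ignores advance reservations entirely, treats user runtime estimates as exact, and uses hypothetical names (`Job`, `easy_backfill_step`).

Jobs start in FCFS order while they fit. When the head job does not fit, it gets a reservation at the earliest time enough nodes free up (the shadow time). A later job may then jump ahead only if it finishes before that time or uses only nodes the head job will not need.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # processors requested
    runtime: float  # user runtime estimate, treated as exact here


def easy_backfill_step(now, total_nodes, running, queue):
    """Return the jobs to start at time `now` under EASY backfilling.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   jobs waiting, in FCFS (arrival) order; not modified.
    Assumes no job requests more than total_nodes.
    """
    free = total_nodes - sum(n for _, n in running)
    running = list(running)
    pending = list(queue)
    started = []

    # FCFS phase: start jobs from the head of the queue while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job)
    if not pending:
        return started

    # The head job does not fit: find its shadow time, the earliest moment
    # enough nodes are released by running jobs, and the spare nodes then.
    head = pending[0]
    avail, shadow = free, now
    for end, n in sorted(running):
        avail += n
        shadow = end
        if avail >= head.nodes:
            break
    extra = avail - head.nodes

    # Backfill phase: a later job may start now only if it cannot delay the head job.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:      # finishes before the reservation
            free -= job.nodes
            started.append(job)
        elif job.nodes <= extra:             # uses only nodes the head job won't need
            free -= job.nodes
            extra -= job.nodes
            started.append(job)
    return started


# 10 nodes; one running job holds 6 nodes until t = 10.
queue = [Job("A", 8, 5.0), Job("B", 2, 4.0), Job("C", 3, 20.0), Job("D", 2, 20.0)]
started = easy_backfill_step(0.0, 10, [(10.0, 6)], queue)
print([j.name for j in started])  # ['B', 'D']
```

In the example, job A is reserved for t = 10. B finishes before that shadow time. D runs past it but uses only the two nodes left spare once A starts, so neither job delays A. C is skipped because it does not fit in the nodes currently free.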


1983, Vol 31 (1), pp. 24-49
Author(s): Eliezer Dekel, Sartaj Sahni

IEEE Access, 2021, Vol 9, pp. 20493-20507
Author(s): Dowhan Jeong, Jangryul Kim, Mari-Liis Oldja, Soonhoi Ha

Author(s): Yuanjun Laili, Fuqiang Guo, Lei Ren, Xiang Li, Yulin Li, ...

2021, Vol 14 (4), pp. 1-15
Author(s): Zhenghua Gu, Wenqing Wan, Jundong Xie, Chang Wu

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that this sequential scheduling order of basic blocks is a major factor limiting achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, the DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, the DG can be used to identify the exact instruction parallelism. Our experiments show that the DG can yield a 76% increase in instruction parallelism over the CDFG. Based on the DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. We also propose a hierarchical state transition graph with guard conditions to implement such highly parallel schedules efficiently. Our experimental results show that our DG-based HLS algorithm outperforms the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× in circuit latency, respectively.
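Because the DG is acyclic, a topological order always exists, and a scheduler can walk it without breaking loops. The sketch below illustrates generic as-late-as-possible (ALAP) scheduling of unit-latency operations on a dependency DAG, computed bottom-up from the sinks. It is a textbook technique used here under stated assumptions, not the paper's bottom-up algorithm, and the function `alap_schedule` and the operation names are hypothetical. Python's `graphlib.TopologicalSorter` (Python 3.9+) raises `CycleError` if the graph is not a DAG.

```python
from collections import Counter, defaultdict
from graphlib import TopologicalSorter


def alap_schedule(preds):
    """Assign an as-late-as-possible control step to each op of a DAG.

    preds maps each op to the set of ops it depends on; every op is assumed
    to take exactly one control step. Steps are numbered from 0.
    """
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(preds).static_order())
    succs = defaultdict(set)
    for op, ps in preds.items():
        for p in ps:
            succs[p].add(op)
    # Bottom-up pass: height = ops on the longest path from op to a sink.
    height = {}
    for op in reversed(order):
        height[op] = 1 + max((height[s] for s in succs[op]), default=0)
    length = max(height.values())
    # Sinks land in the last step; each op sits one step before its earliest successor.
    return {op: length - height[op] for op in order}


# Hypothetical ops: c = a + b and e = d * 2 are independent; f = c + e joins them.
preds = {"c": {"a", "b"}, "e": {"d"}, "f": {"c", "e"}}
schedule = alap_schedule(preds)
print(sorted(Counter(schedule.values()).items()))  # [(0, 3), (1, 2), (2, 1)]
```

Here the chain a, b → c and the chain d → e share control steps 0 and 1, so six operations fit in three steps. If the two chains sat in different basic blocks scheduled one after the other, as in the sequential CDFG order the abstract criticises, they would not share those steps.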

