iteration space
Recently Published Documents


Total documents: 40 (five years: 1)
H-index: 7 (five years: 0)

2020 · Vol 4 (OOPSLA) · pp. 1-30
Author(s): Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, ...

Author(s): N. A. Likhoded, M. A. Paliashchuk

An algorithm implemented on a parallel computer with distributed memory usually has a tiled structure: its set of operations is divided into subsets called tiles. One modern approach to obtaining tiled versions of algorithms is a tiling transformation based on information sections of the iteration space, which yields macro-operations (tiles). The operations of one tile are performed atomically, as a single unit of computation, and data are exchanged as arrays. The paper presents a method for constructing tiled computational processes, logically organized as a two-dimensional structure, for algorithms specified by multidimensional loops. Two-dimensional structures are applicable in fewer cases than one-dimensional ones, but they can be advantageous when algorithms are implemented on parallel computers with distributed memory. Possible advantages include a smaller volume of communication, shorter start-up and wind-down (acceleration and deceleration) phases of the computation, a potentially larger number of computational processes, and data exchange confined to the rows or columns of the process grid. The results generalize some aspects of the method for constructing parallel computational processes organized as a one-dimensional structure to the two-dimensional case. It is shown that, under certain restrictions on the structure and length of the loops, it is sufficient to tile three coordinates of a multidimensional iteration space. In earlier theoretical studies, the parallelism of tiled computations was guaranteed when information sections existed in all coordinates of the iteration space, and, for the simpler one-dimensional case, in two coordinates.
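The idea of tiling a loop nest and laying the tiles out over a two-dimensional process grid can be sketched for a triply nested loop. This is a minimal illustration under stated assumptions, not the authors' construction: it uses plain rectangular tiles of hypothetical sizes b1, b2, b3 instead of tiles derived from information sections, the helper names are hypothetical, the first two tile coordinates are mapped cyclically onto a p_rows x p_cols process grid, and all dependences between tiles are assumed to have nonnegative components in tile space, so that tiles on the same hyperplane t1 + t2 + t3 = const can run concurrently.

```python
from itertools import product

def tile_iteration_space(n1, n2, n3, b1, b2, b3):
    """Group the iterations (i, j, k) of a 3D loop nest over
    [0, n1) x [0, n2) x [0, n3) into rectangular tiles of size b1 x b2 x b3.
    Tiles at the upper boundaries may be smaller."""
    tiles = {}
    for i, j, k in product(range(n1), range(n2), range(n3)):
        key = (i // b1, j // b2, k // b3)  # tile coordinates (t1, t2, t3)
        tiles.setdefault(key, []).append((i, j, k))
    return tiles

def owner(tile, p_rows, p_cols):
    """Hypothetical mapping: the first two tile coordinates select a process
    in a p_rows x p_cols grid (cyclically); the third coordinate is traversed
    by that process in sequence."""
    t1, t2, _ = tile
    return (t1 % p_rows, t2 % p_cols)

def wavefront_stages(tile_keys):
    """Group tiles by t1 + t2 + t3. Assuming every inter-tile dependence has
    nonnegative components, tiles within one stage are independent and the
    stages can be executed in increasing order."""
    stages = {}
    for key in tile_keys:
        stages.setdefault(sum(key), []).append(key)
    return [sorted(stages[s]) for s in sorted(stages)]

tiles = tile_iteration_space(4, 4, 4, 2, 2, 2)
for stage in wavefront_stages(tiles):
    print([(t, owner(t, 2, 2)) for t in stage])
```

For a 4 x 4 x 4 loop nest with 2 x 2 x 2 tiles and a 2 x 2 process grid, this gives eight tiles of eight iterations each in four stages of sizes 1, 3, 3 and 1, and within each stage every tile lands on a different process.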


2017
Author(s): Pedro Henrique Penna, Márcio Castro, Patricia Plentz, Henrique C. Freitas, François Broquedis, ...

Workload-aware loop schedulers were introduced to deliver better performance than classical strategies, but they have limitations in workload estimation, chunk scheduling, and integration with applications. To address these challenges, we propose a novel workload-aware loop scheduler, called BinLPT, which is based on three features. First, it relies on a user-supplied estimate of the workload of the target parallel loop. Second, BinLPT uses a greedy bin-packing heuristic to adaptively partition the iteration space into several chunks; the maximum number of chunks to be produced is a tunable parameter. Third, it schedules chunks of iterations using a hybrid scheme based on the LPT (Longest Processing Time first) rule and on-demand scheduling. We integrated BinLPT into OpenMP and evaluated its performance on a large-scale NUMA machine using a synthetic kernel and a 3D N-Body simulation. Our results show that BinLPT improves performance over OpenMP's strategies by up to 45.13% and 37.15% in the synthetic and application kernels, respectively.
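The partition-then-LPT idea can be sketched in a few lines. This is a simplified illustration, not BinLPT's actual implementation or its OpenMP interface: the function names are hypothetical, the partitioning rule (close a contiguous chunk once its estimated load reaches total/max_chunks) is an assumption, and the on-demand part of the hybrid scheme is not modeled. The sketch only computes a static LPT assignment of chunks to threads.

```python
import heapq

def partition_chunks(workload, max_chunks):
    """Greedily split iterations 0..n-1 into at most max_chunks contiguous
    chunks, closing a chunk once its estimated load reaches total/max_chunks.
    Returns a list of (begin, end, load) with end exclusive."""
    target = sum(workload) / max_chunks
    chunks, start, acc = [], 0, 0
    for i, w in enumerate(workload):
        acc += w
        # Stop closing chunks early so the last slot holds the remaining iterations.
        if acc >= target and len(chunks) < max_chunks - 1:
            chunks.append((start, i + 1, acc))
            start, acc = i + 1, 0
    if start < len(workload):
        chunks.append((start, len(workload), acc))
    return chunks

def lpt_assign(chunks, n_threads):
    """LPT rule: take chunks in order of decreasing load and give each one to
    the thread with the smallest accumulated load so far."""
    heap = [(0, t) for t in range(n_threads)]
    heapq.heapify(heap)
    assignment = {t: [] for t in range(n_threads)}
    for begin, end, load in sorted(chunks, key=lambda c: c[2], reverse=True):
        t_load, t = heapq.heappop(heap)
        assignment[t].append((begin, end))
        heapq.heappush(heap, (t_load + load, t))
    return assignment

workload = [8, 1, 1, 1, 1, 4, 4, 4]  # hypothetical per-iteration estimates
chunks = partition_chunks(workload, max_chunks=4)
print(chunks)
print(lpt_assign(chunks, n_threads=2))
```

With this workload, the partition yields three chunks of load 8 each, (0, 1), (1, 6) and (6, 8), and LPT gives thread 0 the chunks (0, 1) and (6, 8) and thread 1 the chunk (1, 6). Using a heap keyed on accumulated load keeps each assignment step at O(log p) for p threads.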


Author(s): Aniket Shivam, Alexandru Nicolau, Alexander V. Veidenbaum, Mario Mango Furnari, Rosario Cammarota

2016 · Vol 51 (1) · pp. 539-554
Author(s): Wenlei Bao, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, P. Sadayappan
