A Comparison of Implementation Strategies for Nonuniform Data-Parallel Computations

1998 ◽  
Vol 52 (2) ◽  
pp. 132-149 ◽  
Author(s):  
Salvatore Orlando ◽  
Raffaele Perego

2000 ◽
Vol 11 (01) ◽  
pp. 183-204 ◽  
Author(s):  
ARNOLD L. ROSENBERG

We derive efficient guidelines for scheduling data-parallel computations within a draconian mode of cycle-stealing in networks of workstations. In this computing regimen, (the owner of) workstation A contracts with (the owner of) workstation B to take control of B's processor for a guaranteed total of U time units, punctuated by up to some prespecified number p of interrupts which kill any work A has in progress on B. On the one hand, the high overhead — of c time units — for setting up the communications that supply workstation B with work and receive its results recommends that A communicate with B infrequently, supplying B with large amounts of work each time. On the other hand, the risk of losing work in progress when workstation B is interrupted recommends that A supply B with a long sequence of small bundles of work. In this paper, we derive two sets of scheduling guidelines that balance these conflicting pressures in a way that optimizes, up to low-order additive terms, the amount of work that A is guaranteed to accomplish during the cycle-stealing opportunity. Our non-adaptive guidelines, which employ a single fixed strategy until all p interrupts have occurred, produce schedules that achieve at least [Formula: see text] units of work. Our adaptive guidelines, which change strategy after each interrupt, produce schedules that achieve at least [Formula: see text] (low-order terms) units of work. By deriving the theoretical underpinnings of our guidelines, we show that our non-adaptive schedules are optimal in guaranteed work-output and that our adaptive schedules are within low-order additive terms of being optimal.
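The tradeoff described above can be illustrated with a small numeric sketch. Assume, purely for illustration, that A splits the U time units into k equal bundles, each costing c units of communication setup, and that each of the up to p interrupts destroys at most one bundle in progress; the function names and this equal-bundle simplification are mine, not the paper's actual non-adaptive or adaptive schedules.

```python
# Illustrative sketch (not the paper's schedules): with k equal bundles,
# guaranteed work is (k - p) * (U/k - c). Few bundles waste work on
# interrupts; many bundles waste time on the per-bundle setup cost c.
import math

def guaranteed_work(U, p, c, k):
    """Work guaranteed if U time units are split into k equal bundles,
    each with setup cost c, and up to p interrupts each kill one bundle."""
    if k <= p:
        return 0.0                 # every bundle could be interrupted
    bundle = U / k - c             # useful work per bundle
    if bundle <= 0:
        return 0.0                 # setup cost swallows the bundle
    return (k - p) * bundle        # at least k - p bundles survive

def best_bundle_count(U, p, c):
    """Maximizing (k - p)(U/k - c) over k gives k* near sqrt(p*U/c), so the
    guarantee is U minus a term on the order of sqrt(p*U*c); search nearby."""
    k_star = max(p + 1, round(math.sqrt(p * U / c)))
    return max(range(p + 1, 2 * k_star + 2),
               key=lambda k: guaranteed_work(U, p, c, k))

U, p, c = 10_000.0, 4, 5.0
k = best_bundle_count(U, p, c)
print(k, round(guaranteed_work(U, p, c, k), 1))
```

Note how the maximizing k grows with both p and U: more interrupts or a longer contract both favor slicing the work more finely, which matches the balance the abstract describes.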


1999 ◽  
Vol 7 (3-4) ◽  
pp. 313-326 ◽  
Author(s):  
Jan F. Prins ◽  
Siddhartha Chatterjee ◽  
Martin Simons

Modern dialects of Fortran enjoy wide use and good support on high‐performance computers as performance‐oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since performance of nested data‐parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source‐to‐source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data‐parallel implementations of the sparse matrix‐vector product and the Barnes–Hut n‐body algorithm by hand‐coding thread‐based (using OpenMP directives) and flattening‐based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX‐4, two shared‐memory machines.
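The flattening idea can be sketched on the abstract's own example, the sparse matrix-vector product: the nested per-row loops become flat element-wise operations plus a segmented reduction. This is a minimal illustration in Python with an assumed CSR layout and hypothetical helper name, not the paper's Fortran code; a flattening compiler would emit a parallel segmented scan where the sequential segment loop appears below.

```python
# Hedged sketch of "flattening" a nested data-parallel sparse matvec:
# the irregular per-row inner loops are replaced by one flat elementwise
# multiply over all stored entries, followed by a segmented sum.

def csr_matvec_flattened(values, col_idx, row_ptr, x):
    """y = A @ x for A in CSR form, expressed as flat data-parallel steps."""
    # Step 1 (flat elementwise): multiply every stored entry by the
    # matching element of x -- a single flat vector operation, no nesting.
    prods = [v * x[j] for v, j in zip(values, col_idx)]
    # Step 2 (segmented sum): reduce each row's segment of prods.
    # Sequential here for clarity; semantically a segmented parallel scan.
    y = []
    for r in range(len(row_ptr) - 1):
        y.append(sum(prods[row_ptr[r]:row_ptr[r + 1]]))
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0,   2,   1,   0,   2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec_flattened(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))
# -> [3.0, 3.0, 9.0]
```

Because both steps operate on flat vectors whose length is the number of nonzeros, load balance no longer depends on the irregular row lengths, which is the performance-stability benefit the abstract attributes to flattening.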


1990 ◽  
Vol 39 (2) ◽  
pp. 206-219 ◽  
Author(s):  
D.M. Nicol ◽  
P.F. Reynolds
