A Comparison of Implementation Strategies for Nonuniform Data-Parallel Computations

1998 ◽  
Vol 52 (2) ◽  
pp. 132-149 ◽  
Author(s):  
Salvatore Orlando ◽  
Raffaele Perego

2000 ◽
Vol 11 (01) ◽  
pp. 183-204 ◽  
Author(s):  
ARNOLD L. ROSENBERG

We derive efficient guidelines for scheduling data-parallel computations within a draconian mode of cycle-stealing in networks of workstations. In this computing regimen, (the owner of) workstation A contracts with (the owner of) workstation B to take control of B's processor for a guaranteed total of U time units, punctuated by up to some prespecified number p of interrupts which kill any work A has in progress on B. On the one hand, the high overhead — of c time units — for setting up the communications that supply workstation B with work and receive its results recommends that A communicate with B infrequently, supplying B with large amounts of work each time. On the other hand, the risk of losing work in progress when workstation B is interrupted recommends that A supply B with a long sequence of small bundles of work. In this paper, we derive two sets of scheduling guidelines that balance these conflicting pressures in a way that optimizes, up to low-order additive terms, the amount of work that A is guaranteed to accomplish during the cycle-stealing opportunity. Our non-adaptive guidelines, which employ a single fixed strategy until all p interrupts have occurred, produce schedules that achieve at least [Formula: see text] units of work. Our adaptive guidelines, which change strategy after each interrupt, produce schedules that achieve at least [Formula: see text] (low-order terms) units of work. By deriving the theoretical underpinnings of our guidelines, we show that our non-adaptive schedules are optimal in guaranteed work-output and that our adaptive schedules are within low-order additive terms of being optimal.
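The tradeoff described above can be illustrated with a small numeric sketch. Assume, purely for illustration, that A splits the U time units into k equal bundles, each costing c units of communication setup, and that each of the up to p interrupts destroys at most one bundle in progress; the function names and this equal-bundle simplification are mine, not the paper's actual non-adaptive or adaptive schedules.

```python
# Illustrative sketch (not the paper's schedules): with k equal bundles,
# guaranteed work is (k - p) * (U/k - c). Few bundles waste work on
# interrupts; many bundles waste time on the per-bundle setup cost c.
import math

def guaranteed_work(U, p, c, k):
    """Work guaranteed if U time units are split into k equal bundles,
    each with setup cost c, and up to p interrupts each kill one bundle."""
    if k <= p:
        return 0.0                 # every bundle could be interrupted
    bundle = U / k - c             # useful work per bundle
    if bundle <= 0:
        return 0.0                 # setup cost swallows the bundle
    return (k - p) * bundle        # at least k - p bundles survive

def best_bundle_count(U, p, c):
    """Maximizing (k - p)(U/k - c) over k gives k* near sqrt(p*U/c), so the
    guarantee is U minus a term on the order of sqrt(p*U*c); search nearby."""
    k_star = max(p + 1, round(math.sqrt(p * U / c)))
    return max(range(p + 1, 2 * k_star + 2),
               key=lambda k: guaranteed_work(U, p, c, k))

U, p, c = 10_000.0, 4, 5.0
k = best_bundle_count(U, p, c)
print(k, round(guaranteed_work(U, p, c, k), 1))
```

Note how the maximizing k grows with both p and U: more interrupts or a longer contract both favor slicing the work more finely, which matches the balance the abstract describes.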


1999 ◽  
Vol 7 (3-4) ◽  
pp. 313-326 ◽  
Author(s):  
Jan F. Prins ◽  
Siddhartha Chatterjee ◽  
Martin Simons

Modern dialects of Fortran enjoy wide use and good support on high‐performance computers as performance‐oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since performance of nested data‐parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source‐to‐source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data‐parallel implementations of the sparse matrix‐vector product and the Barnes–Hut n‐body algorithm by hand‐coding thread‐based (using OpenMP directives) and flattening‐based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX‐4, two shared‐memory machines.
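The flattening idea can be sketched on the abstract's own example, the sparse matrix-vector product: the nested per-row loops become flat element-wise operations plus a segmented reduction. This is a minimal illustration in Python with an assumed CSR layout and hypothetical helper name, not the paper's Fortran code; a flattening compiler would emit a parallel segmented scan where the sequential segment loop appears below.

```python
# Hedged sketch of "flattening" a nested data-parallel sparse matvec:
# the irregular per-row inner loops are replaced by one flat elementwise
# multiply over all stored entries, followed by a segmented sum.

def csr_matvec_flattened(values, col_idx, row_ptr, x):
    """y = A @ x for A in CSR form, expressed as flat data-parallel steps."""
    # Step 1 (flat elementwise): multiply every stored entry by the
    # matching element of x -- a single flat vector operation, no nesting.
    prods = [v * x[j] for v, j in zip(values, col_idx)]
    # Step 2 (segmented sum): reduce each row's segment of prods.
    # Sequential here for clarity; semantically a segmented parallel scan.
    y = []
    for r in range(len(row_ptr) - 1):
        y.append(sum(prods[row_ptr[r]:row_ptr[r + 1]]))
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0,   2,   1,   0,   2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec_flattened(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))
# -> [3.0, 3.0, 9.0]
```

Because both steps operate on flat vectors whose length is the number of nonzeros, load balance no longer depends on the irregular row lengths, which is the performance-stability benefit the abstract attributes to flattening.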


1990 ◽  
Vol 39 (2) ◽  
pp. 206-219 ◽  
Author(s):  
D.M. Nicol ◽  
P.F. Reynolds
