Compiler-directed shared-memory communication for iterative parallel applications

Author(s): Guhan Viswanathan, James R. Larus

2017, Vol 27 (01), pp. 1740005
Author(s): Dalvan Griebler, Marco Danelutto, Massimo Torquati, Luiz Gustavo Fernandes

This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to annotate the notable components of a stream parallel application: stream sources and stream processing stages. A set of tools processes SPar code (C++ code annotated with SPar attributes) and generates FastFlow C++ code that exploits the stream parallelism denoted by the SPar annotations, targeting shared-memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also present experimental results assessing the feasibility of the overall approach as well as SPar's performance and expressiveness.
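
To make the annotation idea concrete, here is a minimal sketch of what a SPar-annotated loop might look like. The attribute names `spar::ToStream`, `spar::Stage`, `spar::Input`, `spar::Output` and `spar::Replicate` are given as we recall them from the SPar publications and should be checked against SPar's own documentation; the function `upper_case_lines` and its stage logic are hypothetical. Because a standard C++ compiler ignores attributes it does not recognize (usually with a warning), the same code also builds and runs sequentially without the SPar toolchain, a property that follows from building the DSL on standard attributes.

```cpp
#include <cassert>   // used by the usage example
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>   // used by the usage example
#include <string>

// Hypothetical SPar-style pipeline: read lines, upper-case them, write them out.
// The [[spar::...]] attribute names are an assumption based on the SPar papers.
// A plain C++ compiler ignores these unknown attributes (usually with a
// -Wattributes / unknown-attribute warning) and runs the loop sequentially.
void upper_case_lines(std::istream& in, std::ostream& out) {
    // Stream source: each loop iteration produces one stream item (a line).
    [[spar::ToStream, spar::Input(in, out)]] while (true) {
        std::string line;
        if (!std::getline(in, line)) break;

        // Stage 1: stateless per-item work, marked as replicable across workers.
        [[spar::Stage, spar::Input(line), spar::Output(line), spar::Replicate(4)]] {
            for (char& c : line)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }

        // Stage 2: writes to the shared output stream, so it is not replicated.
        [[spar::Stage, spar::Input(line, out)]] {
            out << line << '\n';
        }
    }
}
```

Called as `upper_case_lines(std::cin, std::cout)`, a plain build of this sketch processes lines one after another. With the SPar toolchain, the annotations are what would be translated into a parallel FastFlow pipeline; whether output order is preserved when the first stage is replicated depends on the toolchain's settings and is not assumed here.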


2005, Vol 15 (04), pp. 357-365
Author(s): Surendra Byna, Kirk W. Cameron, Xian-He Sun

Communication in parallel applications combines data transfers internal to the source or destination and transfers across the network. Previous research focused on quantifying network transfer costs has indirectly helped reduce overall communication cost, while optimizing data transfer from source memory to the network interface has received less attention. In shared-memory systems, such memory-to-memory transfers dominate communication cost. In distributed-memory systems, memory-to-network-interface transfers grow in significance because processor and network speeds are increasing faster than memory latency is improving. Our objective is to minimize the cost of these internal data transfers. Following examples that illustrate the impact of memory transfers on communication, we present a methodology for classifying the effects of data size and data distribution on hardware, middleware, and application software performance. This cost is quantified using hardware counter event measurements on the SGI Origin 2000 (O2K). For the SGI O2K, we empirically identify the cost of simply copying data from one buffer to another, as well as the middleware overhead. We use MPICH in our experiments, but our techniques are generally applicable to any communication implementation.
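
The abstract separates the cost of simply copying data between buffers from middleware overhead, and considers how data size and data distribution affect it. The sketch below illustrates that distinction under stated assumptions: a strided layout (one column of a row-major matrix) stands in for a non-contiguous distribution, and packing it into a contiguous buffer stands in for the extra copy a communication library typically has to perform before sending such data. The function names are hypothetical, and timing uses `std::chrono` wall-clock measurements rather than the hardware counters the authors used on the SGI Origin 2000.

```cpp
#include <cassert>   // used by the usage example
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>    // used by the usage example

// Contiguous copy: a single memcpy of n doubles from src to dst.
void copy_contiguous(const double* src, double* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(double));
}

// Strided pack: gather n elements spaced `stride` apart into a contiguous
// buffer, e.g. one column of a row-major matrix that has `stride` columns.
void pack_strided(const double* src, double* dst, std::size_t n, std::size_t stride) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i * stride];
}

// Average wall-clock time in nanoseconds of `reps` calls to f(). This is a
// coarse stand-in for hardware-counter measurements, not a replacement.
template <typename F>
double average_ns(F f, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

Comparing `average_ns` for the two functions across buffer sizes (for example, rows versus columns of a large matrix) gives a rough picture of how size and distribution affect copy cost; the absolute numbers depend entirely on the machine and compiler.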

