Compiler-directed shared-memory communication for iterative parallel applications

Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory

2012 IEEE 26th International Parallel and Distributed Processing Symposium ◽

10.1109/ipdps.2012.56 ◽

2012 ◽

Cited By ~ 14

Author(s):

Eduardo H.M. Cruz ◽

Matthias Diener ◽

Philippe O.A. Navaux

Keyword(s):

Shared Memory ◽

Parallel Applications ◽

Translation Lookaside Buffer

Impact of Loop Granularity and Self-Preemption on the Performance of Loop Parallel Applications on a Multiprogrammed Shared-Memory Multiprocessor

10.1109/icpp.1994.117 ◽

1994 ◽

Cited By ~ 3

Author(s):

Chitra Natarajan ◽

Sanjay Sharma ◽

Ravishankar Iyer

Keyword(s):

Shared Memory ◽

Parallel Applications ◽

Shared Memory Multiprocessor

Performance analysis of multilevel parallel applications on shared memory architectures

Proceedings International Parallel and Distributed Processing Symposium ◽

10.1109/ipdps.2003.1213183 ◽

2004 ◽

Cited By ~ 12

Author(s):

G. Jost ◽

Haoqiang Jin ◽

J. Labarta ◽

J. Gimenez ◽

J. Caubet

Keyword(s):

Performance Analysis ◽

Shared Memory ◽

Parallel Applications ◽

Memory Architectures ◽

Shared Memory Architectures

Performance Analysis of Shared-Memory Parallel Applications Using Performance Properties

High Performance Computing and Communications - Lecture Notes in Computer Science ◽

10.1007/11557654_70 ◽

2005 ◽

pp. 595-604 ◽

Cited By ~ 3

Author(s):

Karl Fürlinger ◽

Michael Gerndt

Keyword(s):

Performance Analysis ◽

Shared Memory ◽

Parallel Applications ◽

Performance Properties

Software Distributed Shared Memory with Transactional Coherence - A Software Engine to Run Transactional Shared-memory Parallel Applications on Clusters

2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing ◽

10.1109/pdp.2010.28 ◽

2010 ◽

Cited By ~ 2

Author(s):

Michele Di Santo ◽

Nadia Ranaldo ◽

Carmine Sementa ◽

Eugenio Zimeo

Keyword(s):

Shared Memory ◽

Distributed Shared Memory ◽

Parallel Applications ◽

Software Distributed Shared Memory

Performance Behavior Prediction Scheme for Shared-Memory Parallel Applications

2011 IEEE International Conference on Cluster Computing ◽

10.1109/cluster.2011.58 ◽

2011 ◽

Author(s):

John Corredor ◽

Juan Carlos Moure ◽

Dolores Rexachs ◽

Daniel Franco ◽

Emilio Luque

Keyword(s):

Shared Memory ◽

Parallel Applications ◽

Behavior Prediction ◽

Prediction Scheme ◽

Performance Behavior

Detecting phases in parallel applications on shared memory architectures

Proceedings 20th IEEE International Parallel & Distributed Processing Symposium ◽

10.1109/ipdps.2006.1639325 ◽

2006 ◽

Cited By ~ 28

Author(s):

E. Perelman ◽

M. Polito ◽

J.-Y. Bouguet ◽

J. Sampson ◽

B. Calder ◽

...

Keyword(s):

Shared Memory ◽

Parallel Applications ◽

Memory Architectures ◽

Shared Memory Architectures

On the coexistence of shared-memory and message-passing in the programming of parallel applications

High-Performance Computing and Networking - Lecture Notes in Computer Science ◽

10.1007/bfb0031643 ◽

1997 ◽

pp. 718-727 ◽

Cited By ~ 4

Author(s):

J. Cordsen ◽

W. Schröder-Preikschat

Keyword(s):

Shared Memory ◽

Message Passing ◽

Parallel Applications

SPar: A DSL for High-Level and Productive Stream Parallelism

Parallel Processing Letters ◽

10.1142/s0129626417400059 ◽

2017 ◽

Vol 27 (01) ◽

pp. 1740005 ◽

Cited By ~ 20

Author(s):

Dalvan Griebler ◽

Marco Danelutto ◽

Massimo Torquati ◽

Luiz Gustavo Fernandes

Keyword(s):

Shared Memory ◽

Stream Processing ◽

Parallel Applications ◽

Domain Specific Language ◽

Specific Language ◽

Domain Specific ◽

Implementation Techniques ◽

High Level

This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness.

ISOLATING COSTS IN SHARED MEMORY COMMUNICATION BUFFERING

Parallel Processing Letters ◽

10.1142/s0129626405002271 ◽

2005 ◽

Vol 15 (04) ◽

pp. 357-365

Author(s):

SURENDRA BYNA ◽

KIRK W. CAMERON ◽

XIAN-HE SUN

Keyword(s):

Shared Memory ◽

Data Transfer ◽

Memory Systems ◽

Communication Cost ◽

Parallel Applications ◽

Network Interface ◽

Software Performance ◽

Data Transfers ◽

The Cost ◽

The Impact

Communication in parallel applications is a combination of data transfers internally at a source or destination and across the network. Previous research focused on quantifying network transfer costs has indirectly resulted in reduced overall communication cost. Optimized data transfer from source memory to the network interface has received less attention. In shared memory systems, such memory-to-memory transfers dominate communication cost. In distributed memory systems, memory-to-network interface transfers grow in significance as processor and network speeds increase at faster rates than memory latency speeds. Our objective is to minimize the cost of internal data transfers. Following examples illustrating the impact of memory transfers on communication, we present a methodology for classifying the effects of data size and data distribution on hardware, middleware, and application software performance. This cost is quantified using hardware counter event measurements on the SGI Origin 2000. For the SGI O2K, we empirically identify the cost caused by just copying data from one buffer to another and the middleware overhead. We use MPICH in our experiments, but our techniques are generally applicable to any communication implementation.
