An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration

M.E. Acacio; J. Gonzalez; J.M. Garcia; J. Duato

doi:10.1109/tpds.2004.27

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2004.27 ◽

2004 ◽

Vol 15 (8) ◽

pp. 755-768 ◽

Cited By ~ 14

Author(s):

M.E. Acacio ◽

J. Gonzalez ◽

J.M. Garcia ◽

J. Duato

Keyword(s):

Shared Memory ◽

High Performance ◽

Shared Memory Multiprocessors ◽

On Chip

User-controllable coherence for high performance shared memory multiprocessors

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03 ◽

10.1145/781498.781507 ◽

2003 ◽

Author(s):

Collin McCurdy ◽

Charles Fischer

Keyword(s):

Shared Memory ◽

High Performance ◽

Shared Memory Multiprocessors

A virtual-physical on-chip cache for shared memory multiprocessors

Euro-Par'97 Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/bfb0002814 ◽

1997 ◽

pp. 789-792

Author(s):

Dongwook Kim ◽

Joonwon Lee

Keyword(s):

Shared Memory ◽

Shared Memory Multiprocessors ◽

On Chip

User-controllable coherence for high performance shared memory multiprocessors

ACM SIGPLAN Notices ◽

10.1145/966049.781507 ◽

2003 ◽

Vol 38 (10) ◽

pp. 73-82

Author(s):

Collin McCurdy ◽

Charles Fischer

Keyword(s):

Shared Memory ◽

High Performance ◽

Shared Memory Multiprocessors

About Cache Associativity in Low-Cost Shared Memory Multi-Microprocessors

Parallel Processing Letters ◽

10.1142/s0129626495000436 ◽

1995 ◽

Vol 05 (03) ◽

pp. 475-487

Author(s):

N. Drach ◽

A. Gefflaut ◽

P. Joubert ◽

A. Seznec

Keyword(s):

Shared Memory ◽

Low Cost ◽

Shared Memory Multiprocessors ◽

Cache Associativity ◽

Cache Organization ◽

On Chip

Sizes of on-chip caches on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be used directly to build a low-cost single-bus shared-memory multiprocessor without any second-level cache. In this paper, we explore the viability of such a multi-microprocessor. Simulation results clearly establish that the performance of such a system will be quite poor if the on-chip caches are direct-mapped. On the other hand, when the on-chip caches are partially associative, the achieved level of performance is quite promising. In particular, two recently proposed innovative cache structures, the skewed-associative cache organization and the semi-unified cache organization, are shown to perform well.
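The gap between direct-mapped and partially associative on-chip caches comes down to conflict misses. The Python sketch below is a toy illustration under stated assumptions, not a reproduction of the paper's simulations: three caches of equal capacity (8 block frames) replay a contrived trace of block addresses 0, 4 and 8. The skewed model follows the general idea of the skewed-associative organization (each bank is indexed by a different function, so blocks that collide in one bank can be spread out in another). The XOR indexing function, the "evict the least recently used candidate" replacement rule, and all function names are hypothetical choices made for illustration.

```python
"""Toy miss-count comparison of three 8-frame caches (illustrative only)."""

from typing import Callable, List, Optional


def count_misses_direct_mapped(trace: List[int], num_lines: int) -> int:
    """Direct-mapped: each block has exactly one candidate frame (block mod num_lines)."""
    frames: List[Optional[int]] = [None] * num_lines
    misses = 0
    for block in trace:
        idx = block % num_lines
        if frames[idx] != block:
            misses += 1
            frames[idx] = block  # the resident block is always the victim
    return misses


def count_misses_set_assoc(trace: List[int], num_sets: int, ways: int) -> int:
    """Conventional set-associative cache with LRU replacement within each set."""
    sets: List[List[int]] = [[] for _ in range(num_sets)]  # each list ordered LRU -> MRU
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)  # hit: re-appended below as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)  # set full: evict the least recently used block
        s.append(block)
    return misses


# Hypothetical skewing functions, one per bank (chosen for illustration only).
SKEW_HASHES: List[Callable[[int], int]] = [
    lambda block: block,                 # bank 0: conventional modulo indexing
    lambda block: block ^ (block >> 2),  # bank 1: XOR-folded indexing
]


def count_misses_skewed(trace: List[int], lines_per_bank: int,
                        hashes: List[Callable[[int], int]]) -> int:
    """Skewed-associative cache: bank b is indexed by its own function hashes[b]."""
    banks: List[List[Optional[int]]] = [[None] * lines_per_bank for _ in hashes]
    last_use: List[List[int]] = [[-1] * lines_per_bank for _ in hashes]
    misses = 0
    for t, block in enumerate(trace):
        # One candidate frame per bank, each located by a different index function.
        slots = [(b, h(block) % lines_per_bank) for b, h in enumerate(hashes)]
        slot = next(((b, i) for b, i in slots if banks[b][i] == block), None)
        if slot is None:
            misses += 1
            empty = [(b, i) for b, i in slots if banks[b][i] is None]
            # Fill an empty candidate if there is one, else evict the least recently used one.
            slot = empty[0] if empty else min(slots, key=lambda s: last_use[s[0]][s[1]])
            banks[slot[0]][slot[1]] = block
        last_use[slot[0]][slot[1]] = t
    return misses


if __name__ == "__main__":
    trace = [0, 4, 8] * 10  # contrived trace: blocks 0, 4 and 8 all share set 0 of the 4-set cache
    print("direct-mapped, 8 frames:   ", count_misses_direct_mapped(trace, 8))
    print("2-way set-assoc, 4 sets:   ", count_misses_set_assoc(trace, 4, 2))
    print("2-way skewed, 2 x 4 frames:", count_misses_skewed(trace, 4, SKEW_HASHES))
```

On this trace the direct-mapped cache misses 21 of 30 accesses (blocks 0 and 8 keep evicting each other), the conventional 2-way set-associative cache misses all 30 (three blocks cycle through a 2-way set under LRU), and the skewed cache misses only the 3 cold accesses, because the second bank's index function places the three blocks in different frames. The trace is deliberately adversarial, so these counts say nothing about the magnitude of the effect on real workloads.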

Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors

10.4995/thesis/10251/35325 ◽

2014 ◽

Author(s):

Mario Lodde

Keyword(s):

Shared Memory ◽

High Performance ◽

Chip Multiprocessors ◽

Network On Chip ◽

Chip Design ◽

Memory Chip ◽

On Chip

OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors

15th International Symposium on System Synthesis, 2002. ◽

10.1109/isss.2002.1227161 ◽

2003 ◽

Cited By ~ 3

Author(s):

M. Sato

Keyword(s):

Parallel Programming ◽

Shared Memory ◽

Chip Multiprocessors ◽

Shared Memory Multiprocessors ◽

On Chip

Achieving high performance in bus-based shared-memory multiprocessors

IEEE Concurrency ◽

10.1109/4434.865891 ◽

2000 ◽

Vol 8 (3) ◽

pp. 36-44 ◽

Cited By ~ 3

Author(s):

A. Milenkovic

Keyword(s):

Shared Memory ◽

High Performance ◽

Shared Memory Multiprocessors

Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing ◽

10.1109/empdp.2002.994312 ◽

2003 ◽

Author(s):

M.E. Acacio ◽

J. Gonzalez ◽

J.M. Garcia ◽

J. Duato

Keyword(s):

Shared Memory ◽

Shared Memory Multiprocessors ◽

On Chip

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.201201788 ◽

1970 ◽

pp. 9

Author(s):

A. Ferrerón Labari ◽

D. Suárez Gracia ◽

V. Viñals Yúfera

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Interconnection Networks ◽

High Performance ◽

Critical Issue ◽

Content Management ◽

Structure Design ◽

Portable Devices ◽

On Chip

In recent years, embedded systems have evolved to offer capabilities that were previously found only in high-performance systems. Portable devices already include on-chip multiprocessors (such as the PowerPC 476FP or the ARM Cortex-A9 MPCore), usually multithreaded, with a powerful multi-level on-chip cache hierarchy. Because most of these systems are battery-powered, power consumption becomes a critical issue. Achieving both high performance and low power consumption is a complex challenge for which several proposals have already been made. Suárez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low-Power NUCA), which reduces access latency by exploiting the properties of NUCAs (Non-Uniform Cache Architectures). Its key points are decoupling functionality and using three specialized on-chip networks. This structure has been shown to be efficient for data hierarchies, delivering good performance while reducing energy consumption. Instruction caches, however, have requirements and characteristics that differ from those of data caches and that conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multithreading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. This requires a complete re-evaluation of our previous design in terms of structure design, interconnection networks (including topologies, flow control, and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.