Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures

A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures

ACM Transactions on Design Automation of Electronic Systems ◽ 10.1145/3462775 ◽ 2022 ◽ Vol 27 (1) ◽ pp. 1-31

Author(s): Sri Harsha Gade ◽ Sujay Deb

Keyword(s): Lower Energy ◽ Cache Coherence ◽ Network On Chip ◽ Highly Efficient ◽ Wireless Links ◽ Coherence Protocols ◽ High Area ◽ On Chip ◽ Many Core ◽ Clustered Network

Cache coherence ensures the correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many-core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many-core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and a hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through a global snoopy protocol. The ordered network for global snooping is provided through low-latency, low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation, which eliminates coherence for private blocks. Evaluations show that WiSH reduces traffic and runtime while requiring smaller storage and lower energy than existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many-core processors.
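The two-level idea in the abstract above (a directory inside each cluster, a broadcast snoop between clusters, and no coherence work for private blocks) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the WiSH protocol itself: the class `ToyHybridCoherence`, its methods, the `private` flag, and the rule that every write to a shared block triggers exactly one global broadcast are all hypothetical simplifications introduced here.

```python
class ToyHybridCoherence:
    """Toy two-level coherence model (hypothetical, for illustration only)."""

    def __init__(self, num_clusters):
        # One local directory per cluster: block -> set of core ids holding a copy.
        self.directories = [dict() for _ in range(num_clusters)]
        # Number of global snoop broadcasts issued so far.
        self.broadcasts = 0

    def read(self, cluster, core, block, private=False):
        if private:
            return  # assumed share-aware shortcut: private blocks are not tracked
        self.directories[cluster].setdefault(block, set()).add(core)

    def write(self, cluster, core, block, private=False):
        """Return how many remote copies were invalidated by this write."""
        if private:
            return 0  # no coherence traffic for private blocks
        # Local level: the cluster directory invalidates the other local sharers.
        local = self.directories[cluster].get(block, set())
        invalidated = len(local - {core})
        self.directories[cluster][block] = {core}
        # Global level: one broadcast; every other cluster snoops it and drops copies.
        self.broadcasts += 1
        for other, directory in enumerate(self.directories):
            if other != cluster:
                invalidated += len(directory.pop(block, set()))
        return invalidated
```

In this sketch a write by core 0 in cluster 0 to a block also cached by core 1 (same cluster) and core 4 (cluster 1) invalidates two copies with a single broadcast, while a write to a block marked private produces no broadcast at all.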

Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor

2017 46th International Conference on Parallel Processing (ICPP) ◽ 10.1109/icpp.2017.51 ◽ 2017 ◽ Cited By ~ 22

Author(s): Lijuan Jiang ◽ Chao Yang ◽ Yulong Ao ◽ Wanwang Yin ◽ Wenjing Ma ◽ ...

Keyword(s): Highly Efficient ◽ Many Core

Efficient task spawning for shared memory and message passing in many-core architectures

Journal of Systems Architecture ◽ 10.1016/j.sysarc.2017.03.004 ◽ 2017 ◽ Vol 77 ◽ pp. 72-82 ◽ Cited By ~ 3

Author(s): Aurang Zaib ◽ Thomas Wild ◽ Andreas Herkersdorf ◽ Jan Heisswolf ◽ Jürgen Becker ◽ ...

Keyword(s): Shared Memory ◽ Message Passing ◽ Many Core

HICFD: Highly Efficient Implementation of CFD Codes for HPC Many-Core Architectures

Competence in High Performance Computing 2010 ◽ 10.1007/978-3-642-24025-6_1 ◽ 2011 ◽ pp. 1-13 ◽ Cited By ~ 1

Author(s): Achim Basermann ◽ Hans-Peter Kersken ◽ Andreas Schreiber ◽ Thomas Gerhold ◽ Jens Jägersküpper ◽ ...

Keyword(s): Efficient Implementation ◽ Highly Efficient ◽ Many Core

Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight

Journal of Computer Science and Technology ◽ 10.1007/s11390-019-1900-5 ◽ 2019 ◽ Vol 34 (1) ◽ pp. 77-93 ◽ Cited By ~ 4

Author(s): Min Li ◽ Chao Yang ◽ Qiao Sun ◽ Wen-Jing Ma ◽ Wen-Long Cao ◽ ...

Keyword(s): Highly Efficient ◽ Sunway Taihulight ◽ Many Core

A highly-efficient and tightly-connected many-core overlay architecture

IEEE Access ◽ 10.1109/access.2021.3074171 ◽ 2021 ◽ pp. 1-1

Author(s): Riadh Ben Abdelhamid ◽ Yoshiki Yamaguchi ◽ Taisuke Boku

Keyword(s): Highly Efficient ◽ Many Core

Pronto: A Low Overhead Message Passing System for High Performance Many-Core Processors

International Journal of Networking and Computing ◽ 10.15803/ijnc.4.2_307 ◽ 2014 ◽ Vol 4 (2) ◽ pp. 307-320

Author(s): Sumeet S. Kumar ◽ Mitzi Tjin-A-Djie ◽ Rene van Leuken

Keyword(s): Message Passing ◽ High Performance ◽ Many Core

Hierarchy-Aware Message-Passing in the Upcoming Many-Core Era

Grid Computing - Technology and Applications, Widespread Coverage and New Horizons ◽ 10.5772/36582 ◽ 2012

Author(s): Carsten Clauss ◽ Simon Pickartz ◽ Stefan Lankes ◽ Thomas Bemmerl

Keyword(s): Message Passing ◽ Many Core

PIMP My Many-Core: Pipeline-Integrated Message Passing

International Journal of Parallel Programming ◽ 10.1007/s10766-020-00685-9 ◽ 2020

Author(s): Jörg Mische ◽ Martin Frieb ◽ Alexander Stegmeier ◽ Theo Ungerer

Keyword(s): Message Passing ◽ Direct Memory Access ◽ Communication Overhead ◽ Instruction Set ◽ Address Space ◽ Hardware Complexity ◽ Memory Accesses ◽ Hardware Costs ◽ Many Core ◽ Minimal Hardware

Abstract: To improve scalability, several many-core architectures use message passing instead of shared memory accesses for communication. Unfortunately, Direct Memory Access (DMA) transfers in a shared address space are usually used to emulate message passing, which entails considerable overhead and thwarts the advantages of message passing. Recently proposed register-level message passing alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction set extension to enable register-level message passing at minimal hardware cost and describe its integration into a classical five-stage RISC-V pipeline.
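The register-level message passing described in the abstract above, where a special instruction sends the contents of a single register to another core, can be sketched as a small software model. This is a minimal illustration under assumptions, not the instruction set extension proposed in the paper: the names `Core`, `send`, and `recv`, the per-core receive FIFO, its depth, and the "return False when full" stall behavior are all hypothetical choices made here.

```python
from collections import deque


class Core:
    """Hypothetical core model: 32 registers plus a bounded receive FIFO."""

    def __init__(self, core_id, fifo_depth=4):
        self.core_id = core_id
        self.regs = [0] * 32
        self.rx = deque()          # pending (sender id, value) messages
        self.fifo_depth = fifo_depth


def send(cores, src, dest, reg):
    """Assumed 'send' instruction: copy one register of `src` to `dest`'s FIFO.

    Returns False when the destination FIFO is full (the sender would stall).
    """
    target = cores[dest]
    if len(target.rx) >= target.fifo_depth:
        return False
    target.rx.append((src, cores[src].regs[reg]))
    return True


def recv(cores, dest, reg):
    """Assumed 'recv' instruction: pop the oldest message into register `reg`.

    Returns the sender id, or None when no message is waiting.
    """
    core = cores[dest]
    if not core.rx:
        return None
    sender, value = core.rx.popleft()
    core.regs[reg] = value
    return sender
```

A short usage example: after `cores = [Core(0), Core(1)]` and `cores[0].regs[5] = 42`, calling `send(cores, 0, 1, 5)` followed by `recv(cores, 1, 7)` leaves the value in register 7 of core 1 without touching any shared memory, which is the property the abstract contrasts with DMA-based emulation.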

Reduced Complexity Many-Core: Timing Predictability Due to Message-Passing

Architecture of Computing Systems - ARCS 2017 - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-54999-6_11 ◽ 2017 ◽ pp. 139-151 ◽ Cited By ~ 7

Author(s): Jörg Mische ◽ Martin Frieb ◽ Alexander Stegmeier ◽ Theo Ungerer

Keyword(s): Message Passing ◽ Reduced Complexity ◽ Many Core
