Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures

Author(s):  
Min Si ◽  
Pavan Balaji ◽  
Yutaka Ishikawa
2022 ◽  
Vol 27 (1) ◽  
pp. 1-31
Author(s):  
Sri Harsha Gade ◽  
Sujay Deb

Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols do not scale to many-core architectures: snoopy coherence requires ordered networks that do not scale, while directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many-core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and a hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through a global snoopy protocol. The ordered network for global snooping is provided through low-latency, low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation, which eliminates coherence for private blocks. Evaluations show that WiSH reduces traffic and runtime, while requiring smaller storage and lower energy than existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many-core processors.
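The two-level idea described in this abstract can be illustrated with a toy model. The sketch below is not the WiSH protocol itself: the class names, the return labels, and the simplified rule that every shared write triggers one global broadcast are assumptions made for illustration only. It shows private blocks bypassing coherence, reads served by a per-cluster directory where possible, and a broadcast counter standing in for the ordered wireless snoop.

```python
class Cluster:
    """One cluster of cores with its own local directory (hypothetical model)."""

    def __init__(self, cid):
        self.cid = cid
        # block address -> set of local core ids that hold a copy
        self.directory = {}


class HybridCoherence:
    """Toy snoopy-over-directory model; a simplification, not the paper's design."""

    def __init__(self, n_clusters, private_blocks=()):
        self.clusters = [Cluster(i) for i in range(n_clusters)]
        self.private_blocks = set(private_blocks)  # assumed known in advance
        self.broadcasts = 0  # number of global snoop broadcasts issued

    def read(self, cluster_id, core_id, block):
        if block in self.private_blocks:
            return "private"  # share-aware: no coherence action at all
        directory = self.clusters[cluster_id].directory
        if block in directory:
            directory[block].add(core_id)  # served by the local directory
            return "local"
        self.broadcasts += 1  # miss in the cluster: snoop the other clusters
        directory[block] = {core_id}
        return "global"

    def write(self, cluster_id, core_id, block):
        if block in self.private_blocks:
            return "private"
        # Assumption: every shared write broadcasts one invalidation.
        self.broadcasts += 1
        for cluster in self.clusters:
            if cluster.cid != cluster_id:
                cluster.directory.pop(block, None)  # remote copies invalidated
        # Local directory keeps only the writer as holder.
        self.clusters[cluster_id].directory[block] = {core_id}
        return "global"
```

A real protocol would track exclusive ownership so that repeated writes by the same owner need no further broadcasts; that state is omitted here to keep the sketch short.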


Author(s):  
Lijuan Jiang ◽  
Chao Yang ◽  
Yulong Ao ◽  
Wanwang Yin ◽  
Wenjing Ma ◽  
...  

2017 ◽  
Vol 77 ◽  
pp. 72-82
Author(s):  
Aurang Zaib ◽  
Thomas Wild ◽  
Andreas Herkersdorf ◽  
Jan Heisswolf ◽  
Jürgen Becker ◽  
...  

Author(s):  
Achim Basermann ◽  
Hans-Peter Kersken ◽  
Andreas Schreiber ◽  
Thomas Gerhold ◽  
Jens Jägersküpper ◽  
...  

2019 ◽  
Vol 34 (1) ◽  
pp. 77-93
Author(s):  
Min Li ◽  
Chao Yang ◽  
Qiao Sun ◽  
Wen-Jing Ma ◽  
Wen-Long Cao ◽  
...  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Riadh Ben Abdelhamid ◽  
Yoshiki Yamaguchi ◽  
Taisuke Boku

2014 ◽  
Vol 4 (2) ◽  
pp. 307-320
Author(s):  
Sumeet S. Kumar ◽  
Mitzi Tjin-A-Djie ◽  
Rene van Leuken

Author(s):  
Carsten Clauss ◽  
Simon Pickartz ◽  
Stefan Lankes ◽  
Thomas Bemmerl

Author(s):  
Jörg Mische ◽  
Martin Frieb ◽  
Alexander Stegmeier ◽  
Theo Ungerer

Abstract: To improve scalability, several many-core architectures use message passing instead of shared memory accesses for communication. Unfortunately, message passing is usually emulated with Direct Memory Access (DMA) transfers in a shared address space, which entails considerable overhead and thwarts the advantages of message passing. Recently proposed register-level message passing alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction set extension to enable register-level message passing at minimal hardware costs and describe its integration into a classical five-stage RISC-V pipeline.
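As a rough illustration of register-level message passing, the following sketch models cores that exchange single register values through per-core receive FIFOs. It is a software model under stated assumptions, not the instruction set extension proposed in the paper: the `send` and `recv` operation names, the FIFO depth, and the stall-on-full behaviour are all hypothetical.

```python
from collections import deque


class Core:
    """Toy core: 32 integer registers plus a bounded receive FIFO (assumed)."""

    def __init__(self, core_id, fifo_depth=4):
        self.core_id = core_id
        self.regs = [0] * 32
        self.rx_fifo = deque()
        self.fifo_depth = fifo_depth


class ManyCore:
    def __init__(self, n_cores, fifo_depth=4):
        self.cores = [Core(i, fifo_depth) for i in range(n_cores)]

    def send(self, src, dest, rs):
        """Hypothetical 'send': copy register rs of core src to core dest.

        Returns False when the destination FIFO is full, modelling a stall.
        """
        target = self.cores[dest]
        if len(target.rx_fifo) >= target.fifo_depth:
            return False
        target.rx_fifo.append((src, self.cores[src].regs[rs]))
        return True

    def recv(self, dest, rd):
        """Hypothetical 'recv': pop the oldest message into register rd.

        Returns the sender id, or None when no message is waiting.
        """
        core = self.cores[dest]
        if not core.rx_fifo:
            return None
        sender, value = core.rx_fifo.popleft()
        if rd != 0:  # register x0 is hardwired to zero in RISC-V
            core.regs[rd] = value
        return sender


# Usage: core 0 sends the contents of register 5 to core 3.
chip = ManyCore(4)
chip.cores[0].regs[5] = 42
chip.send(0, 3, 5)
sender = chip.recv(3, 10)
```

No shared memory or DMA descriptor is involved in the transfer: the only state touched is the sender's register and the receiver's FIFO, which is the source of the low overhead the abstract refers to.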

