scholarly journals Partially shared cache and adaptive replacement algorithm for NoC-based many-core systems

2019 ◽  
Vol 98 ◽  
pp. 424-433 ◽  
Author(s):  
Pengfei Yang ◽  
Quan Wang ◽  
Hongwei Ye ◽  
Zhiqiang Zhang
Author(s):  
Fenglong Song ◽  
Dongrui Fan ◽  
Zhiyong Liu ◽  
Junchao Zhang ◽  
Lei Yu ◽  
...  

2011 ◽  
Vol 21 (01) ◽  
pp. 85-106 ◽  
Author(s):  
MARCO A. Z. ALVES ◽  
HENRIQUE C. FREITAS ◽  
PHILIPPE O. A. NAVAUX

Several studies point out the benefits of a shared L2 cache, but some other properties of shared caches must be considered to lead to a thorough understanding of all chip multiprocessor (CMP) bottlenecks. Our paper evaluates and explains shared cache bottlenecks, which are very important considering the rise of many-core processors. The results of our simulations with 32 cores show low performance when L2 cache memory is shared between 2 or 4 cores. In these two cases, the increase of L2 cache latency and contention are the main causes responsible for the increase of execution time.


2014 ◽  
Vol E97.C (4) ◽  
pp. 360-368
Author(s):  
Takashi MIYAMORI ◽  
Hui XU ◽  
Hiroyuki USUI ◽  
Soichiro HOSODA ◽  
Toru SANO ◽  
...  
Keyword(s):  

2010 ◽  
Vol 33 (10) ◽  
pp. 1777-1787 ◽  
Author(s):  
Wei-Zhi XU ◽  
Feng-Long SONG ◽  
Zhi-Yong LIU ◽  
Dong-Rui FAN ◽  
Lei YU ◽  
...  
Keyword(s):  

2009 ◽  
Vol 31 (11) ◽  
pp. 1918-1928 ◽  
Author(s):  
Wei LIN ◽  
Xiao-Chun YE ◽  
Feng-Long SONG ◽  
Hao ZHANG
Keyword(s):  

Impact ◽  
2019 ◽  
Vol 2019 (10) ◽  
pp. 44-46
Author(s):  
Masato Edahiro ◽  
Masaki Gondo

The pace of technology's advancements is ever-increasing and intelligent systems, such as those found in robots and vehicles, have become larger and more complex. These intelligent systems have a heterogeneous structure, comprising a mixture of modules such as artificial intelligence (AI) and powertrain control modules that facilitate large-scale numerical calculation and real-time periodic processing functions. Information technology expert Professor Masato Edahiro, from the Graduate School of Informatics at the Nagoya University in Japan, explains that concurrent advances in semiconductor research have led to the miniaturisation of semiconductors, allowing a greater number of processors to be mounted on a single chip, increasing potential processing power. 'In addition to general-purpose processors such as CPUs, a mixture of multiple types of accelerators such as GPGPU and FPGA has evolved, producing a more complex and heterogeneous computer architecture,' he says. Edahiro and his partners have been working on the eMBP, a model-based parallelizer (MBP) that offers a mapping system as an efficient way of automatically generating parallel code for multi- and many-core systems. This ensures that once the hardware description is written, eMBP can bridge the gap between software and hardware to ensure that not only is an efficient ecosystem achieved for hardware vendors, but the need for different software vendors to adapt code for their particular platforms is also eliminated.


2018 ◽  
Vol 175 ◽  
pp. 02009
Author(s):  
Carleton DeTar ◽  
Steven Gottlieb ◽  
Ruizi Li ◽  
Doug Toussaint

With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.


Sign in / Sign up

Export Citation Format

Share Document