Closed yet Open DRAM: Achieving Low Latency and High Performance in DRAM Memory Systems

Author(s):  
Lavanya Subramanian ◽  
Kaushik Vaidyanathan ◽  
Anant Nori ◽  
Sreenivas Subramoney ◽  
Tanay Karnik ◽  
...  
2009 ◽  
Author(s):  
Uri Cummings ◽  
Dan Daly ◽  
Rebecca Collins ◽  
Virat Agarwal ◽  
Fabrizio Petrini ◽  
...  

2013 ◽  
Vol 23 (04) ◽  
pp. 1340011 ◽  
Author(s):  
FAISAL SHAHZAD ◽  
MARKUS WITTMANN ◽  
MORITZ KREUTZER ◽  
THOMAS ZEISER ◽  
GEORG HAGER ◽  
...  

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.


2014 ◽  
Vol 981 ◽  
pp. 431-434
Author(s):  
Zhan Peng Jiang ◽  
Rui Xu ◽  
Chang Chun Dong ◽  
Lin Hai Cui

Network on Chip(NoC),a new proposed solution to solve global communication problem in complex System on Chip (SoC) design,has absorbed more and more researchers to do research in this area. Due to some distinct characteristics, NoC is different from both traditional off-chip network and traditional on-chip bus,and is facing with the huge design challenge. NoC router design is one of the most important issues in NoC system. The paper present a high-performance, low-latency two-stage pipelined router architecture suitable for NoC designs and providing a solution to irregular 2Dmesh topology for NoC. The key features of the proposed Mix Router are its suitability for 2Dmesh NoC topology and its capability of suorting both full-adaptive routing and deterministic routing algorithm.


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Alireza Monemi ◽  
Chia Yee Ooi ◽  
Muhammad Nadzir Marsono

Network-on-Chip (NoC) is fast emerging as an on-chip communication alternative for many-core System-on-Chips (SoCs). However, designing a high performance low latency NoC with low area overhead has remained a challenge. In this paper, we present a two-clock-cycle latency NoC microarchitecture. An efficient request masking technique is proposed to combine virtual channel (VC) allocation with switch allocation nonspeculatively. Our proposed NoC architecture is optimized in terms of area overhead, operating frequency, and quality-of-service (QoS). We evaluate our NoC against CONNECT, an open source low latency NoC design targeted for field-programmable gate array (FPGA). The experimental results on several FPGA devices show that our NoC router outperforms CONNECT with 50% reduction of logic cells (LCs) utilization, while it works with 100% and 35%~20% higher operating frequency compared to the one- and two-clock-cycle latency CONNECT NoC routers, respectively. Moreover, the proposed NoC router achieves 2.3 times better performance compared to CONNECT.


2019 ◽  
Vol 18 (2) ◽  
pp. 107-110 ◽  
Author(s):  
Sunwoong Kim ◽  
Hyunmin Jung ◽  
Woojae Shin ◽  
Hyokeun Lee ◽  
Hyuk-Jae Lee

Sign in / Sign up

Export Citation Format

Share Document