Closed yet Open DRAM: Achieving Low Latency and High Performance in DRAM Memory Systems

Fulcrum's FocalPoint FM4000: A Scalable, Low-Latency 10GigE Switch for High-Performance Data Centers

10.1109/hoti.2009.22 ◽

2009 ◽

Cited By ~ 8

Author(s):

Uri Cummings ◽

Dan Daly ◽

Rebecca Collins ◽

Virat Agarwal ◽

Fabrizio Petrini ◽

...

Keyword(s):

High Performance ◽

Data Centers ◽

Performance Data ◽

Low Latency

Re-architecting datacenter networks and stacks for low latency and high performance

Proceedings of the Conference of the ACM Special Interest Group on Data Communication - SIGCOMM '17 ◽

10.1145/3098822.3098825 ◽

2017 ◽

Cited By ~ 67

Author(s):

Mark Handley ◽

Costin Raiciu ◽

Alexandru Agache ◽

Andrei Voinescu ◽

Andrew W. Moore ◽

...

Keyword(s):

High Performance ◽

Low Latency ◽

Datacenter Networks

A SURVEY OF CHECKPOINT/RESTART TECHNIQUES ON DISTRIBUTED MEMORY SYSTEMS

Parallel Processing Letters ◽

10.1142/s0129626413400112 ◽

2013 ◽

Vol 23 (04) ◽

pp. 1340011 ◽

Cited By ~ 7

Author(s):

FAISAL SHAHZAD ◽

MARKUS WITTMANN ◽

MORITZ KREUTZER ◽

THOMAS ZEISER ◽

GEORG HAGER ◽

...

Keyword(s):

High Performance ◽

Building Blocks ◽

Memory Systems ◽

Time To Failure ◽

Flow Solver ◽

The Road ◽

System A ◽

Node Level ◽

Mean Time ◽

Performance Computing

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
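The abstract's point that system MTTF shrinks as parallelism grows can be illustrated with a minimal sketch. It assumes independent components with exponentially distributed failures and a system that fails as soon as any one component fails; the function name and the five-year component MTTF are hypothetical values chosen for illustration and are not taken from the paper.

```python
def system_mttf(component_mttf_hours: float, n_components: int) -> float:
    """MTTF of a system that fails when any one of n components fails.

    Assumption (not from the paper): components fail independently with
    exponentially distributed lifetimes, so failure rates add up and the
    system MTTF is the component MTTF divided by n.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mttf_hours / n_components


if __name__ == "__main__":
    five_years_hours = 5 * 365 * 24  # hypothetical per-component MTTF
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} components: {system_mttf(five_years_hours, n):.2f} h")
```

Under these assumptions, growing the component count tenfold cuts the expected time between system failures tenfold, which is why checkpoint overhead becomes a first-order concern at scale.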

Design of Router Supporting Multiply Routing Algorithm for NoC

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.981.431 ◽

2014 ◽

Vol 981 ◽

pp. 431-434

Author(s):

Zhan Peng Jiang ◽

Rui Xu ◽

Chang Chun Dong ◽

Lin Hai Cui

Keyword(s):

Complex System ◽

High Performance ◽

Routing Algorithm ◽

Network On Chip ◽

System On Chip ◽

Low Latency ◽

Deterministic Routing ◽

Key Features ◽

Design Challenge ◽

On Chip

Network on Chip (NoC), a recently proposed solution to the global communication problem in complex System on Chip (SoC) design, has attracted a growing number of researchers. Because of its distinct characteristics, NoC differs from both traditional off-chip networks and traditional on-chip buses, and it faces substantial design challenges. Router design is one of the most important issues in an NoC system. This paper presents a high-performance, low-latency, two-stage pipelined router architecture suitable for NoC designs, providing a solution for irregular 2D mesh NoC topologies. The key features of the proposed Mix Router are its suitability for the 2D mesh NoC topology and its ability to support both fully adaptive and deterministic routing algorithms.
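For readers unfamiliar with deterministic routing on a 2D mesh, the following is a minimal sketch of dimension-order (XY) routing, a textbook deterministic scheme. The abstract does not name the deterministic algorithm the Mix Router uses, so XY routing is an assumption made for illustration; the sketch also assumes a regular mesh with no missing links, unlike the irregular topologies the paper targets.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the hop-by-hop path from src to dst under XY routing.

    The packet first travels along the X dimension until its column
    matches the destination, then along the Y dimension. The path
    depends only on src and dst, which makes the routing deterministic.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

An adaptive algorithm would instead choose among several permitted output ports at each hop, for example based on congestion, so the path between the same pair of nodes can vary.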

Invasive Computing on High Performance Shared Memory Systems

Facing the Multicore-Challenge III - Lecture Notes in Computer Science ◽

10.1007/978-3-642-35893-7_1 ◽

2013 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Michael Bader ◽

Hans-Joachim Bungartz ◽

Martin Schreiber

Keyword(s):

Shared Memory ◽

High Performance ◽

Memory Systems

Low Latency Network-on-Chip Router Microarchitecture Using Request Masking Technique

International Journal of Reconfigurable Computing ◽

10.1155/2015/570836 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 14

Author(s):

Alireza Monemi ◽

Chia Yee Ooi ◽

Muhammad Nadzir Marsono

Keyword(s):

High Performance ◽

Clock Cycle ◽

Network On Chip ◽

Operating Frequency ◽

Low Latency ◽

Core System ◽

Low Area ◽

Area Overhead ◽

Logic Cells ◽

On Chip

Network-on-Chip (NoC) is fast emerging as an on-chip communication alternative for many-core Systems-on-Chip (SoCs). However, designing a high-performance, low-latency NoC with low area overhead remains a challenge. In this paper, we present a two-clock-cycle latency NoC microarchitecture. An efficient request masking technique is proposed to combine virtual channel (VC) allocation with switch allocation nonspeculatively. Our proposed NoC architecture is optimized in terms of area overhead, operating frequency, and quality of service (QoS). We evaluate our NoC against CONNECT, an open-source, low-latency NoC design targeted at field-programmable gate arrays (FPGAs). The experimental results on several FPGA devices show that our NoC router outperforms CONNECT with a 50% reduction in logic cell (LC) utilization, while it works with 100% and 35%~20% higher operating frequency compared to the one- and two-clock-cycle latency CONNECT NoC routers, respectively. Moreover, the proposed NoC router achieves 2.3 times better performance compared to CONNECT.

Low-latency software defined network for high performance clouds

2015 10th System of Systems Engineering Conference (SoSE) ◽

10.1109/sysose.2015.7151909 ◽

2015 ◽

Cited By ~ 8

Author(s):

Paul Rad ◽

Rajendra V. Boppana ◽

Palden Lama ◽

Gilad Berman ◽

Mo Jamshidi

Keyword(s):

High Performance ◽

Low Latency ◽

Software Defined Network

Command vector memory systems: high performance at low cost

Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192) ◽

10.1109/pact.1998.727154 ◽

2002 ◽

Cited By ~ 24

Author(s):

J. Corbal ◽

R. Espasa ◽

M. Valero

Keyword(s):

High Performance ◽

Low Cost ◽

Memory Systems

HAD-TWL: Hot Address Detection-Based Wear Leveling for Phase-Change Memory Systems with Low Latency

IEEE Computer Architecture Letters ◽

10.1109/lca.2019.2929393 ◽

2019 ◽

Vol 18 (2) ◽

pp. 107-110 ◽

Cited By ~ 1

Author(s):

Sunwoong Kim ◽

Hyunmin Jung ◽

Woojae Shin ◽

Hyokeun Lee ◽

Hyuk-Jae Lee

Keyword(s):

Phase Change ◽

Phase Change Memory ◽

Memory Systems ◽

Low Latency ◽

Wear Leveling ◽

Change Memory

A high-performance/low-latency vector rotational CORDIC architecture based on extended elementary angle set and trellis-based searching schemes

IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing ◽

10.1109/tcsii.2003.816923 ◽

2003 ◽

Vol 50 (9) ◽

pp. 589-601 ◽

Cited By ~ 40

Author(s):

Cheng-Shing Wu ◽

An-Yeu Wu ◽

Chih-Hsiu Lin

Keyword(s):

High Performance ◽

Low Latency
