Distributed prefetch-buffer/cache design for high performance memory systems

A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems

Parallel Processing Letters, 2013, Vol 23 (04), pp. 1340011. DOI: 10.1142/s0129626413400112. Cited By ~ 7

Author(s): Faisal Shahzad, Markus Wittmann, Moritz Kreutzer, Thomas Zeiser, Georg Hager, ...

Keyword(s): High Performance, Building Blocks, Memory Systems, Time To Failure, Flow Solver, The Road, System A, Node Level, Mean Time, Performance Computing

The road to exascale computing poses many challenges for the High Performance Computing (HPC) community. Each step on the exascale path is mainly the result of a higher level of parallelism of the basic building blocks (i.e., CPUs, memory units, networking components, etc.). The reliability of each of these basic components does not increase at the same rate as the rate of hardware parallelism. This results in a reduction of the mean time to failure (MTTF) of the whole system. A fault tolerance environment is thus indispensable to run large applications on such clusters. Checkpoint/Restart (C/R) is the classic and most popular method to minimize failure damage. Its ease of implementation makes it useful, but typically it introduces significant overhead to the application. Several efforts have been made to reduce the C/R overhead. In this paper we compare various C/R techniques for their overheads by implementing them on two different categories of applications. These approaches are based on parallel-file-system (PFS)-level checkpoints (synchronous/asynchronous) and node-level checkpoints. We utilize the Scalable Checkpoint/Restart (SCR) library for the comparison of node-level checkpoints. For asynchronous PFS-level checkpoints, we use the Damaris library, the SCR asynchronous feature, and application-based checkpointing via dedicated threads. Our baseline for overhead comparison is the naïve application-based synchronous PFS-level checkpointing method. A 3D lattice-Boltzmann (LBM) flow solver and a Lanczos eigenvalue solver are used as prototypical applications in which all the techniques considered here may be applied.
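The abstract's claim that more parallelism lowers the MTTF of the whole system can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes identical components that fail independently with exponentially distributed lifetimes, so their failure rates add and the system MTTF is the component MTTF divided by the component count. The function name and the 5-year node MTTF are hypothetical.

```python
def system_mttf(component_mttf_hours: float, n_components: int) -> float:
    """MTTF of a system that is down as soon as any one component fails.

    Assumption (not from the paper): components are identical, fail
    independently, and have exponentially distributed lifetimes, so the
    system failure rate is n_components times the component failure rate.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mttf_hours / n_components


if __name__ == "__main__":
    node_mttf_hours = 5 * 365 * 24  # hypothetical: one node fails every 5 years
    for nodes in (1_000, 10_000, 100_000):
        hours = system_mttf(node_mttf_hours, nodes)
        print(f"{nodes:>7} nodes: system MTTF is about {hours:.2f} hours")
```

Under these assumptions, the time between failures shrinks in direct proportion to the node count, which is why checkpoint overhead matters more as systems grow.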

Invasive Computing on High Performance Shared Memory Systems

Facing the Multicore-Challenge III - Lecture Notes in Computer Science, 2013, pp. 1-12. DOI: 10.1007/978-3-642-35893-7_1. Cited By ~ 4

Author(s): Michael Bader, Hans-Joachim Bungartz, Martin Schreiber

Keyword(s): Shared Memory, High Performance, Memory Systems

Command vector memory systems: high performance at low cost

Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), 2002. DOI: 10.1109/pact.1998.727154. Cited By ~ 24

Author(s): J. Corbal, R. Espasa, M. Valero

Keyword(s): High Performance, Low Cost, Memory Systems

Closed yet Open DRAM: Achieving Low Latency and High Performance in DRAM Memory Systems

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 2018. DOI: 10.1109/dac.2018.8465817

Author(s): Lavanya Subramanian, Kaushik Vaidyanathan, Anant Nori, Sreenivas Subramoney, Tanay Karnik, ...

Keyword(s): High Performance, Memory Systems, Low Latency

A New Buffer Cache Design Exploiting Both Temporal and Content Localities

2010 IEEE 30th International Conference on Distributed Computing Systems, 2010. DOI: 10.1109/icdcs.2010.26. Cited By ~ 13

Author(s): Jin Ren, Qing Yang

Keyword(s): Buffer Cache, Cache Design

A dual-ported variable-way L1 D-cache design for high performance embedded DSP

2008 9th International Conference on Solid-State and Integrated-Circuit Technology, 2008. DOI: 10.1109/icsict.2008.4734919

Author(s): Di Jia, Hu He, Yihe Sun

Keyword(s): High Performance, Embedded DSP, Cache Design

Using DRAM as Cache for Non-Volatile Main Memory Swapping

International Journal of Software Innovation, 2016, Vol 4 (1), pp. 61-71. DOI: 10.4018/ijsi.2016010105

Author(s): Hirotaka Kawata, Gaku Nakagawa, Shuichi Oikawa

Keyword(s): Power Consumption, Power Management, Mobile Devices, Memory Management, High Performance, Reducing Power, Memory Systems, Main Memory, Dynamic Power Management, Memory Space

The performance of mobile devices such as smartphones and tablets has been rapidly improving in recent years. However, these improvements have been seriously affecting power consumption. One of the greatest challenges is to achieve efficient power management for battery-equipped mobile devices. To solve this problem, the authors focus on the emerging non-volatile memory (NVM), which has been receiving increasing attention in recent years. Since its performance is comparable with that of DRAM, it is possible to replace the main memory with NVM, thereby reducing power consumption. However, the price and capacity of NVM are problematic. Therefore, the authors provide a large memory space without performance degradation by combining NVM with other memory devices. In this study, they propose a design for non-volatile main memory systems that use DRAM as a swap space. This enables both high performance and energy efficient memory management through dynamic power management in NVM and DRAM.

PGHPF – An Optimizing High Performance Fortran Compiler for Distributed Memory Machines

Scientific Programming, 1997, Vol 6 (1), pp. 29-40. DOI: 10.1155/1997/705102. Cited By ~ 9

Author(s): Zeki Bozkus, Larry Meadows, Steven Nakamoto, Vincent Schuster, Mark Young

Keyword(s): High Performance, Distributed Memory, Parallel Machines, High Efficiency, Memory Systems, Production Quality, Distributed Memory Machines, High Performance Fortran, Application Developers, Efficient Software

High Performance Fortran (HPF) is the first widely supported, efficient, and portable parallel programming language for shared and distributed memory systems. HPF is realized through a set of directive-based extensions to Fortran 90. It enables application developers and Fortran end-users to write compact, portable, and efficient software that will compile and execute on workstations, shared memory servers, clusters, traditional supercomputers, or massively parallel processors. This article describes a production-quality HPF compiler for a set of parallel machines. Compilation techniques such as data and computation distribution, communication generation, run-time support, and optimization issues are elaborated as the basis for an HPF compiler implementation on distributed memory machines. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing HPF code on parallel architectures.
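The data distribution mentioned in the abstract can be pictured with a block mapping of a one-dimensional array onto processors. The sketch below is only an illustration and is not PGHPF's implementation: it assumes the array is split into contiguous blocks of size ceil(N/P), as in a BLOCK-style distribution, and it uses 0-based indices for convenience. The function names are hypothetical.

```python
def block_size(n_elements: int, n_procs: int) -> int:
    """Contiguous block length: ceil(n_elements / n_procs) in integer arithmetic."""
    return (n_elements + n_procs - 1) // n_procs


def block_owner(index: int, n_elements: int, n_procs: int) -> int:
    """Processor (0-based) that owns the array element at 0-based `index`."""
    if not 0 <= index < n_elements:
        raise IndexError("index out of range")
    return index // block_size(n_elements, n_procs)


def local_indices(proc: int, n_elements: int, n_procs: int) -> range:
    """Global indices stored on processor `proc`; may be empty for trailing processors."""
    size = block_size(n_elements, n_procs)
    start = min(proc * size, n_elements)
    stop = min(start + size, n_elements)
    return range(start, stop)


if __name__ == "__main__":
    n, p = 10, 4
    for proc in range(p):
        print(f"processor {proc} holds indices {list(local_indices(proc, n, p))}")
```

With an owner function like this, a compiler can decide which processor executes each assignment and which references need communication, which is the kind of decision the abstract groups under computation distribution and communication generation.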
