DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation

Energies, 2021, Vol 14 (2), pp. 376
Author(s): Matej Špeťko, Ondřej Vysocký, Branislav Jansík, Lubomír Říha

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, delivering top performance and energy efficiency. We present a performance, power-consumption, and thermal-behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere-microarchitecture GPUs. The results are compared against the previous generation of the server, the Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of the floating-point computing units, including the Tensor Cores. Furthermore, thermal stability was investigated. In addition, a Dynamic Voltage and Frequency Scaling (DVFS) analysis was performed to determine the most energy-efficient configuration of the GPUs executing workloads of various arithmetic intensities. Under the energy-optimal configuration, the A100 GPU reaches an efficiency of 51 GFLOPS/W for the double-precision workload and 91 GFLOPS/W for the Tensor Core double-precision workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations on the market.
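To make the throughput and GFLOPS/W figures concrete, the following minimal sketch measures FP64 matrix-multiply throughput and board power on a single GPU. CuPy and pynvml are stand-ins chosen for brevity; the authors' benchmark is custom CUDA code, so the library choices and the GEMM workload here are illustrative assumptions.

```python
# Sketch of a raw FP64 throughput and energy-efficiency measurement.
# CuPy and pynvml are assumptions standing in for the paper's custom
# CUDA benchmark and power instrumentation.
import cupy as cp
import pynvml

def fp64_gflops_per_watt(n=8192, reps=10):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    a = cp.random.rand(n, n, dtype=cp.float64)
    b = cp.random.rand(n, n, dtype=cp.float64)
    cp.matmul(a, b)                       # warm-up
    cp.cuda.Device().synchronize()
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    for _ in range(reps):
        cp.matmul(a, b)                   # asynchronous kernel launches
    # Sample power while the queued GEMMs keep the GPU busy (mW -> W).
    watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1e3
    stop.record()
    stop.synchronize()
    seconds = cp.cuda.get_elapsed_time(start, stop) / 1e3
    gflops = 2.0 * n ** 3 * reps / seconds / 1e9   # 2n^3 FLOPs per GEMM
    return gflops, gflops / watts

gflops, eff = fp64_gflops_per_watt()
print(f"{gflops:.0f} GFLOP/s FP64, ~{eff:.1f} GFLOPS/W")
```

A single instantaneous power sample is a simplification; the paper's methodology averages power over the run, which matters when clocks ramp under DVFS.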

Author(s): Azzam Haidar, Harun Bayraktar, Stanimire Tomov, Jack Dongarra, Nicholas J. Higham

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4–5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
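The underlying idea, factorize in reduced precision and recover FP64 accuracy by refining against an FP64 residual, can be sketched compactly. The snippet below uses a plain FP32 LU factorization with classical iterative refinement rather than the paper's FP16/FP32 Tensor-Core LU plus GMRES, so it is an illustrative stand-in for the method, not the authors' implementation.

```python
# Mixed-precision iterative refinement for Ax = b: the expensive O(n^3)
# factorization runs in FP32, and cheap FP64 residual corrections
# recover full double-precision accuracy.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iter=50):
    lu, piv = lu_factor(A.astype(np.float32))   # low-precision factorization
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                            # residual computed in FP64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction solved with the cheap FP32 factors, applied in FP64.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep the system well conditioned
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # FP64-level residual
```

Each refinement step costs only an O(n²) triangular solve plus a residual, which is why the speedup of the reduced-precision factorization carries over to the full FP64-accurate solve.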


Author(s): Adrian Jackson, Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, for both serial and parallel workloads. Amazon's High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance, as well as to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn: in particular, that the Cloud currently cannot match the parallel performance of dedicated HPC machines for large-scale parallel programs, but can match the serial performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources on low-level benchmarks, although for an actual scientific code, performance is comparable.


1995, Vol 4 (2), pp. 121-129
Author(s): Trina M. Roy, Carolina Cruz-Neira, Thomas A. DeFanti

Developing graphical interfaces to steer high-performance scientific computations has been a research subject in recent years. Now, computational scientists are starting to use virtual reality environments to explore the results of their simulations. In most cases, the virtual reality environment acts on precomputed data; however, the use of virtual reality environments for the dynamic steering of distributed scientific simulations is a growing area of research. In this paper we present the initial design and implementation of a distributed system that uses our virtual reality environment, the CAVE, to control and steer scientific simulations being computed on remote supercomputers. We discuss some of the more relevant features of virtual reality interfaces, emphasizing those of the CAVE, describe the distributed system developed, and present a scientific application, the Cosmic Worm, that makes extensive use of the distributed system.
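The steering pattern the paper describes, a remote simulation loop that applies parameter updates arriving between timesteps and streams state back to the front end, can be sketched as follows. The port, message format, and update rule are invented for illustration and are not the CAVE system's actual protocol.

```python
# Sketch of the computational-steering pattern: the remote side runs
# the simulation, polls for steering input between timesteps, and
# streams state back to the viewer. All names here are hypothetical.
import json
import select
import socket

def steered_simulation(port=5007, steps=100):
    srv = socket.create_server(("", port))
    conn, _ = srv.accept()
    params = {"dt": 0.01}                 # hypothetical steerable parameter
    state = 0.0
    with conn:
        for step in range(steps):
            # Non-blocking poll: apply a steering message if one arrived.
            ready, _, _ = select.select([conn], [], [], 0)
            if ready:
                msg = conn.recv(4096)
                if msg:
                    params.update(json.loads(msg))
            state += params["dt"]         # stand-in for one simulation timestep
            conn.sendall((json.dumps({"step": step, "state": state}) + "\n").encode())
```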

