DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation

Energies, 2021, Vol 14 (2), pp. 376
Author(s): Matej Špeťko, Ondřej Vysocký, Branislav Jansík, Lubomír Říha

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, delivering top performance and energy efficiency. We present a performance, power-consumption, and thermal-behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere-microarchitecture GPUs. The results are compared against the previous generation of the server, the Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of the floating-point computing units, including the Tensor Cores. Furthermore, thermal stability was investigated. In addition, a Dynamic Voltage and Frequency Scaling (DVFS) analysis was performed to determine the most energy-efficient configuration of the GPUs executing workloads of various arithmetic intensities. Under the energy-optimal configuration, the A100 GPU reaches an efficiency of 51 GFLOPS/W for the double-precision workload and 91 GFLOPS/W for the Tensor Core double-precision workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations on the market.
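To make the throughput and GFLOPS/W figures concrete, the following minimal sketch measures FP64 matrix-multiply throughput and board power on a single GPU. CuPy and pynvml are stand-ins chosen for brevity; the authors' benchmark is custom CUDA code, so the library choices and the GEMM workload here are illustrative assumptions.

```python
# Sketch of a raw FP64 throughput and energy-efficiency measurement.
# CuPy and pynvml are assumptions standing in for the paper's custom
# CUDA benchmark and power instrumentation.
import cupy as cp
import pynvml

def fp64_gflops_per_watt(n=8192, reps=10):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    a = cp.random.rand(n, n, dtype=cp.float64)
    b = cp.random.rand(n, n, dtype=cp.float64)
    cp.matmul(a, b)                       # warm-up
    cp.cuda.Device().synchronize()
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    for _ in range(reps):
        cp.matmul(a, b)                   # asynchronous kernel launches
    # Sample power while the queued GEMMs keep the GPU busy (mW -> W).
    watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1e3
    stop.record()
    stop.synchronize()
    seconds = cp.cuda.get_elapsed_time(start, stop) / 1e3
    gflops = 2.0 * n ** 3 * reps / seconds / 1e9   # 2n^3 FLOPs per GEMM
    return gflops, gflops / watts

gflops, eff = fp64_gflops_per_watt()
print(f"{gflops:.0f} GFLOP/s FP64, ~{eff:.1f} GFLOPS/W")
```

A single instantaneous power sample is a simplification; the paper's methodology averages power over the run, which matters when clocks ramp under DVFS.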

Author(s): Azzam Haidar, Harun Bayraktar, Stanimire Tomov, Jack Dongarra, Nicholas J. Higham

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4–5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
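The underlying idea, factorize in reduced precision and recover FP64 accuracy by refining against an FP64 residual, can be sketched compactly. The snippet below uses a plain FP32 LU factorization with classical iterative refinement rather than the paper's FP16/FP32 Tensor-Core LU plus GMRES, so it is an illustrative stand-in for the method, not the authors' implementation.

```python
# Mixed-precision iterative refinement for Ax = b: the expensive O(n^3)
# factorization runs in FP32, and cheap FP64 residual corrections
# recover full double-precision accuracy.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iter=50):
    lu, piv = lu_factor(A.astype(np.float32))   # low-precision factorization
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                            # residual computed in FP64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction solved with the cheap FP32 factors, applied in FP64.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep the system well conditioned
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # FP64-level residual
```

Each refinement step costs only an O(n²) triangular solve plus a residual, which is why the speedup of the reduced-precision factorization carries over to the full FP64-accurate solve.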


Author(s): Adrian Jackson, Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, for both serial and parallel workloads. Amazon's High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance, as well as to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn: in particular, that the Cloud currently cannot match the parallel performance of dedicated HPC machines for large-scale parallel programs, but can match the serial performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources on low-level benchmarks, although for an actual scientific code, performance is comparable.


1995, Vol 4 (2), pp. 121-129
Author(s): Trina M. Roy, Carolina Cruz-Neira, Thomas A. DeFanti

Developing graphical interfaces to steer high-performance scientific computations has been a research subject in recent years. Now, computational scientists are starting to use virtual reality environments to explore the results of their simulations. In most cases, the virtual reality environment acts on precomputed data; however, the use of virtual reality environments for the dynamic steering of distributed scientific simulations is a growing area of research. In this paper we present the initial design and implementation of a distributed system that uses our virtual reality environment, the CAVE, to control and steer scientific simulations being computed on remote supercomputers. We discuss some of the more relevant features of virtual reality interfaces, emphasizing those of the CAVE, describe the distributed system developed, and present a scientific application, the Cosmic Worm, that makes extensive use of the distributed system.
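The steering pattern the paper describes, a remote simulation loop that applies parameter updates arriving between timesteps and streams state back to the front end, can be sketched as follows. The port, message format, and update rule are invented for illustration and are not the CAVE system's actual protocol.

```python
# Sketch of the computational-steering pattern: the remote side runs
# the simulation, polls for steering input between timesteps, and
# streams state back to the viewer. All names here are hypothetical.
import json
import select
import socket

def steered_simulation(port=5007, steps=100):
    srv = socket.create_server(("", port))
    conn, _ = srv.accept()
    params = {"dt": 0.01}                 # hypothetical steerable parameter
    state = 0.0
    with conn:
        for step in range(steps):
            # Non-blocking poll: apply a steering message if one arrived.
            ready, _, _ = select.select([conn], [], [], 0)
            if ready:
                msg = conn.recv(4096)
                if msg:
                    params.update(json.loads(msg))
            state += params["dt"]         # stand-in for one simulation timestep
            conn.sendall((json.dumps({"step": step, "state": state}) + "\n").encode())
```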

