Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation

Author(s):  
Daichi Mukunoki ◽  
Toshiyuki Imamura

In this manuscript, various circuits for arithmetic summation are compared. Cadence 90nm technology and Quartus II EP2C20F484C7 are used to implement the designs. The characteristics of logic gate-based adders and of adders based on the PFCA, TG and HSD techniques are analyzed. The key finding is that the PFCA with 10T transistors performs slightly more efficiently than its counterpart. The exclusive OR-NOR design is optimal for least-delay adders in high-performance, energy-efficient processing units.


2017 ◽  
Vol 10 (3) ◽  
pp. 597-602
Author(s):  
Jyotindra Tiwari ◽  
Dr. Mahesh Pawar ◽  
Dr. Anjajana Pandey

Big Data is defined by the 3Vs: variety, volume and velocity. The volume of data is huge, the data exists in a variety of file types, and it grows very rapidly. Storing and processing big data has always been a major issue, and it has become even more challenging in recent years. High-performance techniques have been introduced to handle it. Several frameworks, such as Apache Hadoop, have been introduced to process big data. Apache Hadoop provides map/reduce for this purpose, but map/reduce can be further accelerated. This paper surveys techniques for map/reduce acceleration and fast, energy-efficient computation.
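
As a rough illustration of the map/reduce model mentioned in this abstract, the following is a minimal single-process sketch in plain Python. It is not Hadoop's API and not the surveyed acceleration techniques; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical sample input, for illustration only.
lines = ["big data big volume", "data velocity"]
counts = word_count(lines)
print(counts)
```

In a real framework the map and reduce steps run in parallel across many machines and the shuffle moves data over the network, which is typically where acceleration efforts concentrate.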


Energies ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 376
Author(s):  
Matej Špeťko ◽  
Ondřej Vysocký ◽  
Branislav Jansík ◽  
Lubomír Říha

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy efficiency. We present performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere microarchitecture GPUs. The results are compared against the previous generation of the server, Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units including Tensor Cores. Furthermore, thermal stability was investigated. In addition, Dynamic Voltage and Frequency Scaling (DVFS) analysis was performed to determine the most energy-efficient configuration of the GPUs executing workloads of various arithmetic intensities. Under the energy-optimal configuration the A100 GPU reaches an efficiency of 51 GFLOPS/W for double-precision workloads and 91 GFLOPS/W for Tensor Core double-precision workloads, which makes the A100 the most energy-efficient server accelerator for scientific simulations on the market.


2015 ◽  
Vol 1 (4) ◽  
pp. 1-12
Author(s):  
Chidadala Janardhan ◽  
Bhagath Pyda ◽  
J. Manohar ◽  
K. V. Ramanaiah ◽  
...  

Author(s):  
Jack Dongarra ◽  
Laura Grigori ◽  
Nicholas J. Higham

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
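
To make the accuracy trade-off of half precision concrete, here is a small sketch using only Python's standard library. It emulates IEEE 754 binary16 by round-tripping values through the `struct` module's `"e"` format. The helper names (`to_half`, `sum_in_half`) and the test data are assumptions made for this illustration and are not taken from the article.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_in_half(values):
    """Accumulate values, rounding the running sum to binary16 after each add."""
    total = 0.0
    for v in values:
        # The binary64 sum of two binary16 values is exact, so a single
        # rounding back to binary16 models a half-precision addition.
        total = to_half(total + to_half(v))
    return total


# Hypothetical workload: add 0.1 one thousand times (exact answer: 100).
values = [0.1] * 1000
half_sum = sum_in_half(values)
double_sum = sum(values)

print("binary16 accumulation:", half_sum)
print("binary64 accumulation:", double_sum)
```

The half-precision running sum drifts visibly away from 100 because, once the total is large, each addend of 0.1 is rounded to the coarse spacing between neighbouring binary16 values. This is one reason mixed-precision algorithms commonly keep accumulations in a higher precision than the input data.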


2019 ◽  
Vol 15 (4) ◽  
pp. 1-21
Author(s):  
Bing Li ◽  
Mengjie Mao ◽  
Xiaoxiao Liu ◽  
Tao Liu ◽  
Zihao Liu ◽  
...  

Nano Energy ◽  
2021 ◽  
Vol 82 ◽  
pp. 105717
Author(s):  
Min-Ci Wu ◽  
Jui-Yuan Chen ◽  
Yi-Hsin Ting ◽  
Chih-Yang Huang ◽  
Wen-Wei Wu
