Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 37
Author(s):  
Kaijie Fan ◽  
Biagio Cosenza ◽  
Ben Juurlink

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications while using different frequency configurations. The task is not straightforward, because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models that predict speedup and normalized energy relative to the default frequency configuration. These are later combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.
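The multi-objective step described in this abstract can be illustrated with a minimal Python sketch of Pareto filtering over two objectives, speedup (higher is better) and normalized energy (lower is better). This is not the authors' implementation: the `Config` type, the function names, and every frequency and prediction value below are hypothetical placeholders chosen only to show how dominated configurations are discarded.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One (core, memory) frequency setting with its predicted objectives."""
    core_mhz: int
    mem_mhz: int
    speedup: float      # relative to the default configuration, higher is better
    norm_energy: float  # relative to the default configuration, lower is better


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b in both objectives and better in one."""
    no_worse = a.speedup >= b.speedup and a.norm_energy <= b.norm_energy
    strictly_better = a.speedup > b.speedup or a.norm_energy < b.norm_energy
    return no_worse and strictly_better


def pareto_set(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical model outputs; the values are invented for illustration.
predicted = [
    Config(1000, 3500, 1.00, 1.00),  # default configuration
    Config(1100, 3500, 1.08, 1.05),  # faster, but uses more energy
    Config(900, 3500, 0.95, 0.85),   # slower, but uses less energy
    Config(800, 3000, 0.80, 0.90),   # dominated by the 900/3500 setting
    Config(1000, 3000, 0.97, 1.02),  # dominated by the default setting
]

front = pareto_set(predicted)
for cfg in front:
    print(cfg)
```

The pairwise check is quadratic in the number of configurations, which is acceptable for a sketch; a sort-based sweep would scale better for large frequency grids.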

2020 ◽  
Vol 63 (6) ◽  
pp. 880-899
Author(s):  
Lixia Chen ◽  
Jian Li ◽  
Ruhui Ma ◽  
Haibing Guan ◽  
Hans-Arno Jacobsen

With energy consumption in high-performance computing clouds growing rapidly, energy saving has become an important topic. Virtualization provides opportunities to save energy by enabling one physical machine (PM) to host multiple virtual machines (VMs). Dynamic voltage and frequency scaling (DVFS) is another technology to reduce energy consumption. However, in heterogeneous cloud environments where DVFS may be applied at the chip level or the core level, it is a great challenge to combine these two technologies efficiently. On per-core DVFS servers, cloud managers should carefully determine VM placements to minimize performance interference. On full-chip DVFS servers, cloud managers further face the choice of whether to combine VMs with different characteristics to reduce performance interference or to combine VMs with similar characteristics to take better advantage of DVFS. This paper presents a novel mechanism combining a VM placement algorithm and a frequency scaling method. We formulate this VM placement problem as an integer programming (IP) problem to find appropriate placement configurations, and we utilize support vector machines to select suitable frequencies. We conduct detailed experiments and simulations, showing that our scheme effectively reduces energy consumption with modest impact on performance. In particular, the total energy delay product is reduced by up to 60%.
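The energy delay product mentioned at the end of this abstract is the product of consumed energy and execution time, so it rewards configurations that save energy without stretching the runtime too far. The short Python sketch below shows how a relative reduction in that metric could be computed; the function names and the measurements are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy delay product (EDP): energy multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_reduction(baseline: tuple, candidate: tuple) -> float:
    """Fractional EDP reduction of candidate relative to baseline.

    Each argument is an (energy_joules, delay_seconds) pair.
    A positive result means the candidate has a lower EDP.
    """
    base = energy_delay_product(*baseline)
    cand = energy_delay_product(*candidate)
    return 1.0 - cand / base


# Hypothetical measurements, invented for illustration only.
baseline = (1200.0, 10.0)   # 1200 J in 10 s under a default policy
candidate = (900.0, 11.0)   # 900 J in 11 s under a tuned policy

print(f"EDP reduction: {edp_reduction(baseline, candidate):.1%}")
```

In this made-up case the tuned policy runs slightly longer but still lowers the EDP, because the energy saving outweighs the added delay.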


2020 ◽  
Vol 10 (4) ◽  
pp. 32
Author(s):  
Sayed Ashraf Mamun ◽  
Alexander Gilday ◽  
Amit Kumar Singh ◽  
Amlan Ganguly ◽  
Geoff V. Merrett ◽  
...  

Servers in a data center are underutilized due to over-provisioning, which contributes heavily toward the high power consumption of data centers. Recent research in optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and using dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently unique to individual systems, and often use simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement learning based approach, which jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared to a heuristic-based approach. We also present a network-aware server consolidation algorithm for HPC data centers, called Bandwidth-Constrained Consolidation (BCC), which can address the under-utilization problem of the servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.


2015 ◽  
Vol 25 (03) ◽  
pp. 1541005
Author(s):  
Alexandra Vintila Filip ◽  
Ana-Maria Oprescu ◽  
Stefania Costache ◽  
Thilo Kielmann

High-Performance Computing (HPC) systems consume large amounts of energy. As the energy consumption predictions for HPC show increasing numbers, it is important to make users aware of the energy spent for the execution of their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates for the tradeoffs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach by implementing it in a prototype, called E-BaTS and validating it with a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search for providing configurations close to the optimal ones in terms of trade-offs between energy consumption and makespan.


2016 ◽  
Vol 25 (3) ◽  
pp. 276-286
Author(s):  
Nirmal Kaur ◽  
Savina Bansal ◽  
Rakesh Kumar Bansal

Efficient scheduling of concurrent tasks is one of the primary requirements for high-performance computing platforms. Recent advances in high-performance computing have resulted in widespread performance improvement, though at the cost of increased consumption of energy and other system resources. In this article, an energy conscious scheduling algorithm with controlled threshold has been developed for precedence-constrained tasks on heterogeneous clusters, which aims at lower makespan along with reduced energy consumption. The energy conscious scheduling with controlled threshold algorithm combines the benefits of dynamic voltage scaling with a controlled threshold-based duplication strategy to achieve its objectives. The effectiveness of the proposed algorithm is analyzed in comparison with available duplication- and non-duplication-based scheduling algorithms (with and without the dynamic voltage scaling approach) to ascertain its performance and energy consumption. Exhaustive simulation results on random and real-world graphs demonstrate that the energy conscious scheduling algorithm with controlled threshold has the potential to reduce energy consumption and makespan.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1587
Author(s):  
Duo Sheng ◽  
Hsueh-Ru Lin ◽  
Li Tai

High performance and complex system-on-chip (SoC) designs require a throughput and stable timing monitor to reduce the impacts of uncertain timing and implement the dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor, combining three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable time-monitoring results. The time-monitoring resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.


Author(s):  
Tyng-Yeu Liang ◽  
Fu-Chun Lu ◽  
Jun-Yao Chiu

QoS and energy consumption are two important issues for Cloud computing. In this paper, the authors propose a hybrid resource reservation method to address these two issues for scientific workflows in the high-performance computing Clouds built on hybrid CPU/GPU architecture. As named, this method reserves proper CPU or GPU for executing different jobs in the same workflow based on the profile of execution time and energy consumption of each resource-to-program pair. They have implemented the proposed resource reservation method on a real service-oriented system. The experimental results show that the proposed resource reservation method can effectively maintain the QoS of workflows while simultaneously minimizing the energy consumption of executing the workflows.


2020 ◽  
Vol 92 (1) ◽  
pp. 517-527
Author(s):  
Timothy Clements ◽  
Marine A. Denolle

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphics processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.


2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Sun Min ◽  
Yufeng Bi ◽  
Mulian Zheng ◽  
Sai Chen ◽  
Jingjing Li

The energy consumption and greenhouse gas emission of asphalt pavement have become a very serious global problem. Polyurethane (PU) has very good high-temperature stability and durability, and it has recently been studied as an alternative binder to asphalt. However, the strength-forming mechanism and the mixture structure of the PU mixture are different from those of the asphalt mixture. This work explored the design and performance evaluation of the PU mixture. The PU content of mixtures was determined by the creep slope (K), tensile strength ratios (TSR), immersion Cantabro loss (ICL), and the volume of air voids (VV) to ensure better water stability. The high- and low-temperature stability, water stability, dynamic mechanical property, and sustainability of the PU mixture were evaluated and compared with those of the stone matrix asphalt mixture (SMA). The test results showed that the dynamic stability and bending strain of the PU mixture were about 7.5 and 2.3 times those of SMA, respectively. The adhesion level between PU and the basalt aggregate was one level greater than that with limestone, and basalt aggregates are recommended for use in the PU mixture to improve water stability. Although the initial TSR and ICL of the PU mixture were lower, the long-term values were higher; the PU mixture had better long-term water damage resistance. The dynamic modulus and phase angles (φ) of the PU mixture were much higher. The energy consumption and CO2 emission of the PU mixture were lower than those of SMA. Therefore, the cold-mixed PU mixture is a sustainable material with excellent performance and can be used as a substitute for asphalt mixture.

