Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Kaijie Fan; Biagio Cosenza; Ben Juurlink

doi:10.3390/computation8020037

Computation ◽

10.3390/computation8020037 ◽

2020 ◽

Vol 8 (2) ◽

pp. 37

Author(s):

Kaijie Fan ◽

Biagio Cosenza ◽

Ben Juurlink

Keyword(s):

Energy Consumption ◽

Performance Prediction ◽

High Performance ◽

Pareto Set ◽

Large Set ◽

Balance Performance ◽

Multi Objective ◽

Dynamic Voltage ◽

And Performance ◽

Performance Computing

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications while using different frequency configurations. The task is not straightforward, because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models for speedup and normalized energy predictions over the default frequency configuration. Those are later combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.
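To make the notion of a Pareto set of frequency configurations concrete, the following is a minimal sketch and not the authors' method. It assumes each configuration already has a predicted speedup and a predicted normalized energy relative to the default configuration. The frequency pairs, the predicted values, and the function names `dominates` and `pareto_set` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Config = Tuple[int, int]          # (core MHz, memory MHz), hypothetical values
Objectives = Tuple[float, float]  # (speedup, normalized energy) vs. the default


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is at least as good as b in both objectives
    (higher speedup, lower energy) and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_set(points: Dict[Config, Objectives]) -> Dict[Config, Objectives]:
    """Keep only the configurations that no other configuration dominates."""
    return {
        cfg: obj
        for cfg, obj in points.items()
        if not any(dominates(other, obj) for other in points.values())
    }


# Made-up predictions for illustration; not data from the article.
predictions: Dict[Config, Objectives] = {
    (1000, 3500): (1.00, 1.00),  # default configuration
    (1200, 3500): (1.15, 1.10),  # faster, but uses more energy
    (900, 3500): (0.95, 0.85),   # slower, but saves energy
    (800, 3000): (0.80, 0.90),   # dominated by (900, 3500)
    (1100, 3500): (1.05, 0.98),  # dominates the default
}

front = pareto_set(predictions)
for cfg, (speedup, energy) in sorted(front.items()):
    print(cfg, speedup, energy)
```

In this toy data the default configuration drops out of the front because another configuration is both faster and less energy-hungry. The quadratic scan is adequate for a few hundred configurations; a sort-based sweep would be the usual choice for larger sets.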


Balancing Power And Performance In HPC Clouds

The Computer Journal ◽

10.1093/comjnl/bxz150 ◽

2020 ◽

Vol 63 (6) ◽

pp. 880-899

Author(s):

Lixia Chen ◽

Jian Li ◽

Ruhui Ma ◽

Haibing Guan ◽

Hans-Arno Jacobsen

Keyword(s):

Energy Consumption ◽

High Performance ◽

Virtual Machines ◽

Support Vector ◽

Frequency Scaling ◽

Vm Placement ◽

Placement Problem ◽

Dynamic Voltage ◽

And Performance ◽

Save Energy

With energy consumption in high-performance computing clouds growing rapidly, energy saving has become an important topic. Virtualization provides opportunities to save energy by enabling one physical machine (PM) to host multiple virtual machines (VMs). Dynamic voltage and frequency scaling (DVFS) is another technology to reduce energy consumption. However, in heterogeneous cloud environments where DVFS may be applied at the chip level or the core level, it is a great challenge to combine these two technologies efficiently. On per-core DVFS servers, cloud managers should carefully determine VM placements to minimize performance interference. On full-chip DVFS servers, cloud managers further face the choice of whether to combine VMs with different characteristics to reduce performance interference or to combine VMs with similar characteristics to take better advantage of DVFS. This paper presents a novel mechanism combining a VM placement algorithm and a frequency scaling method. We formulate this VM placement problem as an integer programming (IP) problem to find appropriate placement configurations, and we utilize support vector machines to select suitable frequencies. We conduct detailed experiments and simulations, showing that our scheme effectively reduces energy consumption with modest impact on performance. In particular, the total energy delay product is reduced by up to 60%.
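The energy delay product (EDP) mentioned above is conventionally defined as energy consumed multiplied by execution time, so it penalizes savings that come with a large slowdown. The sketch below shows how a relative EDP reduction could be computed under that standard definition. The function names and the measurements are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy * delay; lower is better."""
    return energy_joules * delay_seconds


def relative_reduction(baseline: float, optimized: float) -> float:
    """Fraction by which the optimized value improves on the baseline."""
    return (baseline - optimized) / baseline


# Illustrative numbers only: a baseline run and a run after tuning.
baseline_edp = energy_delay_product(120, 10)  # 120 J over 10 s
tuned_edp = energy_delay_product(90, 8)       # 90 J over 8 s

reduction = relative_reduction(baseline_edp, tuned_edp)
print(f"EDP reduced by {reduction:.0%}")
```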


Intra- and Inter-Server Smart Task Scheduling for Profit and Energy Optimization of HPC Data Centers

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea10040032 ◽

2020 ◽

Vol 10 (4) ◽

pp. 32

Author(s):

Sayed Ashraf Mamun ◽

Alexander Gilday ◽

Amit Kumar Singh ◽

Amlan Ganguly ◽

Geoff V. Merrett ◽

...

Keyword(s):

Energy Consumption ◽

Power Consumption ◽

Data Center ◽

High Performance ◽

Data Centers ◽

Virtual Machines ◽

For Profit ◽

Dynamic Voltage ◽

High Power Consumption ◽

Performance Computing

Servers in a data center are underutilized due to over-provisioning, which contributes heavily toward the high power consumption of data centers. Recent research in optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and using dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently unique to individual systems, and often use simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement-learning-based approach, which jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared to a heuristic-based approach. We also present a network-aware server consolidation algorithm called Bandwidth-Constrained Consolidation (BCC) for HPC data centers, which can address the under-utilization problem of the servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.


E-BaTS: Energy-Aware Scheduling for Bag-of-Task Applications in HPC Clusters

Parallel Processing Letters ◽

10.1142/s0129626415410054 ◽

2015 ◽

Vol 25 (03) ◽

pp. 1541005

Author(s):

Alexandra Vintila Filip ◽

Ana-Maria Oprescu ◽

Stefania Costache ◽

Thilo Kielmann

Keyword(s):

Energy Consumption ◽

High Performance Computing ◽

High Performance ◽

Terms Of Trade ◽

Exhaustive Search ◽

Energy Aware ◽

Trade Offs ◽

Energy Aware Scheduling ◽

And Performance ◽

Performance Computing

High-Performance Computing (HPC) systems consume large amounts of energy. As the energy consumption predictions for HPC show increasing numbers, it is important to make users aware of the energy spent for the execution of their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates for the trade-offs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach by implementing it in a prototype, called E-BaTS, and testing it with a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search for providing configurations close to the optimal ones in terms of trade-offs between energy consumption and makespan.


Energy conscious scheduling with controlled threshold for precedence-constrained tasks on heterogeneous clusters

Concurrent Engineering ◽

10.1177/1063293x16679001 ◽

2016 ◽

Vol 25 (3) ◽

pp. 276-286 ◽

Cited By ~ 5

Author(s):

Nirmal Kaur ◽

Savina Bansal ◽

Rakesh Kumar Bansal

Keyword(s):

Energy Consumption ◽

High Performance Computing ◽

High Performance ◽

Scheduling Algorithm ◽

Voltage Scaling ◽

Dynamic Voltage Scaling ◽

Heterogeneous Cluster ◽

Dynamic Voltage ◽

Computing Platforms ◽

Performance Computing

Efficient scheduling of concurrent tasks is one of the primary requirements for high-performance computing platforms. Recent advances in high-performance computing have resulted in widespread performance improvement, though at the cost of increased consumption of energy and other system resources. In this article, an energy conscious scheduling algorithm with controlled threshold has been developed for precedence-constrained tasks on heterogeneous clusters, which aims at lower makespan along with reduced energy consumption. The energy conscious scheduling with controlled threshold algorithm combines the benefits of dynamic voltage scaling with a controlled threshold-based duplication strategy to achieve its objectives. The effectiveness of the proposed algorithm is analyzed in comparison with available duplication- and non-duplication-based scheduling algorithms (with and without the dynamic voltage scaling approach) to ascertain its performance and energy consumption. Exhaustive simulation results on random and real-world graphs demonstrate that the energy conscious scheduling algorithm with controlled threshold has the potential to reduce energy consumption and makespan.


Low-Process–Voltage–Temperature-Sensitivity Multi-Stage Timing Monitor for System-on-Chip Applications

Electronics ◽

10.3390/electronics10131587 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1587

Author(s):

Duo Sheng ◽

Hsueh-Ru Lin ◽

Li Tai

Keyword(s):

High Performance ◽

Power Reduction ◽

System On Chip ◽

Timing Information ◽

Multi Stage ◽

Dynamic Voltage ◽

And Performance ◽

On Chip ◽

Maximum Measurement ◽

Maximum Measurement Error

High-performance, complex system-on-chip (SoC) designs require a high-throughput and stable timing monitor to reduce the impact of uncertain timing and to implement the dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor, combining three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable time-monitoring results. The time-monitoring resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.


A Hybrid Resource Reservation Method for Workflows in Clouds

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2012100101 ◽

2012 ◽

Vol 4 (4) ◽

pp. 1-21

Author(s):

Tyng-Yeu Liang ◽

Fu-Chun Lu ◽

Jun-Yao Chiu

Keyword(s):

Cloud Computing ◽

Energy Consumption ◽

High Performance ◽

Resource Reservation ◽

Scientific Workflows ◽

Service Oriented ◽

Time And Energy ◽

Gpu Architecture ◽

Performance Computing ◽

Oriented System

QoS and energy consumption are two important issues for Cloud computing. In this paper, the authors propose a hybrid resource reservation method to address these two issues for scientific workflows in the high-performance computing Clouds built on hybrid CPU/GPU architecture. As named, this method reserves proper CPU or GPU for executing different jobs in the same workflow based on the profile of execution time and energy consumption of each resource-to-program pair. They have implemented the proposed resource reservation method on a real service-oriented system. The experimental results show that the proposed resource reservation method can effectively maintain the QoS of workflows while simultaneously minimizing the energy consumption of executing the workflows.


SeisNoise.jl: Ambient Seismic Noise Cross Correlation on the CPU and GPU in Julia

Seismological Research Letters ◽

10.1785/0220200192 ◽

2020 ◽

Vol 92 (1) ◽

pp. 517-527

Author(s):

Timothy Clements ◽

Marine A. Denolle

Keyword(s):

Seismic Noise ◽

High Performance ◽

Cross Correlation ◽

Graphic Processing Unit ◽

Ambient Seismic Noise ◽

Processing Unit ◽

Central Processing ◽

And Performance ◽

Noise Cross Correlation ◽

Performance Computing

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphic processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.


Evaluation of a Cold-Mixed High-Performance Polyurethane Mixture

Advances in Materials Science and Engineering ◽

10.1155/2019/1507971 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Sun Min ◽

Yufeng Bi ◽

Mulian Zheng ◽

Sai Chen ◽

Jingjing Li

Keyword(s):

Energy Consumption ◽

High Performance ◽

Greenhouse Gas Emission ◽

Asphalt Mixture ◽

Temperature Stability ◽

Water Stability ◽

High Temperature Stability ◽

Forming Mechanism ◽

And Performance

The energy consumption and greenhouse gas emissions of asphalt pavement have become a very serious global problem. Polyurethane (PU) has very good high-temperature stability and durability, and it has recently been studied as an alternative binder for asphalt. However, the strength-forming mechanism and the mixture structure of the PU mixture are different from those of the asphalt mixture. This work explored the design and performance evaluation of the PU mixture. The PU content of mixtures was determined by the creep slope (K), tensile strength ratios (TSR), immersion Cantabro loss (ICL), and the volume of air voids (VV) to ensure better water stability. The high- and low-temperature stability, water stability, dynamic mechanical property, and sustainability of the PU mixture were evaluated and compared with those of the stone matrix asphalt mixture (SMA). The test results showed that the dynamic stability and bending strain of the PU mixture were about 7.5 and 2.3 times those of SMA, respectively. The adhesion level between PU and the basalt aggregate was one level greater than that with limestone, and basalt aggregates were recommended for use in the PU mixture to improve water stability. Although the initial TSR and ICL of the PU mixture were lower, the long-term values were higher; the PU mixture had better long-term water damage resistance. The dynamic modulus and phase angles (φ) of the PU mixture were much higher. The energy consumption and CO2 emission of the PU mixture were lower than those of SMA. Therefore, the cold-mixed PU mixture is a sustainable material with excellent performance and can be used as a substitute for asphalt mixture.


Adaptive estimation and prediction of power and performance in high performance computing

Computer Science - Research and Development ◽

10.1007/s00450-010-0125-1 ◽

2010 ◽

Vol 25 (3-4) ◽

pp. 177-186 ◽

Cited By ~ 5

Author(s):

Reza Zamani ◽

Ahmad Afsahi

Keyword(s):

High Performance Computing ◽

High Performance ◽

Adaptive Estimation ◽

Estimation And Prediction ◽

And Performance ◽

Performance Computing


Design and performance measurement of a high-performance computing cluster

2012 IEEE International Instrumentation and Measurement Technology Conference Proceedings ◽

10.1109/i2mtc.2012.6229359 ◽

2012 ◽

Cited By ~ 2

Author(s):

Kiran George ◽

Vivek Venugopal

Keyword(s):

Performance Measurement ◽

High Performance Computing ◽

High Performance ◽

And Performance ◽

High Performance Computing Cluster ◽

Performance Computing ◽

Computing Cluster
