Hierarchical MATE's approach for dynamic performance tuning of large-scale parallel applications

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
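As an illustration of what a Pareto-optimal trade-off between runtime and energy means, here is a minimal sketch. It is not the authors' statistical or machine learning models; it only filters a set of already measured (or predicted) configurations. The function name `pareto_front` and the sample values are hypothetical.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (runtime, energy) points that no other point dominates.

    Lower is better on both axes. A point is dropped when another point
    is at least as fast and uses no more energy.
    """
    front = []
    best_energy = float("inf")
    # Sorting by runtime (then energy) means each candidate only has to
    # beat the lowest energy seen among all faster-or-equal points.
    for runtime, energy in sorted(points):
        if energy < best_energy:
            front.append((runtime, energy))
            best_energy = energy
    return front


# Hypothetical measurements: (runtime in seconds, energy in kilojoules).
configs = [(100, 50), (110, 30), (120, 28), (105, 55), (130, 40)]
front = pareto_front(configs)
```

With these made-up values, `front` is `[(100, 50), (110, 30), (120, 28)]`: each remaining point trades some runtime for lower energy, and a user would pick among them.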


A Novel Methodology for Adaptive Coordination of Multiple Controllers in Electrical Grids

Mathematics ◽

10.3390/math9131474 ◽

2021 ◽

Vol 9 (13) ◽

pp. 1474

Author(s):

Ruben Tapia-Olvera ◽

Francisco Beltran-Carbajal ◽

Antonio Valderrabano-Gonzalez ◽

Omar Aguilar-Mejia

Keyword(s):

Power Systems ◽

Power System ◽

Electric Power ◽

Large Scale ◽

Short Circuit ◽

Dynamic Performance ◽

Low Frequency ◽

Electric Power System ◽

Design Stage ◽

Positive Interaction

This proposal aims to overcome the problem that arises when diverse regulation devices and control strategies are involved in electric power system regulation design. When new devices are added to an electric power system after the topology and regulation goals have been defined, a new design stage is generally needed to obtain the desired outputs. Moreover, if the initial design is based on a model linearized around an equilibrium point, the new conditions might degrade the overall performance of the system. Our proposal demonstrates that power system performance can be guaranteed with one design stage when an adequate adaptive scheme updates the gains of some critical controllers. For large-scale power systems, this feature is illustrated with time domain simulations showing the dynamic behavior of the significant variables. The transient response is enhanced in terms of maximum overshoot and settling time. This is demonstrated using the deviation in the behavior of some important variables with StatCom, with and without PSS. A B-Spline neural network algorithm is used to define the best controller gains to efficiently attenuate low frequency oscillations when a short circuit event occurs. This strategy avoids dependency on the parameters and the power system model; only a dataset of typical variable measurements is required to achieve the expected behavior. The inclusion of PSS and StatCom with positive interaction enhances the dynamic performance of the system while illustrating the ability of the strategy to add different controllers in only one design stage.


Dynamic performance investigation for large-scale wind turbine tower

2005 International Conference on Electrical Machines and Systems ◽

10.1109/icems.2005.202694 ◽

2005 ◽

Author(s):

Fei Cha ◽

Wang Nan ◽

Zhou Bo ◽

Chen Changzheng

Keyword(s):

Wind Turbine ◽

Large Scale ◽

Dynamic Performance ◽

Wind Turbine Tower


Performance Prediction for Large-Scale Parallel Applications Using Representative Replay

IEEE Transactions on Computers ◽

10.1109/tc.2015.2479630 ◽

2016 ◽

Vol 65 (7) ◽

pp. 2184-2198 ◽

Cited By ~ 9

Author(s):

Jidong Zhai ◽

Wenguang Chen ◽

Weimin Zheng ◽

Keqin Li

Keyword(s):

Performance Prediction ◽

Large Scale ◽

Parallel Applications


Periodic hierarchical load balancing for large supercomputers

The International Journal of High Performance Computing Applications ◽

10.1177/1094342010394383 ◽

2011 ◽

Vol 25 (4) ◽

pp. 371-385 ◽

Cited By ~ 34

Author(s):

Gengbin Zheng ◽

Abhinav Bhatelé ◽

Esteban Meneses ◽

Laxmikant V. Kalé

Keyword(s):

Load Balancing ◽

Large Scale ◽

Parallel Machines ◽

National Laboratory ◽

Argonne National Laboratory ◽

Parallel Applications ◽

Scientific Application ◽

Computing Center ◽

Blue Gene ◽

Advanced Computing

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
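The idea of balancing load at several levels of a tree can be sketched with a two-level toy example. This is an assumption-laden illustration, not the Charm++ implementation described in the abstract: the root level uses a simple "move the smallest task" heuristic, the leaf level uses greedy longest-processing-time-first assignment, task loads are assumed to be positive numbers, and all function names are hypothetical.

```python
import bisect
import heapq
from typing import List


def lpt_assign(tasks: List[float], n_procs: int) -> List[List[float]]:
    """Leaf level: give each task, largest first, to the least loaded processor."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor index)
    heapq.heapify(heap)
    bins: List[List[float]] = [[] for _ in range(n_procs)]
    for task in sorted(tasks, reverse=True):
        load, proc = heapq.heappop(heap)
        bins[proc].append(task)
        heapq.heappush(heap, (load + task, proc))
    return bins


def hierarchical_balance(groups: List[List[float]],
                         procs_per_group: int) -> List[List[List[float]]]:
    """Balance task loads across groups first, then inside each group.

    groups[g] holds the loads of the tasks currently owned by group g.
    Returns plan[g][p]: the task loads placed on processor p of group g.
    """
    groups = [sorted(g) for g in groups]  # work on sorted copies
    # Root level: move the smallest task from the heaviest group to the
    # lightest one while the receiver stays strictly below the sender's load.
    while True:
        loads = [sum(g) for g in groups]
        hi = loads.index(max(loads))
        lo = loads.index(min(loads))
        if hi == lo:
            break  # all groups carry the same load
        smallest = groups[hi][0]
        if loads[lo] + smallest >= loads[hi]:
            break  # moving would not narrow the gap
        groups[hi].pop(0)
        bisect.insort(groups[lo], smallest)
    # Leaf level: each group balances its own tasks independently.
    return [lpt_assign(g, procs_per_group) for g in groups]


# Hypothetical input: two groups of two processors with uneven task loads.
plan = hierarchical_balance([[5, 4, 3, 2, 2], [1, 1]], procs_per_group=2)
```

The point of the split is that the root only reasons about per-group totals, while the per-processor decisions stay local to each group, so no single node needs the full task list.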


Dynamic performance testing and implementation for static var compensator controller via hardware-in-the-loop simulation under large-scale power system with real-time simulators

Simulation Modelling Practice and Theory ◽

10.1016/j.simpat.2020.102191 ◽

2021 ◽

Vol 106 ◽

pp. 102191

Author(s):

Jiyoung Song ◽

Solyoung Jung ◽

Jaegul Lee ◽

Jeonghoon Shin ◽

Gilsoo Jang

Keyword(s):

Power System ◽

Real Time ◽

Large Scale ◽

Dynamic Performance ◽

Performance Testing ◽

Hardware In The Loop ◽

Static Var Compensator


Application of the First Replica Controller in Korean Power Systems

Energies ◽

10.3390/en13133343 ◽

2020 ◽

Vol 13 (13) ◽

pp. 3343 ◽

Cited By ~ 1

Author(s):

Jiyoung Song ◽

Seungchan Oh ◽

Jaegul Lee ◽

Jeonghoon Shin ◽

Gilsoo Jang

Keyword(s):

Power Systems ◽

Power System ◽

Large Scale ◽

Performance Test ◽

Dynamic Performance ◽

Parameter Tuning ◽

Electric Utility ◽

Acceptance Test ◽

Thyristor Controlled Series Compensator ◽

Practical Field

The purpose of this paper is to introduce, examine, and evaluate the industrial experience with, and effectiveness of, a Thyristor Controlled Series Compensator (TCSC) replica controller installed in Korea in 2019 through a review of its configuration, test platform, and practical application, and further to propose operational guidelines for replica controllers. Four representative practical cases were conducted: a Dynamic Performance Test (DPT) under a sufficiently large-scale power system prior to the Site Acceptance Test (SAT), pre-verification of on-site controller modification during the operation stage, parameter tuning to mitigate control interaction, and time domain simulation of Sub-Synchronous Torsional Interaction (SSTI). None of these four cases can be performed in a Factory Acceptance Test (FAT) or on-site. Therefore, TCSC control performance was verified under the entire Korean power system using a large-scale real-time simulator, which demonstrated its effectiveness as a powerful tool for operations involving multiple power electronics devices. Our review of these four practical cases is expected to show the usefulness of replica controllers, to demonstrate their strength in dealing with practical field events, and to contribute to the further expansion of their application area from the perspective of an electric utility.


Performance-Aware Scheduling of Parallel Applications on Non-Dedicated Clusters

Electronics ◽

10.3390/electronics8090982 ◽

2019 ◽

Vol 8 (9) ◽

pp. 982 ◽

Cited By ~ 1

Author(s):

Alberto Cascajo ◽

David E. Singh ◽

Jesus Carretero

Keyword(s):

Large Scale ◽

Job Scheduling ◽

Parallel Applications ◽

Data Staging ◽

Performance Improvements ◽

Practical Evaluation ◽

Significant Performance ◽

Scalable Monitoring ◽

And Control ◽

New Strategies

This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications on shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that analyzes the utilization of the platform’s compute nodes. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, that uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, which arises because the applications share the compute nodes and may compete for their resources, is avoided by means of dynamic application migration. A description of the architecture, as well as a practical evaluation of the proposal, shows significant performance improvements of up to 20% in the makespan and 10% in energy consumption compared to a non-optimized execution.


A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The International Journal of High Performance Computing Applications ◽

10.1177/1094342020921340 ◽

2020 ◽

Vol 34 (6) ◽

pp. 589-614

Author(s):

James D Stevens ◽

Andreas Klöckner

Keyword(s):

Performance Optimization ◽

Heterogeneous Computing ◽

Performance Modeling ◽

Matrix Multiplication ◽

Black Box ◽

Ease Of Use ◽

Performance Tuning ◽

Parallel Applications ◽

Accuracy Evaluation ◽

Trade Offs

The ability to model, analyze, and predict execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, developer-guided performance optimization, and the automation of performance tuning for high performance, parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs, which are increasingly prevalent in the world’s fastest supercomputers. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. With this approach, we empower the user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define their own model and customize the calibration process, making it as simple or complex as desired, and as application-targeted or general as desired. As application examples of our approach, we demonstrate both linear and nonlinear models; these examples are designed to predict execution times for multiple variants of a particular computation: two matrix-matrix multiplication variants, four discontinuous Galerkin differentiation operation variants, and two 2D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. 
We view this highly user-customizable approach as a response to a central question arising in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use, an attribute we believe precludes approaches requiring manual collection of kernel or hardware statistics.


Dynamic Performance and Structural Optimization of Large-Scale Machine Table

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.615.313 ◽

2014 ◽

Vol 615 ◽

pp. 313-316

Author(s):

Zai Liang Chen ◽

Luo Hong Deng ◽

Cong Jing

Keyword(s):

Structural Optimization ◽

Natural Frequency ◽

Dynamic Optimization ◽

Large Scale ◽

Lattice Structure ◽

Dynamic Performance ◽

Structure Parameters ◽

Milling Machine ◽

Plate Dynamic ◽

The Ideal

A new table was designed for a large floor-type boring and milling machine, and ANSYS was used to optimize the structure of the table as a whole. From the contours of the material that can be removed, the layout of the table's inner ribs and the location of the sand holes in the rib plates were obtained. Dynamic optimization was then applied to the basic rib cell: the effect of the steel lattice structure parameters on the natural frequency of the lattice, and the effect of the related lattice parameters on the whole table, were studied to obtain the ideal rib lattice structure after a second optimization. The optimized table has lower mass and higher rigidity and dynamic performance.
