Hierarchical MATE's approach for dynamic performance tuning of large-scale parallel applications

Author(s):  
Andrea Martinez ◽  
Anna Sikora ◽  
Eduardo Cesar ◽  
Joan Sorribes
Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
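
As a rough illustration of what "Pareto-optimal trade-off" means here, the following minimal sketch filters a set of (runtime, energy) measurements down to the non-dominated ones. It is not the authors' statistical or machine learning model; the function name `pareto_front` and the sample numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (runtime, energy) points that no other point dominates.

    Both objectives are minimized. A point is dominated when another point
    is no worse in both objectives and strictly better in at least one.
    """
    front = []
    best_energy = float("inf")
    # Sort by runtime (ties broken by energy), then sweep: a point belongs
    # to the front only if it uses strictly less energy than every faster
    # (or equally fast) point seen so far.
    for runtime, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((runtime, energy))
            best_energy = energy
    return front


# Hypothetical measurements, one (runtime in s, energy in kJ) pair per configuration.
runs = [(100, 50), (110, 30), (120, 35), (105, 50), (130, 28)]
trade_offs = pareto_front(runs)
```

Each surviving point is a candidate trade-off: moving along the front gives up some runtime in exchange for lower energy use, or the reverse.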


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1474
Author(s):  
Ruben Tapia-Olvera ◽  
Francisco Beltran-Carbajal ◽  
Antonio Valderrabano-Gonzalez ◽  
Omar Aguilar-Mejia

This proposal aims to overcome the problem that arises when diverse regulation devices and control strategies are involved in the design of electric power system regulation. When new devices are added to an electric power system after the topology and regulation goals have been defined, a new design stage is generally needed to obtain the desired outputs. Moreover, if the initial design is based on a model linearized around an equilibrium point, the new conditions might degrade the overall performance of the system. Our proposal demonstrates that power system performance can be guaranteed with a single design stage when an adequate adaptive scheme updates the gains of some critical controllers. For large-scale power systems, this feature is illustrated with time domain simulations showing the dynamic behavior of the significant variables. The transient response is enhanced in terms of maximum overshoot and settling time. This is demonstrated by comparing the behavior of some important variables with a StatCom, both with and without a PSS. A B-Spline neural network algorithm is used to define the best controller gains to efficiently attenuate low frequency oscillations when a short circuit event occurs. This strategy avoids dependency on the parameters and on the power system model; only a dataset of typical variable measurements is required to achieve the expected behavior. The inclusion of a PSS and a StatCom with positive interaction enhances the dynamic performance of the system while illustrating the ability of the strategy to add different controllers in only one design stage.


2016 ◽  
Vol 65 (7) ◽  
pp. 2184-2198 ◽  
Author(s):  
Jidong Zhai ◽  
Wenguang Chen ◽  
Weimin Zheng ◽  
Keqin Li

Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
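
The idea of balancing within a tree of domains can be sketched with a two-level greedy assignment. This is a simplified stand-in, not the Charm++ implementation described in the abstract: the helper names `greedy_assign` and `hierarchical_balance` are hypothetical, the tree has only two levels, and the root here sees every task load, whereas a scalable scheme would pass only aggregated information up the tree.

```python
import heapq
from typing import List


def greedy_assign(loads: List[float], bins: int) -> List[List[float]]:
    """Give each load, largest first, to the currently least-loaded bin."""
    heap = [(0.0, b) for b in range(bins)]  # (total load, bin index)
    result: List[List[float]] = [[] for _ in range(bins)]
    for load in sorted(loads, reverse=True):
        total, b = heapq.heappop(heap)
        result[b].append(load)
        heapq.heappush(heap, (total + load, b))
    return result


def hierarchical_balance(
    loads: List[float], groups: int, procs_per_group: int
) -> List[List[List[float]]]:
    """Two-level balancing: first across groups, then inside each group."""
    # Level 1 (root of the tree): split the work across groups.
    per_group = greedy_assign(loads, groups)
    # Level 2 (one subtree per group): each group balances only its own
    # share, independently of the other groups.
    return [greedy_assign(share, procs_per_group) for share in per_group]


# Hypothetical task loads, balanced over 2 groups of 2 processors each.
plan = hierarchical_balance([8, 7, 6, 5, 4, 3, 2, 1], groups=2, procs_per_group=2)
per_processor = [sum(proc) for group in plan for proc in group]
```

Because each group works only on its own share, the second level could run in parallel across groups, which is the property that lets a hierarchical scheme avoid a single centralized bottleneck.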


Energies ◽  
2020 ◽  
Vol 13 (13) ◽  
pp. 3343 ◽  
Author(s):  
Jiyoung Song ◽  
Seungchan Oh ◽  
Jaegul Lee ◽  
Jeonghoon Shin ◽  
Gilsoo Jang

The purpose of this paper is to introduce, examine, and evaluate the industrial experiences and effectiveness of a Thyristor Controlled Series Compensator (TCSC) replica controller installed in Korea in 2019 through a review of its configuration, test platform, and practical application, and further to propose operational guidelines for replica controllers. Four representative practical cases were conducted: a Dynamic Performance Test (DPT) under a sufficiently large-scale power system prior to the Site Acceptance Test (SAT), pre-verification for on-site controller modification during operation stage, parameter tuning to mitigate the control interaction, and time domain simulation for Sub-Synchronous Torsional Interaction (SSTI). None of these four cases can be performed in a Factory Acceptance Test (FAT) or on-site. Therefore, TCSC control performance was accurately verified under the entire Korean power system based on a large-scale real-time simulator, which demonstrated its effectiveness as a powerful tool for operations including multiple power electronics devices. Our review herein of these four practical cases is expected to show the usefulness of replica controllers, to demonstrate their strength to deal with practical field events, and to contribute to the further expansion of the application area from a perspective of electric utility.


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 982 ◽  
Author(s):  
Alberto Cascajo ◽  
David E. Singh ◽  
Jesus Carretero

This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications on shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that is able to analyze the platform’s compute node utilization. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, that uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, which arises because the applications share the compute nodes and may compete for their resources, is avoided by means of dynamic application migration. A description of the architecture, as well as a practical evaluation of the proposal, shows significant improvements of up to 20% in makespan and 10% in energy consumption compared to a non-optimized execution.


Author(s):  
James D Stevens ◽  
Andreas Klöckner

The ability to model, analyze, and predict execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, developer-guided performance optimization, and the automation of performance tuning for high performance, parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs, which are increasingly prevalent in the world’s fastest supercomputers. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. With this approach, we empower the user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define their own model and customize the calibration process, making it as simple or complex as desired, and as application-targeted or general as desired. As application examples of our approach, we demonstrate both linear and nonlinear models; these examples are designed to predict execution times for multiple variants of a particular computation: two matrix-matrix multiplication variants, four discontinuous Galerkin differentiation operation variants, and two 2D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. 
We view this highly user-customizable approach as a response to a central question arising in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use? We believe the last of these attributes precludes approaches requiring manual collection of kernel or hardware statistics.
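
To make the notion of a linear model calibrated from operation counts concrete, here is a minimal sketch that fits a two-term cost model, time ≈ a · flops + b · bytes, by least squares. It is not the authors' tool: the function name `calibrate`, the choice of two features, and the synthetic calibration data are all assumptions made for illustration.

```python
from typing import List, Tuple


def calibrate(samples: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fit time = a * flops + b * bytes by ordinary least squares.

    Each sample is (flops, bytes, measured_time). The 2x2 normal
    equations are solved in closed form with Cramer's rule.
    """
    s_ff = sum(f * f for f, _, _ in samples)
    s_fm = sum(f * m for f, m, _ in samples)
    s_mm = sum(m * m for _, m, _ in samples)
    t_f = sum(f * t for f, _, t in samples)
    t_m = sum(m * t for _, m, t in samples)
    det = s_ff * s_mm - s_fm * s_fm
    if det == 0:
        raise ValueError("calibration kernels do not separate the two costs")
    a = (t_f * s_mm - t_m * s_fm) / det
    b = (s_ff * t_m - s_fm * t_f) / det
    return a, b


# Synthetic calibration data (not measurements), generated with a = 2, b = 3.
benchmarks = [(1, 0, 2), (0, 1, 3), (1, 1, 5), (2, 1, 7)]
cost_per_flop, cost_per_byte = calibrate(benchmarks)

# Predicted time for a hypothetical kernel with 4 flops and 2 bytes moved.
predicted = cost_per_flop * 4 + cost_per_byte * 2
```

The fitted coefficients are what make such a model cost-explanatory: each one attributes a share of the predicted time to a countable kernel operation.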


2014 ◽  
Vol 615 ◽  
pp. 313-316
Author(s):  
Zai Liang Chen ◽  
Luo Hong Deng ◽  
Cong Jing

A new table was designed for a large floor-type boring and milling machine, and ANSYS was used to optimize the structure of the table as a whole. From the contours of the material that can be removed, the layout of the table's inner ribs and the locations of the sand holes in the rib plates were obtained. Dynamic optimization was then carried out on the variables of the basic rib cell: the effect of the lattice structure parameters on the natural frequency of the lattice, and the effect of the related lattice parameters on the whole table, were studied in order to obtain the ideal rib lattice structure after a second optimization. The optimized table has lower mass, higher rigidity, and better dynamic performance.

