Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems

Abstract: High-performance computing is at the heart of digital technology and makes it possible to simulate complex physical phenomena. Hardware architectures are trending toward heterogeneous systems in which multi-core CPUs are accelerated by GPUs to deliver high computing power. The demand for fast solution of geoscience simulations, coupled with these new architectures, drives the need for challenging parallel algorithms. Such applications, based on partial differential equations, require the solution of large, sparse linear systems of equations. This work advances the Matrix Powers Kernel (MPK), a crucial kernel for solving sparse linear systems with communication-avoiding methods. This class of methods addresses the performance degradation observed beyond several nodes by narrowing the gap between the time needed to perform the computations and the time needed to communicate the results. The proposed work is a new formulation of distributed MPK for GPU clusters in which pipelined communications can be overlapped with computation. In addition, appropriate data reorganization decreases the memory traffic between processors and accelerators and improves performance. The proposed structure separates local and external components, with several layers of interface nodes required by the MPK algorithm. The data is restructured so that all the data required by a neighbor process is stored contiguously at the end, after the local data. An assembly step determines the contents of the messages for each neighbor. This data structure has a major impact on the efficiency of the solution, since it permits a communication scheme in which computation on local data runs on the GPUs and computation on external data runs on the CPUs. Moreover, it permits more efficient inter-process communication by overlapping communication with computation in an asynchronous, pipelined manner.
We validate our design on test cases with different block matrices obtained from different reservoir simulations: fractured-reservoir dual-medium, black-oil two-phase flow, and three-phase flow models. The experimental results demonstrate the performance of the proposed approach compared to the state of the art. Running on several nodes of a GPU cluster, the proposed MPK provides a significant performance gain over an equivalent, already optimized sparse matrix-vector product (SpMV) and scales better.
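The idea behind the layered interface nodes can be illustrated with a minimal single-process sketch. It is not the authors' GPU implementation: the function names (`tridiag_csr`, `ghost_layers`, `local_matrix_powers`), the 1D tridiagonal test matrix, and the row partition are all assumptions made for illustration. The sketch shows that, once a process holds `s` layers of external entries, it can compute `s` successive matrix-vector products on its own rows without exchanging data in between.

```python
def tridiag_csr(n):
    """Build the hypothetical test matrix tridiag(-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def csr_spmv(indptr, indices, data, x):
    """Plain sparse matrix-vector product y = A x."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


def matrix_powers(indptr, indices, data, x, s):
    """Reference MPK: return [x, A x, ..., A^s x] using global data."""
    basis = [list(x)]
    for _ in range(s):
        basis.append(csr_spmv(indptr, indices, data, basis[-1]))
    return basis


def ghost_layers(indptr, indices, owned, s):
    """Return s layers of external rows reachable from the owned rows."""
    known, frontier, layers = set(owned), set(owned), []
    for _ in range(s):
        layer = set()
        for row in frontier:
            for k in range(indptr[row], indptr[row + 1]):
                if indices[k] not in known:
                    layer.add(indices[k])
        layers.append(sorted(layer))
        known |= layer
        frontier = layer
    return layers


def local_matrix_powers(indptr, indices, data, x_known, owned, layers, s):
    """Compute s powers on the owned rows from owned + ghost entries only."""
    current = dict(x_known)
    result = [[current[r] for r in owned]]
    for step in range(1, s + 1):
        # The set of rows that can still be computed shrinks by one layer per step.
        rows = list(owned)
        for layer in layers[: s - step]:
            rows.extend(layer)
        current = {
            r: sum(data[k] * current[indices[k]]
                   for k in range(indptr[r], indptr[r + 1]))
            for r in rows
        }
        result.append([current[r] for r in owned])
    return result


if __name__ == "__main__":
    indptr, indices, data = tridiag_csr(8)
    owned, s = [0, 1, 2, 3], 2
    layers = ghost_layers(indptr, indices, owned, s)
    x = [float(i * i) for i in range(8)]
    # Owned entries first, then external entries layer by layer (assumed ordering).
    x_known = {r: x[r] for r in owned + [g for layer in layers for g in layer]}
    print(local_matrix_powers(indptr, indices, data, x_known, owned, layers, s))
```

In a distributed setting, the external layers would be received once from the neighbor processes, which is what allows that single exchange to be overlapped with computation on purely local rows.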

Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip

Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems - CASES '16 ◽

10.1145/2968455.2968509 ◽

2016 ◽

Cited By ~ 4

Author(s):

Paolo Mantovani ◽

Emilio G. Cota ◽

Christian Pilato ◽

Giuseppe Di Guglielmo ◽

Luca P. Carloni

Keyword(s):

High Performance ◽

Heterogeneous Systems ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Systems On Chip ◽

On Chip ◽

Embedded Applications

Scientific Computing With Python on High-Performance Heterogeneous Systems

Computing in Science & Engineering ◽

10.1109/mcse.2021.3088549 ◽

2021 ◽

Vol 23 (4) ◽

pp. 5-7

Author(s):

Lorena A. Barba ◽

Andreas Klockner ◽

Prabhu Ramachandran ◽

Rollin Thomas

Keyword(s):

High Performance ◽

Scientific Computing ◽

Heterogeneous Systems

Fast 3D Integrated Circuit Placement Methodology using Merging Technique

Defence Science Journal ◽

10.14429/dsj.69.14410 ◽

2019 ◽

Vol 69 (3) ◽

pp. 217-222 ◽

Cited By ~ 1

Author(s):

Srinivas Sabbavarapu ◽

Amit Acharyya ◽

P. Balasubramanian ◽

C. Ramesh Reddy

Keyword(s):

Form Factor ◽

Low Power ◽

Integrated Circuit ◽

High Speed ◽

High Performance ◽

Three Dimensional ◽

Heterogeneous Systems ◽

Reduced Form ◽

Wire Length ◽

Ic Design

In recent years, advances in microelectronic integrated circuit (IC) design technologies have proved to be a boon for the design and development of various advanced systems, offering reduced form factor, low power, high speed and increased capacity to incorporate more designs. These systems provide a phenomenal advantage for armoured fighting vehicle (AFV) design, enabling miniaturised, low-power, high-performance sub-systems. This paper discusses one such emerging high-end technology for developing highly capable systems for AFVs. Three-dimensional IC design is one of the emerging fields used to develop high-density heterogeneous systems in a reduced form factor. A novel grouping-based partitioning and merge-based placement (GPMP) methodology for 3D ICs is proposed to reduce the through-silicon via (TSV) count and the placement time. Unlike state-of-the-art techniques, the proposed methodology does not suffer from initial overlap of cells during intra-layer placement, which reduces the placement time. Connectivity-based grouping and partitioning ensures fewer TSVs, and merge-based placement further reduces intra-layer wire length. The proposed GPMP methodology has been extensively evaluated against the IBMPLACE database and its performance compared with the latest techniques, resulting in a 12 per cent improvement in wire length, a 13 per cent reduction in TSVs and a 1.1x improvement in placement time.
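The reason that connectivity-based grouping lowers the TSV count can be seen in a toy sketch. This is not the GPMP algorithm: the helper `tsv_count`, the four-cell netlist, and the cost model (one TSV per layer boundary crossed by a net) are assumptions chosen only for illustration.

```python
def tsv_count(nets, layer_of):
    """Count TSVs, assuming a net needs one TSV per layer boundary it spans."""
    total = 0
    for net in nets:
        layers = [layer_of[cell] for cell in net]
        total += max(layers) - min(layers)
    return total


# Hypothetical netlist: each tuple lists the cells connected by one net.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

# Assignment that ignores connectivity: connected cells end up on different layers.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}

# Assignment that keeps the tightly connected cells a, b, c on the same layer.
grouped = {"a": 0, "b": 0, "c": 0, "d": 1}

print(tsv_count(nets, scattered), tsv_count(nets, grouped))
```

Under this cost model, the assignment that keeps connected cells together needs fewer inter-layer connections for the same netlist.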

RESEARCH OF THE APPLICATION EFFICIENCY OF DIFFERENT CONSTRUCTIONS OF FLOW CAVITATION MIXERS

Thermophysics and Thermal Power Engineering ◽

10.31472/ttpe.1.2019.10 ◽

2018 ◽

Vol 41 (1) ◽

pp. 74-81

Author(s):

A.A. Makarenko

Keyword(s):

High Performance ◽

Building Materials ◽

Diffusion Processes ◽

Specific Energy Consumption ◽

Heterogeneous Systems ◽

Hydrodynamic Cavitation ◽

High Dispersion ◽

Distinctive Features ◽

Technological Processes ◽

Liquid Systems

The material accumulated to date on the application of hydrodynamic cavitation in technological processes makes it possible to identify promising areas for its use: mass transfer, mixing, dissolution, dispersion and emulsification in the processing of liquid heterogeneous systems, and the creation of modern energy-saving technologies. The purpose of this article is to study the effectiveness of different designs of flow cavitation mixers for treating liquid heterogeneous disperse systems and to identify the main industries for their use. Cavitation apparatus can be used effectively in technological processes such as mixing hard-to-mix liquids, dissolving solids in liquids, obtaining stable, multicomponent, highly dispersed emulsions without stabilizers, dispersing suspensions in liquid-liquid systems, accelerating extraction and diffusion, and many others. Hydrodynamic cavitation can be used in technologies for producing lubricants, fuel materials, varnishes and paints, building materials, detergents, etc. Different designs of cavitation devices make it possible to obtain different forms of cavitation, produced in different ways or in combination, depending on the purpose and field of use. Apparatus based on hydrodynamic cavitation is effective equipment that accelerates technological processes in liquid media while significantly reducing specific energy consumption. The structures of hydrodynamic cavitation devices repeatedly rearrange the velocity field, change the direction of fluid flow and mix the components. The main feature of the devices is small dimensions combined with high performance.
Distinctive features of this type of equipment are continuity of the chemical-technological process and its strong intensification; the ability to produce large deformations and shear strains; and intensive hydrodynamic and cavitation effects, which result in high-quality mixing of components, intensification of diffusion processes, and simple, reliable hardware design. The economic efficiency of hydrodynamic cavitation apparatus is determined by the low metal consumption of the equipment and by low maintenance and operating costs compared with tank-type mixing equipment.

Analytical Performance Estimation for Large-Scale Reconfigurable Dataflow Platforms

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3452742 ◽

2021 ◽

Vol 14 (3) ◽

pp. 1-21

Author(s):

Ryota Yasudo ◽

José G. F. Coutinho ◽

Ana-Lucia Varbanescu ◽

Wayne Luk ◽

Hideharu Amano ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Heterogeneous Systems ◽

Performance Estimation ◽

Performance Impact ◽

Accurate Performance ◽

Computing Platforms ◽

Reduced Power Consumption ◽

Performance Computing

Next-generation high-performance computing platforms will handle extreme data- and compute-intensive problems that are intractable with today’s technology. A promising path in achieving the next leap in high-performance computing is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators such as FPGAs, which have been shown to speed up compute-intensive tasks with reduced power consumption. However, assessing the feasibility of large-scale heterogeneous systems requires fast and accurate performance prediction. This article proposes Performance Estimation for Reconfigurable Kernels and Systems (PERKS), a novel performance estimation framework for reconfigurable dataflow platforms. PERKS makes use of an analytical model with machine and application parameters for predicting the performance of multi-accelerator systems and detecting their bottlenecks. Model calibration is automatic, making the model flexible and usable for different machine configurations and applications, including hypothetical ones. Our experimental results show that PERKS can predict the performance of current workloads on reconfigurable dataflow platforms with an accuracy above 91%. The results also illustrate how the modelling scales to large workloads, and how performance impact of architectural features can be estimated in seconds.
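As a rough illustration of what an analytical model with machine and application parameters can look like, the sketch below estimates run time as the slowest of three resource times and reports that resource as the bottleneck. It is not the PERKS model, whose equations are not given in this abstract: the classes `Machine` and `Workload`, the function `estimate`, the bottleneck formula, and all numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine parameters (per accelerator unless noted)."""
    peak_ops_per_s: float
    mem_bytes_per_s: float
    link_bytes_per_s: float  # shared interconnect between accelerators


@dataclass
class Workload:
    """Hypothetical application parameters for one run."""
    ops: float
    mem_bytes: float
    link_bytes: float  # data exchanged between accelerators


def estimate(machine, workload, accelerators):
    """Return (estimated seconds, bottleneck name) under a simple max-of-times model."""
    times = {
        # Compute and memory traffic are assumed to split evenly across accelerators.
        "compute": workload.ops / (machine.peak_ops_per_s * accelerators),
        "memory": workload.mem_bytes / (machine.mem_bytes_per_s * accelerators),
        # Inter-accelerator traffic only exists when more than one accelerator is used.
        "interconnect": (workload.link_bytes / machine.link_bytes_per_s
                         if accelerators > 1 else 0.0),
    }
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck


machine = Machine(peak_ops_per_s=1e12, mem_bytes_per_s=1e11, link_bytes_per_s=1e10)
workload = Workload(ops=4e12, mem_bytes=2e11, link_bytes=3e10)
for n in (1, 4):
    print(n, estimate(machine, workload, n))
```

With these made-up parameters the bottleneck moves from compute to the interconnect as accelerators are added, which is the kind of effect a model of this sort is meant to expose before a large system is built.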
