A programming model and runtime system for approximation-aware heterogeneous computing

Author(s):  
Ioannis Parnassos ◽  
Nikolaos Bellas ◽  
Nikolaos Katsaros ◽  
Nikolaos Patsiatzis ◽  
Athanasios Gkaras ◽  
...

2021 ◽
Vol 251 ◽  
pp. 03032
Author(s):  
Haiwang Yu ◽  
Zhihua Dong ◽  
Kyle Knoepfel ◽  
Meifeng Lin ◽  
Brett Viren ◽  
...  

The Liquid Argon Time Projection Chamber (LArTPC) technology plays an essential role in many current and future neutrino experiments. Accurate and fast simulation is critical to developing efficient analysis algorithms and precise physics model projections. The speed of simulation becomes even more important as deep learning algorithms are more widely used in LArTPC analysis and their training requires large simulated datasets. Heterogeneous computing is an efficient way to delegate computationally intensive tasks to specialized hardware. However, as the landscape of compute accelerators evolves quickly, it becomes increasingly difficult to manually adapt the code to the latest hardware or software environments. A solution that is portable to multiple hardware architectures without substantially compromising performance would thus be very beneficial, especially for long-term projects such as the LArTPC simulations. In search of a portable, scalable and maintainable software solution for LArTPC simulations, we have started to explore high-level portable programming frameworks that support several hardware backends. In this paper, we present our experience porting the LArTPC simulation code in the Wire-Cell Toolkit to NVIDIA GPUs, first with the CUDA programming model and then with a portable library called Kokkos. Preliminary performance results on NVIDIA V100 GPUs and multi-core CPUs are presented, followed by a discussion of the factors affecting the performance and plans for future improvements.


2021 ◽  
Vol 24 (1) ◽  
pp. 157-183
Author(s):  
Nikita Andreevich Kataev

Automation of parallel programming is important at every stage of parallel program development. These stages include profiling of the original program; program transformation, which allows us to achieve higher performance after parallelization; and, finally, construction and optimization of the parallel program. It is also important to choose a parallel programming model suitable for expressing the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools and allow the user to explore, in a semi-automatic way, the parallel programs those tools generate. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to the automation of parallel programming and allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming that SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem used to guide SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with that of the programs SAPFOR builds.


2011 ◽  
Vol 19 (1) ◽  
pp. 47-62 ◽  
Author(s):  
David M. Kunzman ◽  
Laxmikant V. Kalé

Heterogeneous clusters that include accelerators have become more common in the realm of high performance computing because of the high GFlop/s rates such clusters are capable of achieving. However, heterogeneous clusters are typically considered hard to program, as they usually require programmers to interleave architecture-specific code within application code. We have extended the Charm++ programming model and runtime system to support heterogeneous clusters (with host cores that differ in their architecture) that include accelerators. We are currently focusing on clusters that include commodity processors, Cell processors, and Larrabee devices. When our extensions are used to develop code, the resulting code is portable between various homogeneous and heterogeneous clusters that may or may not include accelerators. Using a simple example molecular dynamics (MD) code, we demonstrate our programming model extensions and runtime system modifications on a heterogeneous cluster composed of Xeon and Cell processors. Even though there is no architecture-specific code in the example MD program, it successfully makes use of three core types, each with a different ISA (Xeon, PPE, SPE), three SIMD instruction extensions (SSE, AltiVec/VMX and the SPE's SIMD instructions), and two memory models (cache hierarchies and scratchpad memories) in a single execution. Our programming model extensions abstract away hardware complexities, while our runtime system modifications automatically adjust application data to account for architectural differences between the various cores.


Author(s):  
Enric Tejedor ◽  
Yolanda Becerra ◽  
Guillem Alomar ◽  
Anna Queralt ◽  
Rosa M Badia ◽  
...  

The use of the Python programming language for scientific computing has been gaining momentum in recent years. Its compact, readable syntax and its comprehensive set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of message-passing APIs or are restricted to embarrassingly parallel computations. In that sense, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user writes her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and scheduling them on the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
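As a rough illustration of the decorator-based model this abstract describes (a sketch of ours, not code from the paper), the script below marks ordinary functions as tasks; the imports and the compss_wait_on call follow the public PyCOMPSs API, while the task functions and the blocking scheme are hypothetical.

```python
# Minimal PyCOMPSs-style sketch: a sequential-looking script whose decorated
# functions are executed as asynchronous parallel tasks by the runtime.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=float)
def partial_sum(block):
    # Runs as a task; its result comes back as a future object.
    return float(sum(block))

@task(returns=float)
def combine(a, b):
    # Consumes two task results; the runtime infers the data dependency.
    return a + b

if __name__ == "__main__":
    data = list(range(1_000_000))
    blocks = [data[i:i + 250_000] for i in range(0, len(data), 250_000)]
    partials = [partial_sum(b) for b in blocks]  # futures, not floats yet
    total = partials[0]
    for p in partials[1:]:
        total = combine(total, p)                # chained dependencies
    print(compss_wait_on(total))                 # block until the result is ready
```

Launched with the runcompss command, the runtime builds the task dependency graph from the script and spreads the tasks over the available resources, while the script keeps its sequential semantics.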


2020 ◽  
Vol 23 (4) ◽  
pp. 866-886
Author(s):  
Vladimir Aleksandrovich Bakhtin ◽  
Dmitry Aleksandrovich Zakharov ◽  
Aleksandr Aleksandrovich Ermichev ◽  
Victor Alekseevich Krukov

DVM-system is designed for the development of parallel programs for scientific and technical computations in the C-DVMH and Fortran-DVMH languages. Both languages use the same DVMH parallel programming model and extend standard C and Fortran with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes may use accelerators (graphics processors or Intel Xeon Phi coprocessors) as computing devices alongside general-purpose multi-core processors. The article describes the method of debugging parallel programs in DVM-system, as well as new features of the DVM debugger.


Author(s):  
Jorge Ejarque ◽  
Marc Domínguez ◽  
Rosa M Badia

Distributed computing platforms are evolving into heterogeneous ecosystems, with Clusters, Grids and Clouds introducing into their computing nodes processors with different core architectures, accelerators (e.g. GPUs, FPGAs), and different memory and storage devices, in order to achieve better performance with lower energy consumption. As a consequence of this heterogeneity, programming applications for these distributed heterogeneous platforms becomes a complex task. In addition to the complexity of developing an application for distributed platforms, developers must now also deal with the complexity of the different computing devices inside each node. In this article, we present a programming model that aims to facilitate the development and execution of applications on current and future distributed heterogeneous parallel architectures. The programming model is based on the hierarchical composition of the COMP Superscalar and Omp Superscalar programming models, which allows developers to implement infrastructure-agnostic applications. The underlying runtime enables applications to adapt to the infrastructure without the need to maintain different versions of the code. Our programming model proposal has been evaluated on real platforms in terms of heterogeneous resource usage, performance and adaptation.
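The paper composes COMP Superscalar (distributed level) with Omp Superscalar (node level) in C/C++; purely to illustrate the same hierarchical idea, and keeping to Python for consistency with the PyCOMPSs sketch above, the hypothetical task below uses the PyCOMPSs @constraint decorator to reserve several cores of a node for its own internal shared-memory parallelism.

```python
# Illustrative sketch only: a coarse-grained distributed task that reserves
# node cores for a nested, node-level computation. The decorator names follow
# the public PyCOMPSs API; the kernel and workload are hypothetical.
from multiprocessing import Pool

from pycompss.api.constraint import constraint
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

def square(x):
    return x * x

@constraint(computing_units="4")  # ask the runtime for a node slot with 4 cores
@task(returns=list)
def node_level_kernel(chunk):
    # Inside the task, the reserved cores can be exploited with any
    # shared-memory model (OmpSs in the paper; here, a process pool).
    with Pool(processes=4) as pool:
        return pool.map(square, chunk)

if __name__ == "__main__":
    chunks = [list(range(i, i + 8)) for i in range(0, 32, 8)]
    results = [node_level_kernel(c) for c in chunks]  # distributed level
    print(compss_wait_on(results))
```

The runtime can then place each coarse task on a node that satisfies its constraint, which is one way to read the abstract's claim that applications adapt to the infrastructure without maintaining several code versions.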

