A programming model and runtime system for approximation-aware heterogeneous computing

Author(s):  
Ioannis Parnassos ◽  
Nikolaos Bellas ◽  
Nikolaos Katsaros ◽  
Nikolaos Patsiatzis ◽  
Athanasios Gkaras ◽  
...

2021 ◽
Vol 251 ◽  
pp. 03032
Author(s):  
Haiwang Yu ◽  
Zhihua Dong ◽  
Kyle Knoepfel ◽  
Meifeng Lin ◽  
Brett Viren ◽  
...  

The Liquid Argon Time Projection Chamber (LArTPC) technology plays an essential role in many current and future neutrino experiments. Accurate and fast simulation is critical to developing efficient analysis algorithms and precise physics model projections. The speed of simulation becomes even more important as deep learning algorithms are more widely used in LArTPC analysis and their training requires large simulated datasets. Heterogeneous computing is an efficient way to delegate computationally intensive tasks to specialized hardware. However, as the landscape of compute accelerators evolves quickly, it becomes increasingly difficult to manually adapt the code to the latest hardware or software environments. A solution that is portable to multiple hardware architectures without substantially compromising performance would thus be very beneficial, especially for long-term projects such as the LArTPC simulations. In search of a portable, scalable and maintainable software solution for LArTPC simulations, we have started to explore high-level portable programming frameworks that support several hardware backends. In this paper, we present our experience porting the LArTPC simulation code in the Wire-Cell Toolkit to NVIDIA GPUs, first with the CUDA programming model and then with a portable library called Kokkos. Preliminary performance results on NVIDIA V100 GPUs and multi-core CPUs are presented, followed by a discussion of the factors affecting the performance and plans for future improvements.


2021 ◽  
Vol 24 (1) ◽  
pp. 157-183
Author(s):  
Nikita Andreevich Kataev

Automation of parallel programming is important at every stage of parallel program development. These stages include profiling of the original program; program transformation, which allows us to achieve higher performance after parallelization; and, finally, construction and optimization of the parallel program. It is also important to choose a parallel programming model suitable for expressing the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools and allow the user to explore, in a semi-automatic way, the parallel programs those tools generate. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to the automation of parallel programming and allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming that SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem used to guide SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with that of the programs SAPFOR builds.


2011 ◽  
Vol 19 (1) ◽  
pp. 47-62 ◽  
Author(s):  
David M. Kunzman ◽  
Laxmikant V. Kalé

Heterogeneous clusters that include accelerators have become more common in the realm of high performance computing because of the high GFlop/s rates such clusters are capable of achieving. However, heterogeneous clusters are typically considered hard to program, as they usually require programmers to interleave architecture-specific code within application code. We have extended the Charm++ programming model and runtime system to support heterogeneous clusters (with host cores that differ in their architecture) that include accelerators. We are currently focusing on clusters that include commodity processors, Cell processors, and Larrabee devices. When our extensions are used to develop code, the resulting code is portable between various homogeneous and heterogeneous clusters that may or may not include accelerators. Using a simple example molecular dynamics (MD) code, we demonstrate our programming model extensions and runtime system modifications on a heterogeneous cluster composed of Xeon and Cell processors. Even though there is no architecture-specific code in the example MD program, it successfully makes use of three core types, each with a different ISA (Xeon, PPE, SPE), three SIMD instruction extensions (SSE, AltiVec/VMX and the SPE's SIMD instructions), and two memory models (cache hierarchies and scratchpad memories) in a single execution. Our programming model extensions abstract away hardware complexities, while our runtime system modifications automatically adjust application data to account for architectural differences between the various cores.


Author(s):  
Enric Tejedor ◽  
Yolanda Becerra ◽  
Guillem Alomar ◽  
Anna Queralt ◽  
Rosa M Badia ◽  
...  

The use of the Python programming language for scientific computing has been gaining momentum in recent years. Its compact, readable syntax and its comprehensive set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of message-passing APIs or are restricted to embarrassingly parallel computations. In that sense, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user writes her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and scheduling them on the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
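As a rough illustration of the decorator-based model this abstract describes (a sketch of ours, not code from the paper), the script below marks ordinary functions as tasks; the imports and the compss_wait_on call follow the public PyCOMPSs API, while the task functions and the blocking scheme are hypothetical.

```python
# Minimal PyCOMPSs-style sketch: a sequential-looking script whose decorated
# functions are executed as asynchronous parallel tasks by the runtime.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=float)
def partial_sum(block):
    # Runs as a task; its result comes back as a future object.
    return float(sum(block))

@task(returns=float)
def combine(a, b):
    # Consumes two task results; the runtime infers the data dependency.
    return a + b

if __name__ == "__main__":
    data = list(range(1_000_000))
    blocks = [data[i:i + 250_000] for i in range(0, len(data), 250_000)]
    partials = [partial_sum(b) for b in blocks]  # futures, not floats yet
    total = partials[0]
    for p in partials[1:]:
        total = combine(total, p)                # chained dependencies
    print(compss_wait_on(total))                 # block until the result is ready
```

Launched with the runcompss command, the runtime builds the task dependency graph from the script and spreads the tasks over the available resources, while the script keeps its sequential semantics.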


2020 ◽  
Vol 23 (4) ◽  
pp. 866-886
Author(s):  
Vladimir Aleksandrovich Bakhtin ◽  
Dmitry Aleksandrovich Zakharov ◽  
Aleksandr Aleksandrovich Ermichev ◽  
Victor Alekseevich Krukov

DVM-system is designed for the development of parallel programs for scientific and technical computations in the C-DVMH and Fortran-DVMH languages. Both languages use the same DVMH parallel programming model and extend standard C and Fortran with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes may use accelerators (graphics processors or Intel Xeon Phi coprocessors) as computing devices alongside general-purpose multi-core processors. The article describes the method of debugging parallel programs in DVM-system, as well as new features of the DVM debugger.


Author(s):  
Jorge Ejarque ◽  
Marc Domínguez ◽  
Rosa M Badia

Distributed computing platforms are evolving into heterogeneous ecosystems, with Clusters, Grids and Clouds introducing into their computing nodes processors with different core architectures, accelerators (e.g. GPUs, FPGAs), and different memory and storage devices, in order to achieve better performance with lower energy consumption. As a consequence of this heterogeneity, programming applications for these distributed heterogeneous platforms becomes a complex task. In addition to the complexity of developing an application for distributed platforms, developers must now also deal with the complexity of the different computing devices inside each node. In this article, we present a programming model that aims to facilitate the development and execution of applications on current and future distributed heterogeneous parallel architectures. The programming model is based on the hierarchical composition of the COMP Superscalar and Omp Superscalar programming models, which allows developers to implement infrastructure-agnostic applications. The underlying runtime enables applications to adapt to the infrastructure without the need to maintain different versions of the code. Our programming model proposal has been evaluated on real platforms in terms of heterogeneous resource usage, performance and adaptation.
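The paper composes COMP Superscalar (distributed level) with Omp Superscalar (node level) in C/C++; purely to illustrate the same hierarchical idea, and keeping to Python for consistency with the PyCOMPSs sketch above, the hypothetical task below uses the PyCOMPSs @constraint decorator to reserve several cores of a node for its own internal shared-memory parallelism.

```python
# Illustrative sketch only: a coarse-grained distributed task that reserves
# node cores for a nested, node-level computation. The decorator names follow
# the public PyCOMPSs API; the kernel and workload are hypothetical.
from multiprocessing import Pool

from pycompss.api.constraint import constraint
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

def square(x):
    return x * x

@constraint(computing_units="4")  # ask the runtime for a node slot with 4 cores
@task(returns=list)
def node_level_kernel(chunk):
    # Inside the task, the reserved cores can be exploited with any
    # shared-memory model (OmpSs in the paper; here, a process pool).
    with Pool(processes=4) as pool:
        return pool.map(square, chunk)

if __name__ == "__main__":
    chunks = [list(range(i, i + 8)) for i in range(0, 32, 8)]
    results = [node_level_kernel(c) for c in chunks]  # distributed level
    print(compss_wait_on(results))
```

The runtime can then place each coarse task on a node that satisfies its constraint, which is one way to read the abstract's claim that applications adapt to the infrastructure without maintaining several code versions.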

