GPUs: High-performance Accelerators for Parallel Applications

Mark Silberstein

doi:10.1145/2618401

Matlab and Parallel Computing

Image Processing & Communications ◽

10.2478/v10248-012-0048-5 ◽

2012 ◽

Vol 17 (4) ◽

pp. 207-216 ◽

Cited By ~ 5

Author(s):

Magdalena Szymczyk ◽

Piotr Szymczyk

Keyword(s):

Image Processing ◽

Signal Processing ◽

Parallel Computing ◽

Distributed Computing ◽

Control Systems ◽

High Performance ◽

Parallel Applications ◽

Process Simulations ◽

Key Features ◽

Financial Process

MATLAB is a technical computing language and easy-to-use environment used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulations. MATLAB offers "toolboxes", which are specialized libraries for a variety of scientific domains, and a simplified interface to high-performance libraries (LAPACK, BLAS, and FFTW). MATLAB now also supports parallel computing through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of MATLAB parallel applications, focusing on the use of GPUs for image processing.

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.

Multi-Softcore Architecture on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2014/979327 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Mouna Baklouti ◽

Mohamed Abid

Keyword(s):

High Performance ◽

Design Methodology ◽

Matrix Multiplication ◽

Rapid Prototype ◽

General Purpose ◽

Parallel Applications ◽

Multicore Systems ◽

Processor Core ◽

Nios II ◽

Wide Range

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) blocks while maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for implementing a rapid prototype of parametric multicore systems. A study of the viability of building the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).

MUSCLE-HPC: A new high performance API to couple multiscale parallel applications

Future Generation Computer Systems ◽

10.1016/j.future.2016.08.009 ◽

2017 ◽

Vol 67 ◽

pp. 72-82 ◽

Cited By ~ 10

Author(s):

Mohamed Ben Belgacem ◽

Bastien Chopard

Keyword(s):

High Performance ◽

Parallel Applications

The Sicilian Grid Infrastructure for High Performance Computing

International Journal of Distributed Systems and Technologies ◽

10.4018/jdst.2010090803 ◽

2010 ◽

Vol 1 (1) ◽

pp. 40-54 ◽

Cited By ~ 1

Author(s):

Carmelo Marcello Iacono-Manno ◽

Marco Fargetta ◽

Roberto Barbera ◽

Alberto Falzone ◽

Giuseppe Andronico ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Grid Infrastructure ◽

Scheduling Policy ◽

Computing Paradigm ◽

Regional Area ◽

Computer Fluid Dynamics ◽

Grid Infrastructures ◽

Performance Computing

Combining High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities from both industry and academia need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfying such requests is one of the keys to the penetration of this computing paradigm into the industry world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy over a regional area. This article describes the features added in order to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency network connection, the gLite middleware extended to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases belonging to Computer Fluid-Dynamics (Fluent, OpenFOAM), Chemistry (GAMESS), Astro-Physics (Flash) and Bio-Informatics (ClustalW).

Service for parallel applications based on JINR cloud and HybriLIT resources

EPJ Web of Conferences ◽

10.1051/epjconf/201921407012 ◽

2019 ◽

Vol 214 ◽

pp. 07012 ◽

Cited By ~ 1

Author(s):

Nikita Balashov ◽

Maxim Bashashin ◽

Pavel Goncharov ◽

Ruslan Kuchumov ◽

Nikolay Kutovskiy ◽

...

Keyword(s):

High Performance ◽

Cloud Service ◽

Parallel Applications ◽

Cloud Infrastructure ◽

Modular Architecture ◽

Practical Applications ◽

Speed Up ◽

Scientific Results ◽

Computational Resources ◽

Performance Computing

Cloud computing has become a routine tool for scientists in many fields. The JINR cloud infrastructure provides JINR users with computational resources to perform various scientific calculations. To speed up the achievement of scientific results, the JINR cloud service for parallel applications has been developed. It consists of several components and implements a flexible and modular architecture, which allows both additional applications and various types of resources to be used as computational backends. An example of using the Cloud&HybriLIT resources in scientific computing is the study of superconducting processes in stacked long Josephson junctions (LJJ). LJJ systems have undergone intensive research because of their prospective practical applications in nano-electronics and quantum computing. In this contribution we generalize our experience in applying the Cloud&HybriLIT resources to high performance computing of physical characteristics in the LJJ system.

Delivering high performance to parallel applications using advanced scheduling

Advances in Parallel Computing - Parallel Computing - Software Technology, Algorithms, Architectures and Applications ◽

10.1016/s0927-5452(04)80032-7 ◽

2004 ◽

pp. 233-240

Author(s):

N. Drosinos ◽

G. Goumas ◽

M. Athanasaki ◽

N. Koziris

Keyword(s):

High Performance ◽

Parallel Applications

Pruners: Providing reproducibility for uncovering non-deterministic errors in runs on supercomputers

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019834621 ◽

2019 ◽

Vol 33 (5) ◽

pp. 777-783 ◽

Cited By ~ 1

Author(s):

Kento Sato ◽

Ignacio Laguna ◽

Gregory L Lee ◽

Martin Schulz ◽

Christopher M Chambreau ◽

...

Keyword(s):

Real World ◽

High Performance ◽

Parallel Applications ◽

Program Execution ◽

World Production ◽

Application Development ◽

Full System ◽

Scientific Simulations ◽

Large Application

Large scientific simulations must be able to achieve the full-system potential of supercomputers. When they tap into high-performance features, however, a phenomenon known as non-determinism may be introduced in their program execution, which significantly hampers application development. Pruners is a new toolset to detect and remedy non-deterministic bugs and errors in large parallel applications. To show the capabilities of Pruners for large application development, we also demonstrate their early usage on real-world production applications.

A Non-intrusive Methodology to Improve the Performance of Parallel Applications in High Performance Computing

2012 41st International Conference on Parallel Processing Workshops ◽

10.1109/icppw.2012.56 ◽

2012 ◽

Author(s):

Fernando H.P. Luz ◽

Denis Taniguchi ◽

Liria M. Sato

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Performance Computing

Study of parallel programming models on computer clusters with Intel MIC coprocessors

The International Journal of High Performance Computing Applications ◽

10.1177/1094342015580864 ◽

2015 ◽

Vol 31 (4) ◽

pp. 303-315 ◽

Cited By ~ 3

Author(s):

Miaoqing Huang ◽

Chenggang Lai ◽

Xuan Shi ◽

Zhijun Hao ◽

Haihang You

Keyword(s):

Parallel Programming ◽

High Performance ◽

Programming Model ◽

Fixed Number ◽

Parallel Applications ◽

Programming Models ◽

Communication Overhead ◽

Computer Clusters ◽

Parallel Programming Models ◽

Intel MIC

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.
