Method of parallelization of loops for grid calculation problems on GPU accelerators

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2017.01.059 ◽

2017 ◽

pp. 059-066

Author(s):

A.Yu. Doroshenko ◽

O.G. Beketov ◽

Keyword(s):

Parallel Algorithm ◽

Graphics Processing Units ◽

Suggested Method ◽

Automated System ◽

Practical Implementation ◽

Heterogeneous Clusters ◽

Cuda Technology ◽

Nested Loops ◽

Simd Architecture ◽

Graphics Processing

A formal parallelizing transformation of a nest of computational loops is developed for SIMD-architecture devices, in particular graphics processing units using CUDA technology and heterogeneous clusters. The procedure for the transition from a sequential to a parallel algorithm is described and illustrated. Data serialization is applied to optimize the processing of large data volumes. The advantage of the suggested method is its applicability to data whose volume exceeds the memory of the operating device. An experiment is conducted to demonstrate the feasibility of the proposed approach. The technique presented provides the basis for further practical implementation of an automated system for parallelizing nested loops.
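
The abstract gives no code, so the following is only a minimal Python sketch of the general idea, not the authors' transformation. It assumes a hypothetical four-point averaging update on a 2D grid, flattens the two nested loops into one independent index space (each index standing in for one GPU thread), and walks that space in fixed-size chunks as a stand-in for data that does not fit in device memory at once. The names `point_update`, `sequential_sweep`, `chunked_flat_sweep` and `chunk_size` are illustrative assumptions.

```python
def point_update(grid, i, j):
    """Hypothetical grid kernel: average of the four neighbours of (i, j)."""
    return 0.25 * (grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1])


def sequential_sweep(grid):
    """Reference version: an ordinary nest of two loops over interior points."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = point_update(grid, i, j)
    return out


def chunked_flat_sweep(grid, chunk_size):
    """Same sweep over a flattened index space, processed chunk by chunk.

    Every flat index `tid` is independent of the others, so on a GPU each
    one could be handled by its own thread; here they simply run in a loop.
    """
    rows, cols = len(grid), len(grid[0])
    inner_rows, inner_cols = max(rows - 2, 0), max(cols - 2, 0)
    total = inner_rows * inner_cols
    out = [row[:] for row in grid]
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        for tid in range(start, stop):
            # Recover the original loop indices from the flat index.
            i = 1 + tid // inner_cols
            j = 1 + tid % inner_cols
            out[i][j] = point_update(grid, i, j)
    return out
```

Because every update reads only the input grid and writes a distinct output cell, the chunk size affects scheduling but not the result.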

TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

Journal of Applied Mathematics ◽

10.1155/2013/437357 ◽

2013 ◽

Vol 2013 ◽

pp. 1-15 ◽

Cited By ~ 4

Author(s):

Carlos Couder-Castañeda ◽

Carlos Ortiz-Alemán ◽

Mauricio Gabriel Orozco-del-Castillo ◽

Mauricio Nava-Flores

Keyword(s):

Parallel Computing ◽

Graphics Processing Units ◽

Forward Modeling ◽

Gravity Gradient ◽

Constant Density ◽

Gravitational Fields ◽

Design And Implementation ◽

Cuda Technology ◽

Performance Results ◽

Graphics Processing

An implementation using CUDA technology on a single and on several graphics processing units (GPUs) is presented for the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of applications in various fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Instead of a trivial decomposition of the domain, we propose to decompose the problem by sets of prisms and to use a different memory space for each CUDA processing core, avoiding the performance decay caused by the constant calls to kernel functions that a parallelization by observation points would require. The design and implementation are the main contributions of this work, because the parallelization scheme implemented is not trivial. The performance results obtained are comparable to those of a small processing cluster.
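
The decomposition contrasted in this abstract can be sketched roughly as follows. This is a plain Python illustration under loudly stated assumptions, not the authors' CUDA code: each prism is replaced by a point mass (the real prism formula is not given in the abstract), `z` is assumed to point downward, and the "workers" run one after another rather than in parallel. All function and parameter names are hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def point_mass_gz(obs, src, mass):
    """Vertical attraction at `obs` from a point mass at `src`.

    Assumption: a point mass stands in for a prism of constant density.
    """
    dx, dy, dz = src[0] - obs[0], src[1] - obs[1], src[2] - obs[2]
    r2 = dx * dx + dy * dy + dz * dz
    return G * mass * dz / (r2 * math.sqrt(r2))


def field_by_observation(observations, sources, mass):
    """Decomposition by observation points: one full source sum per point."""
    return [sum(point_mass_gz(o, s, mass) for s in sources) for o in observations]


def field_by_source_sets(observations, sources, mass, n_workers):
    """Decomposition by sets of sources with a private buffer per worker."""
    partials = []
    for w in range(n_workers):
        private = [0.0] * len(observations)  # this worker's own memory space
        for s in sources[w::n_workers]:      # this worker's set of sources
            for k, o in enumerate(observations):
                private[k] += point_mass_gz(o, s, mass)
        partials.append(private)
    # Final reduction: add the private buffers together.
    return [sum(p[k] for p in partials) for k in range(len(observations))]
```

Both functions compute the same field up to floating-point rounding; the second only changes which loop is split across workers and where the partial sums are kept.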

Passive Radar Parallel Processing Using General-Purpose Computing on Graphics Processing Units

International Journal of Electronics and Telecommunications ◽

10.1515/eletel-2015-0047 ◽

2015 ◽

Vol 61 (4) ◽

pp. 357-363 ◽

Cited By ~ 3

Author(s):

Karolina Szczepankiewicz ◽

Mateusz Malanowski ◽

Michał Szczepankiewicz

Keyword(s):

Parallel Processing ◽

Graphics Processing Units ◽

General Purpose ◽

Passive Radar ◽

Effective Implementation ◽

University Of Technology ◽

Fm Radio ◽

Cuda Technology ◽

Performance Results ◽

Graphics Processing

This paper presents an implementation of the signal processing chain for a passive radar. The passive radar, which was developed at the Warsaw University of Technology, uses FM radio and DVB-T television transmitters as "illuminators of opportunity". Because the computational load associated with passive radar processing is very high, NVIDIA CUDA technology has been employed for an effective implementation using parallel processing. The paper describes the implementation of the algorithms and analyzes the performance results.

Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2014.058687 ◽

2014 ◽

Vol 9 (1/2) ◽

pp. 11 ◽

Cited By ~ 7

Author(s):

Chun Yuan Lin ◽

Yu Shiang Lin

Keyword(s):

Parallel Algorithm ◽

Graphics Processing Units ◽

Regular Expression ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Graphics Processing

Parallel algorithm for solving Kepler’s equation on Graphics Processing Units: Application to analysis of Doppler exoplanet searches

New Astronomy ◽

10.1016/j.newast.2008.12.001 ◽

2009 ◽

Vol 14 (4) ◽

pp. 406-412 ◽

Cited By ~ 31

Author(s):

Eric B. Ford

Keyword(s):

Parallel Algorithm ◽

Graphics Processing Units ◽

Kepler's Equation ◽

Graphics Processing

Practical Examples of Automated Development of Efficient Parallel Programs

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Formal and Adaptive Methods for Automation of Parallel Programs Construction ◽

10.4018/978-1-5225-9384-3.ch006 ◽

2021 ◽

pp. 180-216

Keyword(s):

Graphics Processing Units ◽

Dynamic Models ◽

Multicore Processors ◽

Central Processing ◽

Design And Synthesis ◽

Cuda Technology ◽

Design Generation ◽

Graphics Processing ◽

Rewriting Rules ◽

Target Platform

In this chapter, some examples of application of the developed software tools for design, generation, transformation, and optimization of programs for multicore processors and graphics processing units are considered. In particular, the algebra-algorithmic-integrated toolkit for design and synthesis of programs (IDS) and the rewriting rules system TermWare.NET are applied for design and parallelization of programs for multicore central processing units. The developed algebra-dynamic models and the rewriting rules toolkit are used for parallelization and optimization of programs for NVIDIA GPUs supporting the CUDA technology. The TuningGenie framework is applied for parallel program auto-tuning: optimization of sorting, Brownian motion simulation, and meteorological forecasting programs to a target platform. The parallelization of Fortran programs using the rewriting rules technique on sample problems in the field of quantum chemistry is examined.

Parallel forming of preconditioners based on the approximation of the Sherman-Morrison inversion formula

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v16r109 ◽

2015 ◽

pp. 86-93

Author(s):

А.К. Новиков ◽

C.П. Копысов ◽

Н.С. Недожогин

Keyword(s):

Parallel Algorithm ◽

Conjugate Gradient ◽

Graphics Processing Units ◽

Inversion Formula ◽

Matrix Approximation ◽

The Matrix ◽

New Form ◽

Graphics Processing ◽

Matrix Vector ◽

Parallelization Efficiency

The acceleration of preconditioned bi-conjugate gradient stabilized (BiCGStab) methods with preconditioners based on approximating the matrix inverse by the Sherman-Morrison formula is studied. A new form of the parallel algorithm, which uses matrix-vector products to form the preconditioner matrices, is proposed. The parallelization efficiency of the most resource-intensive operations of such preconditioners on multi-core central and graphics processing units (CPUs and GPUs) is shown.

Programming for GPUs

Data Parallel C++ ◽

10.1007/978-1-4842-5574-2_15 ◽

2020 ◽

pp. 353-385

Author(s):

James Reinders ◽

Ben Ashbaugh ◽

James Brodman ◽

Michael Kinsner ◽

John Pennycook ◽

...

Keyword(s):

Parallel Algorithm ◽

Graphics Processing Units ◽

General Purpose ◽

Specialized Hardware ◽

Graphics Processing

Over the last few decades, Graphics Processing Units (GPUs) have evolved from specialized hardware devices capable of drawing images on a screen to general-purpose devices capable of executing complex parallel kernels. Nowadays, nearly every computer includes a GPU alongside a traditional CPU, and many programs may be accelerated by offloading part of a parallel algorithm from the CPU to the GPU.

High performance computing on graphics processing units

Pollack Periodica ◽

10.1556/pollack.3.2008.2.3 ◽

2008 ◽

Vol 3 (2) ◽

pp. 27-34 ◽

Cited By ~ 2

Author(s):

Balázs Tukora ◽

Tibor Szalay

Keyword(s):

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Graphics Processing ◽

Performance Computing

Parallel Option Pricing with Fourier Space Time-Stepping Method on Graphics Processing Units

SSRN Electronic Journal ◽

10.2139/ssrn.1020207 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vladimir Surkov

Keyword(s):

Option Pricing ◽

Graphics Processing Units ◽

Space Time ◽

Fourier Space ◽

Time Stepping ◽

Graphics Processing

Improving the Efficiency and the Accuracy of 2D Gel Electrophoresis Spot Detection Using Graphics Processing Units

Current Bioinformatics ◽

10.2174/1574893612666170725141905 ◽

2018 ◽

Vol 13 (2) ◽

pp. 193-206 ◽

Cited By ~ 1

Author(s):

Marwa K. Elteir ◽

Shaheera A. Rashwan ◽

Ashraf A. Khalil

Keyword(s):

Gel Electrophoresis ◽

Graphics Processing Units ◽

2D Gel Electrophoresis ◽

Spot Detection ◽

2D Gel ◽

Graphics Processing
