On the Efficiency of OpenACC-aided GPU-Based FDTD Approach: Application to Lightning Electromagnetic Fields

Sajad Mohammadi; Hamidreza Karami; Mohammad Azadifar; Farhad Rachidi

doi:10.3390/app10072359

Applied Sciences ◽

2020 ◽

Vol 10 (7) ◽

pp. 2359

Keyword(s):

Electromagnetic Fields ◽

Large Scale ◽

Electromagnetic Compatibility ◽

Programming Model ◽

Graphics Processing Unit ◽

FDTD Method ◽

Computation Time ◽

Critical Factor ◽

Processing Unit ◽

Computational Performance

An open accelerator (OpenACC)-aided graphics processing unit (GPU)-based finite difference time domain (FDTD) method is presented for the first time for the 3D evaluation of lightning radiated electromagnetic fields along a complex terrain with arbitrary topography. The OpenACC directive-based programming model is used to enhance the computational performance, and the results are compared with those obtained by using a CPU-based model. It is shown that the OpenACC GPU implementation provides very accurate results and runs more than 20 times faster than the CPU-based model. The presented results support the use of OpenACC not only for lightning electromagnetics problems, but also for large-scale realistic electromagnetic compatibility (EMC) applications in which computation time efficiency is a critical factor.
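
In a directive-based GPU port like the one described above, the work that is typically offloaded is the FDTD field-update loops. As a rough illustration only, the sketch below shows the Yee leapfrog update in one dimension. Everything specific to it is an assumption of this sketch, not a detail of the authors' 3D lightning model: normalized units, the Courant number `courant` (S = cΔt/Δx), perfect-electric-conductor end walls, and a soft Gaussian source. The two inner loops are independent per cell, which is the property that directive-based offloading (for example an OpenACC parallel loop directive in C or Fortran) exploits. Pure Python is used here only to keep the example self-contained.

```python
import math


def fdtd_1d(n_cells=200, n_steps=300, courant=0.5, src_index=100):
    """Minimal 1D Yee-scheme FDTD (Ez, Hy) in normalized units.

    Illustrative assumptions: free space, Courant number
    S = c*dt/dx = courant (stable for S <= 1), PEC walls at both
    ends (Ez held at 0), and a soft Gaussian source at src_index.
    """
    ez = [0.0] * n_cells        # E at integer grid nodes
    hy = [0.0] * (n_cells - 1)  # H at half-integer nodes (staggered grid)
    for t in range(n_steps):
        # H update: every cell is independent, so this loop is a natural
        # target for a directive such as an OpenACC parallel loop.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update, half a step later; ez[0] and ez[-1] are never touched (PEC).
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse peaking at t = 30 to the E field.
        ez[src_index] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez, hy
```

With `courant` kept at or below 1 the 1D scheme is stable. The end nodes of `ez` are never updated, which is how the PEC walls are imposed in this sketch.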

Reduction of computation time using Graphics Processing Unit for the detection of a crack in a large scale concrete structure

The Journal of the Acoustical Society of America ◽

10.1121/1.4708928 ◽

2012 ◽

Vol 131 (4) ◽

pp. 3443-3443 ◽

Cited By ~ 1

Author(s):

Yuhei Katsurakawa ◽

Toyota Fujioka ◽

Yoshifumi Nagata ◽

Masato Abe

Keyword(s):

Concrete Structure ◽

Large Scale ◽

Graphics Processing Unit ◽

Computation Time ◽

Processing Unit ◽

Graphics Processing

A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120472 ◽

2018 ◽

Vol 7 (12) ◽

pp. 472 ◽

Cited By ~ 1

Author(s):

Bo Wan ◽

Lin Yang ◽

Shunping Zhou ◽

Run Wang ◽

Dezhi Wang ◽

...

Keyword(s):

Road Network ◽

Large Scale ◽

Graphics Processing Unit ◽

Road Networks ◽

Processing Unit ◽

Data Partition ◽

Matching Method ◽

The Road ◽

Central Processing ◽

Relaxation Matching

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often involve a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may suffer from performance bottlenecks when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on a graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and update of large-scale road networks.
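
PSOM builds on particle-swarm optimization. The abstract does not give the encoding of matching relationships or the PSO parameters, so the sketch below is only a generic global-best PSO minimizing an arbitrary function `f`. The inertia weight `w`, the coefficients `c1` and `c2`, and the box `bounds` are common textbook choices assumed for illustration. Particles are moved synchronously and the swarm best is refreshed once per iteration, so within an iteration each particle's update is independent, which is the kind of work that maps naturally onto GPU threads.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best particle swarm optimization (minimization).

    f maps a list of `dim` floats to a float; bounds = (lo, hi) is a box
    applied to every coordinate. w, c1, c2 are common textbook defaults
    assumed here, not parameters reported for PSOM.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]           # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        # Synchronous update: within one iteration every particle reads the
        # same gbest, so particles could be processed by independent threads.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Refresh the swarm-wide best once all particles have moved.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val
```

For example, `pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of the sphere function, whose optimum is the origin.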

Realtime cerebellum: A large-scale spiking network model of the cerebellum that runs in realtime using a graphics processing unit

Neural Networks ◽

10.1016/j.neunet.2013.01.019 ◽

2013 ◽

Vol 47 ◽

pp. 103-111 ◽

Cited By ~ 47

Author(s):

Tadashi Yamazaki ◽

Jun Igarashi

Keyword(s):

Network Model ◽

Large Scale ◽

Graphics Processing Unit ◽

Processing Unit ◽

Spiking Network ◽

Graphics Processing

A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016682071 ◽

2016 ◽

Vol 32 (2) ◽

pp. 288-301

Author(s):

Alan Gray ◽

Kevin Stratford

Keyword(s):

Particle Physics ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Performance Portability ◽

Graphics Processing

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.
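
The Roofline model mentioned above bounds a kernel's attainable floating-point rate by the smaller of two quantities: the device's peak compute rate, and the product of the kernel's arithmetic intensity (FLOPs per byte of memory traffic) and the memory bandwidth. A minimal sketch of that bound follows. The hardware figures in the usage lines are hypothetical and are not taken from the targetDP paper.

```python
def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under the Roofline model.

    intensity: arithmetic intensity in FLOP per byte of main-memory traffic.
    peak_gflops: peak floating-point rate of the device (GFLOP/s).
    bandwidth_gbs: sustained memory bandwidth of the device (GB/s).
    """
    if intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device: 7000 GFLOP/s peak, 900 GB/s bandwidth.
print(roofline_gflops(0.5, 7000.0, 900.0))   # 450.0 (memory-bound)
print(roofline_gflops(20.0, 7000.0, 900.0))  # 7000.0 (compute-bound)
```

Kernels whose intensity lies below `ridge_point(...)` are limited by memory bandwidth. For them, the relevant yardstick on each platform is the bandwidth roof rather than peak FLOP/s.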

Prediction of Residual Stresses in a Multipass Pipe Weld by a Novel 3D Finite Element Approach

Volume 6B: Materials and Fabrication ◽

10.1115/pvp2018-85044 ◽

2018 ◽

Cited By ~ 1

Author(s):

Hui Huang ◽

Jian Chen ◽

Blair Carlson ◽

Hui-Ping Wang ◽

Paul Crooker ◽

...

Keyword(s):

Finite Element ◽

Residual Stresses ◽

High Performance ◽

Large Scale ◽

Graphics Processing Unit ◽

Computational Cost ◽

Three Dimensional ◽

Processing Unit ◽

Girth Welds ◽

Welding Processes

Due to their enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by considering the unique physics associated with welding processes, which are characterized by steep temperature gradients and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with commercial software by simulating a 3D multipass girth weld model with over 1 million elements. Our code achieved solution accuracy comparable to that of the commercial software with a more than 100-fold saving in computational cost. Moreover, the three-dimensional analysis showed a more realistic stress distribution that is not axisymmetric in the hoop direction.

Comparative study of the implementation of the Lagrange interpolation algorithm on GPU and CPU using CUDA to compute the density of a material at different temperatures

SHS Web of Conferences ◽

10.1051/shsconf/202111907002 ◽

2021 ◽

Vol 119 ◽

pp. 07002

Author(s):

Youness Rtal ◽

Abdelkader Hadjoudja

Keyword(s):

Parallel Computing ◽

Graphics Processing Units ◽

Lagrange Interpolation ◽

Polynomial Interpolation ◽

Programming Model ◽

Interpolation Method ◽

Processing Unit ◽

Central Processing ◽

Computational Performance ◽

Different Temperatures

Graphics Processing Units (GPUs) are microprocessors mounted on graphics cards and dedicated to displaying and manipulating graphics data. Today, all modern graphics cards contain such GPUs. Within a few years, these microprocessors have become potent tools for massively parallel computing. They are practical instruments in several fields, such as image processing, video and audio encoding and decoding, and solving physical systems with one or more unknowns. Their advantages over the central processing unit (CPU) are faster processing and lower energy consumption. In this paper, we define and implement the Lagrange polynomial interpolation method on GPU and CPU to calculate the sodium density at different temperatures Ti using the NVIDIA CUDA C parallel programming model, which can increase computational performance by harnessing the power of the GPU. The objective of this study is to compare the performance of the Lagrange interpolation method implemented on CPU and GPU processors and to deduce the efficiency of using GPUs for parallel computing.
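
The Lagrange form evaluates P(x) = Σ_j y_j ∏_{m≠j} (x − x_m)/(x_j − x_m). The sketch below implements that formula in plain Python. The paper's sodium-density table is not reproduced, so the usage line uses nodes sampled from x² + 1 as a stand-in. Evaluations at different target points Ti do not depend on each other, which is the parallelism a CUDA version can assign to separate threads. `lagrange_eval_many` marks that loop, but it runs serially here.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[j], ys[j]) at x.

    P(x) = sum_j ys[j] * prod_{m != j} (x - xs[m]) / (xs[j] - xs[m])
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(set(xs)) != len(xs):
        raise ValueError("interpolation nodes must be distinct")
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0  # the j-th Lagrange basis polynomial evaluated at x
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def lagrange_eval_many(xs, ys, targets):
    # Each target is evaluated independently of the others; in a CUDA
    # version this outer loop is what would map to one GPU thread per target.
    return [lagrange_eval(xs, ys, t) for t in targets]


# Three nodes of x**2 + 1 reproduce the quadratic exactly.
print(lagrange_eval_many([0.0, 1.0, 2.0], [1.0, 2.0, 5.0], [1.5]))  # [3.25]
```

With n nodes the interpolant is exact for any polynomial of degree below n, which is why the quadratic stand-in is recovered exactly.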

Using Unreal Engine to Visualize a Cosmological Volume

Universe ◽

10.3390/universe6100168 ◽

2020 ◽

Vol 6 (10) ◽

pp. 168

Author(s):

Christopher Marsden ◽

Francesco Shankar

Keyword(s):

Real Time ◽

Large Scale ◽

Graphics Processing Unit ◽

Large Scale Structure ◽

Two Dimensions ◽

Scale Structure ◽

Sloan Digital Sky Survey ◽

Processing Unit ◽

Large Scale Universe ◽

Time Projection

In this work we present “Astera”, a cosmological visualization tool that renders a mock universe in real time using Unreal Engine 4. The large scale structure of the cosmic web is hard to visualize in two dimensions, and a 3D real time projection of this distribution allows for an unprecedented view of the large scale universe, with visually accurate galaxies placed in a dynamic 3D world. The underlying data are based on empirical relations assigned using results from N-Body dark matter simulations, and are matched to galaxies with similar morphologies and sizes, images of which are extracted from the Sloan Digital Sky Survey. Within Unreal Engine 4, galaxy images are transformed into textures and dynamic materials (with appropriate transparency) that are applied to static mesh objects with appropriate sizes and locations. To ensure excellent performance, these static meshes are “instanced” to utilize the full capabilities of a graphics processing unit. Additional components include a dynamic system for representing accelerated-time active galactic nuclei. The end result is a visually realistic large scale universe that can be explored by a user in real time, with accurate large scale structure. Astera is not yet ready for public release, but we are exploring options to make different versions of the code available for both research and outreach applications.

A Multi-GPU Parallel Algorithm in Hypersonic Flow Computations

Mathematical Problems in Engineering ◽

10.1155/2019/2053156 ◽

2019 ◽

Vol 2019 ◽

pp. 1-15 ◽

Cited By ~ 3

Author(s):

Jianqi Lai ◽

Hua Li ◽

Zhengyu Tian ◽

Ye Zhang

Keyword(s):

Parallel Algorithm ◽

Hypersonic Flow ◽

Stokes Equations ◽

Programming Model ◽

Graphics Processing Unit ◽

Three Dimensional ◽

Flow Characteristics ◽

Time Discretization ◽

Processing Unit ◽

Complex Flow

Computational fluid dynamics (CFD) plays an important role in the optimal design of aircraft and the analysis of complex flow mechanisms in the aerospace domain. The graphics processing unit (GPU) has a strong floating-point operation capability and a high memory bandwidth in data parallelism, which brings great opportunities for CFD. A cell-centred finite volume method is applied to solve the three-dimensional compressible Navier–Stokes equations on structured meshes with an upwind AUSM+UP numerical scheme for space discretization, and a four-stage Runge–Kutta method is used for time discretization. Compute unified device architecture (CUDA) is used as a parallel computing platform and programming model for GPUs, which reduces the complexity of programming. The main purpose of this paper is to design an extremely efficient multi-GPU parallel algorithm based on MPI+CUDA to study the hypersonic flow characteristics. Solutions of hypersonic flow over an aerospace plane model are provided at different Mach numbers. The agreement between numerical computations and experimental measurements is favourable. The acceleration performance of the parallel platform is studied with a single GPU, two GPUs, and four GPUs. For the single-GPU implementation, the speedup reaches 63 for the coarser mesh and 78 for the finest mesh. GPUs are better suited for compute-intensive tasks than traditional CPUs. For multi-GPU parallelization, the speedup of four GPUs reaches 77 for the coarser mesh and 147 for the finest mesh; this is far greater than the acceleration achieved with one or two GPUs. The multi-GPU parallel algorithm is therefore promising for hypersonic flow computations.
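
The reported figures can be turned into multi-GPU scaling numbers with a little arithmetic. The sketch assumes that the single-GPU and four-GPU speedups for a given mesh are measured against the same CPU baseline, which the abstract implies but does not state. Under that assumption, their ratio is the gain from going from one GPU to four, and dividing that ratio by four gives the parallel efficiency.

```python
def multi_gpu_scaling(speedup_1gpu, speedup_ngpu, n_gpus):
    """Relative speedup and parallel efficiency of n GPUs versus one GPU.

    Assumes both speedups are measured against the same CPU baseline
    on the same mesh, so their ratio isolates the multi-GPU gain.
    """
    relative = speedup_ngpu / speedup_1gpu
    efficiency = relative / n_gpus
    return relative, efficiency


# Speedups quoted in the abstract (vs. CPU) as (1 GPU, 4 GPUs).
reported = {"coarser mesh": (63, 77), "finest mesh": (78, 147)}
for mesh, (s1, s4) in reported.items():
    rel, eff = multi_gpu_scaling(s1, s4, 4)
    print(f"{mesh}: 4-GPU/1-GPU = {rel:.2f}x, efficiency = {eff:.0%}")
# coarser mesh: 4-GPU/1-GPU = 1.22x, efficiency = 31%
# finest mesh: 4-GPU/1-GPU = 1.88x, efficiency = 47%
```

On these assumptions, four GPUs are about 1.9 times faster than one on the finest mesh but only about 1.2 times faster on the coarser mesh. This fits the usual expectation that a larger per-GPU workload amortizes inter-GPU communication better.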

New BSP/CGM algorithms for spanning trees

The International Journal of High Performance Computing Applications ◽

10.1177/1094342018803672 ◽

2018 ◽

Vol 33 (3) ◽

pp. 444-461

Author(s):

Jucele França de Alencar Vasconcellos ◽

Edson Norberto Cáceres ◽

Henrique Mongelli ◽

Siang Wun Song ◽

Frank Dehne ◽

...

Keyword(s):

Parallel Algorithms ◽

Parallel Machines ◽

Spanning Trees ◽

Graphics Processing Unit ◽

Computation Time ◽

General Purpose ◽

Coarse Grained ◽

Processing Unit ◽

List Ranking ◽

Bulk Synchronous Parallel

Computing a spanning tree (ST) and a minimum ST (MST) of a graph are fundamental problems in graph theory and arise as a subproblem in many applications. In this article, we propose parallel algorithms for these problems. One of the steps of previous parallel MST algorithms relies on the heavy use of parallel list ranking which, though efficient in theory, is very time-consuming in practice. Using a different approach with a graph decomposition, we devised new parallel algorithms that do not make use of the list ranking procedure. We proved that our algorithms are correct, and for a graph [Formula: see text], [Formula: see text], and [Formula: see text], the algorithms can be executed on a Bulk Synchronous Parallel/Coarse Grained Multicomputer (BSP/CGM) model using [Formula: see text] communication rounds with [Formula: see text] computation time for each round. To show that our algorithms have good performance on real parallel machines, we have implemented them on a graphics processing unit. The obtained speedups are competitive and show that the BSP/CGM model is suitable for designing general purpose parallel algorithms.
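
The BSP/CGM algorithms themselves are not reproduced here. As a sequential point of reference for the MST problem they solve in parallel, the sketch below implements the classical Kruskal algorithm with a union-find structure. It is a baseline for checking results, not the decomposition-based method of the paper.

```python
def kruskal_mst(n, edges):
    """Sequential Kruskal MST with union-find (path halving, union by size).

    n: number of vertices labelled 0..n-1; edges: list of (weight, u, v).
    Returns (total_weight, mst_edges) with mst_edges as (u, v, weight).
    For a disconnected graph this yields a minimum spanning forest.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both ends already connected: edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # attach the smaller tree under the larger one
        size[ru] += size[rv]
        total += w
        chosen.append((u, v, w))
    return total, chosen
```

For `kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)])`, the three lightest edges already connect every vertex, so the MST weight is 6 and the two heavier edges are rejected as cycle-closing.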

Self-Adjusting Variable Neighborhood Search Algorithm for Near-Optimal k-Means Clustering

Computation ◽

10.3390/computation8040090 ◽

2020 ◽

Vol 8 (4) ◽

pp. 90

Author(s):

Lev Kazakovtsev ◽

Ivan Rozhnov ◽

Aleksey Popov ◽

Elena Tovbis

Keyword(s):

Large Scale ◽

Variable Neighborhood Search ◽

Graphics Processing Unit ◽

Search Algorithm ◽

Fixed Time ◽

Exact Algorithms ◽

Neighborhood Search ◽

Data Sets ◽

Processing Unit ◽

Online Computation

The k-means problem is one of the most popular models in cluster analysis that minimizes the sum of the squared distances from clustered objects to the sought cluster centers (centroids). The simplicity of its algorithmic implementation encourages researchers to apply it in a variety of engineering and scientific branches. Nevertheless, the problem is proven to be NP-hard, which makes exact algorithms inapplicable to large-scale problems, and the simplest and most popular algorithms yield very poor values of the sum of squared distances. If a problem must be solved within a limited time with the maximum accuracy, which would be difficult to improve using known methods without increasing computational costs, the variable neighborhood search (VNS) algorithms, which search in randomized neighborhoods formed by the application of greedy agglomerative procedures, are competitive. In this article, we investigate the influence of the most important parameter of such neighborhoods on the computational efficiency and propose a new VNS-based algorithm (solver), implemented on the graphics processing unit (GPU), which adjusts this parameter. Benchmarking on data sets composed of up to millions of objects demonstrates the advantage of the new algorithm in comparison with known local search algorithms, within a fixed time, allowing for online computation.
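
The objective above, and a standard example of the simple algorithms it refers to, can be made concrete. The sketch below computes the sum of squared distances (SSE) and runs Lloyd's alternating algorithm: assign each point to its nearest centroid, then move each centroid to its cluster mean. This is the kind of local search that VNS-type solvers try to improve on, not the self-adjusting VNS algorithm of the paper. The random initialization controlled by `seed` is an assumption of this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centroids):
    """k-means objective: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd_kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's alternating k-means: a local search, not the paper's VNS solver."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # assignments are stable, so Lloyd's method has converged
        centroids = new_centroids
    return centroids, sse(points, centroids)
```

On two well-separated triangles of points, such as (0, 0), (0, 1), (1, 0) and (10, 10), (10, 11), (11, 10), `lloyd_kmeans(points, 2)` recovers the two groups with an SSE of 8/3. On harder data, the result depends on the starting centroids, which is the weakness that neighborhood-search methods target.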
