A Parallel Implementation of General Purpose Unstructured Flow Solver

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. Thus, it is an ideal candidate to accelerate a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become more and more demanding, parallel implementations of learning algorithms are crucial for practical applications. In particular, the implementation of Neural Networks (NNs) on GPUs can significantly reduce the long training times during the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared to implementations on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims at automatically finding high-quality NN-based solutions for a given problem.

Simulation of Fire with a Gas Kinetic Scheme on Distributed GPGPU Architectures

Computation ◽

10.3390/computation8020050 ◽

2020 ◽

Vol 8 (2) ◽

pp. 50

Author(s):

Stephan Lenz ◽

Martin Geier ◽

Manfred Krafczyk

Keyword(s):

Graphics Processing Units ◽

Parallel Implementation ◽

Kinetic Scheme ◽

General Purpose ◽

Massively Parallel ◽

Small Scale ◽

Fire Dynamics ◽

Linear Interaction ◽

Turbulent Natural Convection ◽

Order Of Magnitude

The simulation of fire is a challenging task due to its occurrence on multiple space-time scales and the non-linear interaction of multiple physical processes. Current state-of-the-art software such as the Fire Dynamics Simulator (FDS) implements most of the required physics, yet a significant drawback of this implementation is its limited scalability on modern massively parallel hardware. The current paper presents a massively parallel implementation of a Gas Kinetic Scheme (GKS) on General Purpose Graphics Processing Units (GPGPUs) as a potential alternative modeling and simulation approach. The implementation is validated for turbulent natural convection against experimental data. Subsequently, it is validated for two simulations of fire plumes, including a small-scale table top setup and a fire on the scale of a few meters. We show that the present GKS achieves comparable accuracy to the results obtained by FDS. Yet, due to the parallel efficiency on dedicated hardware, our GKS implementation delivers a reduction of wall-clock times of more than an order of magnitude. This paper demonstrates the potential of explicit local schemes in massively parallel environments for the simulation of fire.

USING METAPROGRAMMING TO PARALLELIZE FUNCTIONAL SPECIFICATIONS

Parallel Processing Letters ◽

10.1142/s0129626402000926 ◽

2002 ◽

Vol 12 (02) ◽

pp. 193-210 ◽

Cited By ~ 3

Author(s):

CHRISTOPH A. HERRMANN ◽

CHRISTIAN LENGAUER

Keyword(s):

Parallel Computing ◽

Programming Language ◽

High Performance ◽

Parallel Implementation ◽

General Purpose ◽

Functional Language ◽

Application Domain ◽

Domain Specific ◽

Domain Independent

Metaprogramming is a paradigm for enhancing a general-purpose programming language with features catering for a special-purpose application domain, without a need for a reimplementation of the language. In a staged compilation, the special-purpose features are translated and optimised by a domain-specific preprocessor, which hands over to the general-purpose compiler for translation of the domain-independent part of the program. The domain we work in is high-performance parallel computing. We use metaprogramming to enhance the functional language Haskell with features for the efficient, parallel implementation of certain computational patterns, called skeletons.
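
To make the notion of a skeleton concrete, the following is a minimal sketch in Python, not the Haskell preprocessor described in the abstract. It expresses divide-and-conquer as a higher-order function whose top-level split can be handed to a parallel map. The names `divide_and_conquer`, `par_map`, and `merge_sort` are hypothetical and chosen only for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def divide_and_conquer(is_basic, solve, divide, combine):
    """Build a solver from the four customizing functions of the pattern.

    This is an illustrative skeleton, not the paper's implementation.
    """
    def run(problem, par_map=map):
        if is_basic(problem):
            return solve(problem)
        # Only this call uses par_map; the recursive calls it triggers fall
        # back to the sequential built-in map, which avoids nested submission
        # to the same thread pool.
        return combine(list(par_map(run, divide(problem))))
    return run


# Instantiate the skeleton as merge sort by supplying the four functions.
merge_sort = divide_and_conquer(
    is_basic=lambda xs: len(xs) <= 1,
    solve=lambda xs: list(xs),
    divide=lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2:]],
    combine=lambda parts: list(heapq.merge(*parts)),
)

# The two halves of the top-level split are sorted by separate workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    result = merge_sort([5, 2, 9, 1, 7, 3], par_map=pool.map)
```

The algorithm-specific parts (`is_basic`, `solve`, `divide`, `combine`) are kept separate from the evaluation strategy (`par_map`), which is the separation a skeleton is meant to provide. Threads in CPython will not speed up this CPU-bound example; the sketch shows the structure only.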

A Parallel Flow Solver for Unsteady Multiple Blade Row Turbomachinery Simulations

Volume 1: Aircraft Engine; Marine; Turbomachinery; Microturbines and Small Turbomachinery ◽

10.1115/2001-gt-0348 ◽

2001 ◽

Cited By ~ 31

Author(s):

J. P. Chen ◽

W. R. Briley

Keyword(s):

Stokes Equations ◽

Parallel Implementation ◽

Parallel Flow ◽

Single Stage ◽

Implicit Time ◽

Phase Lag ◽

Serial Production ◽

Flow Solver ◽

Production Code ◽

Blade Row

A parallel flow solver has been developed to provide a turbomachinery flow simulation tool that extends the capabilities of a previous single–processor production code (TURBO) for unsteady turbomachinery flow analysis. The code solves the unsteady Reynolds-averaged Navier-Stokes equations with a k–ε turbulence model. The parallel code now includes most features of the serial production code, but is implemented in a portable, scalable form for distributed–memory parallel computers using MPI message passing. The parallel implementation employs domain decomposition and supports general multiblock grids with arbitrary grid–block connectivity. The solution algorithm is an iterative implicit time–accurate scheme with characteristics–based finite–volume spatial discretization. The Newton subiterations are solved using a concurrent block–Jacobi symmetric Gauss–Seidel (BJ–SGS) relaxation scheme. Unsteady blade–row interaction is treated either by simulating full or periodic sectors of blade–rows, or by solving within a single passage for each row using phase–lag and wake–blade interaction approximations at boundaries. A scalable dynamic sliding–interface algorithm is developed here, with an efficient parallel data communication between blade rows in relative motion. Parallel computations are given here for flat plate, single blade row (Rotor 67) and single stage (Stage 37) test cases, and these results are validated by comparison with corresponding results from the previously validated serial production code. Good speedup performance is demonstrated for the single–stage case with a relatively small grid of 600,000 points.
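
The block-Jacobi symmetric Gauss-Seidel idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden simplification: it solves a small scalar linear system rather than the Newton subiterations of a flow solver, and the function names `sgs_sweep_block` and `bj_sgs_solve` are hypothetical. Each block of unknowns performs a forward and a backward Gauss-Seidel sweep using fresh values inside the block and values from the previous iteration for all other blocks, so the blocks could be processed concurrently.

```python
def sgs_sweep_block(A, b, x_old, rows):
    """One symmetric Gauss-Seidel sweep over the unknowns listed in `rows`.

    Unknowns outside `rows` stay frozen at their values in `x_old`,
    which is the block-Jacobi coupling between blocks.
    """
    x = list(x_old)  # local copy; only entries in `rows` are updated
    n = len(b)
    for order in (rows, list(reversed(rows))):  # forward, then backward
        for i in order:
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return [x[i] for i in rows]


def bj_sgs_solve(A, b, blocks, iterations=200, tol=1e-10):
    """Iterate block sweeps until the largest update falls below `tol`."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        # Every block reads the same x, so these calls are independent and
        # could be distributed across processes; here they run in sequence.
        updates = [sgs_sweep_block(A, b, x, rows) for rows in blocks]
        x_new = list(x)
        for rows, values in zip(blocks, updates):
            for i, v in zip(rows, values):
                x_new[i] = v
        change = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


# Toy system: diagonally dominant tridiagonal matrix split into two blocks.
n = 6
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = bj_sgs_solve(A, b, blocks=[[0, 1, 2], [3, 4, 5]])
```

Freezing off-block values removes the data dependency between blocks within an iteration, at the cost of slower convergence than a global Gauss-Seidel sweep. The strictly diagonally dominant matrix is chosen here so that the iteration is guaranteed to converge.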

Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units

Communications in Computational Physics ◽

10.4208/cicp.351011.260112s ◽

2013 ◽

Vol 13 (3) ◽

pp. 867-879 ◽

Cited By ~ 6

Author(s):

Stuart D. C. Walsh ◽

Martin O. Saar

Keyword(s):

Code Generation ◽

Lattice Boltzmann ◽

Graphics Processing Units ◽

Parallel Implementation ◽

General Purpose ◽

Lattice Boltzmann Simulation ◽

Lattice Boltzmann Simulations ◽

Gpu Architectures ◽

Automatic Code ◽

Graphics Processing

Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package, LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for both single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user generated lattice-Boltzmann models and subroutines.

A General Purpose Parallel Block Structured Open Source Flow Solver

2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing ◽

10.1109/3pgcic.2012.2 ◽

2012 ◽

Author(s):

Mariana Mendina ◽

Martin Draper ◽

Gabriel Narancio ◽

Gabriel Usera ◽

Ana Paula Kelm Soares

Keyword(s):

Open Source ◽

General Purpose ◽

Flow Solver ◽

Source Flow ◽

Block Structured

AN OBJECT-ORIENTED TOOLBOX FOR ADAPTIVE NEURAL NETWORKS' IMPLEMENTATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213001000556 ◽

2001 ◽

Vol 10 (03) ◽

pp. 345-371

Author(s):

GEORGE D. MANIOUDAKIS ◽

SPIRIDON D. LIKOTHANASSIS

Keyword(s):

Neural Networks ◽

Learning Algorithm ◽

Parallel Implementation ◽

Learning Algorithms ◽

Object Oriented ◽

Network Size ◽

General Purpose ◽

Practical Applications ◽

Massively Parallel Processing ◽

Model Partitioning

Neural Networks are massively parallel processing systems that require expensive and usually unavailable hardware in order to be realized. Fortunately, the development of effective and accessible software makes their simulation easy. Thus, various neural network implementation tools exist in the market, each oriented to the specific learning algorithm used. Furthermore, they can simulate only fixed-size networks. In this work, we present some object-oriented techniques that have been used to define some types of neuron and network objects that can be used to realize, in a localized approach, some fast and powerful learning algorithms which combine results of optimal filtering and multi-model partitioning theory. Thus, one can build and implement intelligent learning algorithms that handle both the training and the on-line adjustment of the network size. Furthermore, the design methodology used results in a system modeled as a collection of concurrently executable objects, making the parallel implementation easy. The whole design results in a general-purpose toolbox which is characterized by maintainability, reusability, and increased modularity. The provided features are demonstrated through some practical applications.
