Simulation of Fire with a Gas Kinetic Scheme on Distributed GPGPU Architectures

Stephan Lenz; Martin Geier; Manfred Krafczyk

doi:10.3390/computation8020050

Computation ◽

10.3390/computation8020050 ◽

2020 ◽

Vol 8 (2) ◽

pp. 50

Author(s):

Stephan Lenz ◽

Martin Geier ◽

Manfred Krafczyk

Keyword(s):

Graphics Processing Units ◽

Parallel Implementation ◽

Kinetic Scheme ◽

General Purpose ◽

Massively Parallel ◽

Small Scale ◽

Fire Dynamics ◽

Linear Interaction ◽

Turbulent Natural Convection ◽

Order Of Magnitude

The simulation of fire is a challenging task due to its occurrence on multiple space-time scales and the non-linear interaction of multiple physical processes. Current state-of-the-art software such as the Fire Dynamics Simulator (FDS) implements most of the required physics, yet a significant drawback of this implementation is its limited scalability on modern massively parallel hardware. The current paper presents a massively parallel implementation of a Gas Kinetic Scheme (GKS) on General Purpose Graphics Processing Units (GPGPUs) as a potential alternative modeling and simulation approach. The implementation is validated for turbulent natural convection against experimental data. Subsequently, it is validated for two simulations of fire plumes, including a small-scale table top setup and a fire on the scale of a few meters. We show that the present GKS achieves comparable accuracy to the results obtained by FDS. Yet, due to the parallel efficiency on dedicated hardware, our GKS implementation delivers a reduction of wall-clock times of more than an order of magnitude. This paper demonstrates the potential of explicit local schemes in massively parallel environments for the simulation of fire.

Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units

Communications in Computational Physics ◽

10.4208/cicp.351011.260112s ◽

2013 ◽

Vol 13 (3) ◽

pp. 867-879 ◽

Cited By ~ 6

Author(s):

Stuart D. C. Walsh ◽

Martin O. Saar

Keyword(s):

Code Generation ◽

Lattice Boltzmann ◽

Graphics Processing Units ◽

Parallel Implementation ◽

General Purpose ◽

Lattice Boltzmann Simulation ◽

Lattice Boltzmann Simulations ◽

Gpu Architectures ◽

Automatic Code ◽

Graphics Processing

Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple-data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package, LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user-generated lattice-Boltzmann models and subroutines.
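
As a point of reference for what such GPU modules compute, here is a minimal sketch of one single-phase D2Q9 lattice-Boltzmann step with the BGK collision operator and periodic boundaries, written in plain Python for readability. It is an illustrative assumption, not LBHydra's generated code, and the names `equilibrium`, `moments`, `init`, and `step` are hypothetical. Collision uses only data local to a cell and streaming touches only nearest neighbours, which is why the method maps well onto SIMD hardware.

```python
# Minimal D2Q9 lattice-Boltzmann sketch: BGK collision + periodic streaming.
# Illustrative only (lattice units, c_s^2 = 1/3); not LBHydra's code.

# Standard D2Q9 lattice velocities: rest, 4 axis links, 4 diagonal links.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order discrete Maxwellian equilibrium for D2Q9."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(cell):
    """Density and velocity of one cell's nine populations."""
    rho = sum(cell)
    ux = sum(fi * cx for fi, (cx, _) in zip(cell, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(cell, C)) / rho
    return rho, ux, uy


def init(nx, ny, rho0=1.0):
    """Fluid at rest: every cell starts at equilibrium, f[y][x][i]."""
    return [[equilibrium(rho0, 0.0, 0.0) for _ in range(nx)] for _ in range(ny)]


def step(f, tau):
    """One collide-and-stream update on a periodic grid."""
    ny, nx = len(f), len(f[0])
    # Collision: relax every population towards the local equilibrium.
    post = []
    for row in f:
        new_row = []
        for cell in row:
            feq = equilibrium(*moments(cell))
            new_row.append([fi - (fi - fe) / tau for fi, fe in zip(cell, feq)])
        post.append(new_row)
    # Streaming: move population i one link along its velocity C[i].
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (cx, cy) in enumerate(C):
                out[(y + cy) % ny][(x + cx) % nx][i] = post[y][x][i]
    return out
```

Usage: `f = init(64, 32)` followed by repeated `f = step(f, tau=0.8)`. In lattice units the kinematic viscosity is ν = (τ - 1/2)/3, so τ = 0.8 gives ν = 0.1; τ must stay above 1/2 for stability.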

uBlasCL: Architecture Agnostic Massively Parallel Linear Algebra System

Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2011-48228 ◽

2011 ◽

Author(s):

Athanasios Iliopoulos ◽

John G. Michopoulos

Keyword(s):

Linear Algebra ◽

Graphics Processing Units ◽

Processing System ◽

Performance Testing ◽

General Purpose ◽

Massively Parallel ◽

Domain Specific Languages ◽

Central Processing ◽

Domain Specific ◽

Graphics Processing

The need for more efficient, more abstract, and easier-to-use parallel programming interfaces has recently intensified with the introduction and remarkable evolution of technologies such as General Purpose Graphics Processing Units (GPGPUs) and multi-core Central Processing Units (CPUs). In the present paper we introduce the uBlasCL system, a Domain Specific Embedded Language within C++ that implements a Basic Linear Algebra Interface for OpenCL. The system is architecture agnostic, in the sense that it can be programmed independently of the targeted architecture; it is massively parallel, and its efficiency tracks advances in hardware performance well. Our effort builds on template metaprogramming and the fundamentals of domain specific languages to develop a system that combines the syntactic flexibility of a symbolic term processing system for expressing mathematics with the semantic and executional power to exploit the parallelism offered by the hardware, in a manner that is automated, transparent to the user, and efficiently mapped onto the hardware. We also describe its relation to C++, template programming, domain specific languages, and OpenCL. In developing uBlasCL we also built a middleware library named CL++, a convenient C++ interface to OpenCL. After describing the architecture and implementation of the system, we present performance testing results demonstrating its potential power.
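
Template metaprogramming of this kind is commonly used to build expression templates: an expression such as `y = a + 2*b` is captured as an expression tree and evaluated in one fused pass instead of operator by operator, which avoids temporaries. Assuming uBlasCL follows this general pattern, the sketch below mimics the idea with Python operator overloading. It is an analogy, not the uBlasCL or CL++ API; the classes `Expr`, `Vec`, `Scalar`, `BinOp` and the functions `evaluate` and `kernel_source` are hypothetical. `kernel_source` only prints a C-like loop body of the kind a code generator might emit; nothing is compiled or run on a device.

```python
import operator


class Expr:
    """Arithmetic on expressions builds a tree instead of computing values."""
    def __add__(self, other): return BinOp("+", self, wrap(other))
    def __radd__(self, other): return BinOp("+", wrap(other), self)
    def __sub__(self, other): return BinOp("-", self, wrap(other))
    def __rsub__(self, other): return BinOp("-", wrap(other), self)
    def __mul__(self, other): return BinOp("*", self, wrap(other))
    def __rmul__(self, other): return BinOp("*", wrap(other), self)


class Vec(Expr):
    """A named leaf vector."""
    def __init__(self, name, data):
        self.name, self.data = name, list(data)
    def size(self): return len(self.data)
    def at(self, i): return self.data[i]
    def source(self): return f"{self.name}[i]"


class Scalar(Expr):
    """A constant that broadcasts to any vector length."""
    def __init__(self, value): self.value = value
    def size(self): return None
    def at(self, i): return self.value
    def source(self): return repr(self.value)


class BinOp(Expr):
    """An interior node: element-wise binary operation on two subtrees."""
    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

    def size(self):
        sizes = {s for s in (self.lhs.size(), self.rhs.size()) if s is not None}
        if len(sizes) > 1:
            raise ValueError("vector size mismatch")
        return sizes.pop() if sizes else None

    def at(self, i): return self.OPS[self.op](self.lhs.at(i), self.rhs.at(i))
    def source(self): return f"({self.lhs.source()} {self.op} {self.rhs.source()})"


def wrap(x):
    """Promote plain numbers to Scalar leaves."""
    return x if isinstance(x, Expr) else Scalar(x)


def evaluate(expr):
    """Fused evaluation: a single loop over i, no intermediate vectors."""
    n = expr.size()
    if n is None:
        raise ValueError("expression contains no vector")
    return [expr.at(i) for i in range(n)]


def kernel_source(expr, out="y"):
    """Emit the loop body a code generator could place in a device kernel."""
    return f"{out}[i] = {expr.source()};"
```

For example, with `a = Vec("a", [1.0, 2.0, 3.0])` and `b = Vec("b", [4.0, 5.0, 6.0])`, the expression `a + 2.0 * b` evaluates to `[9.0, 12.0, 15.0]`, and `kernel_source` renders it as `y[i] = (a[i] + (2.0 * b[i]));`.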

Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit

2009 IEEE Workshop on Signal Processing Systems ◽

10.1109/sips.2009.5336268 ◽

2009 ◽

Cited By ~ 10

Author(s):

Hyunwoo Ji ◽

Junho Cho ◽

Wonyong Sung

Keyword(s):

Graphics Processing Unit ◽

Parallel Implementation ◽

Ldpc Codes ◽

General Purpose ◽

Massively Parallel ◽

Processing Unit ◽

Graphics Processing

Simulations of Complex and Microscopic Models of Cardiac Electrophysiology Powered by Multi-GPU Platforms

Computational and Mathematical Methods in Medicine ◽

10.1155/2012/824569 ◽

2012 ◽

Vol 2012 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Bruno Gouvêa de Barros ◽

Rafael Sachetto Oliveira ◽

Wagner Meira ◽

Marcelo Lobosco ◽

Rodrigo Weber dos Santos

Keyword(s):

Graphics Processing Units ◽

Cardiac Electrophysiology ◽

Cluster Computing ◽

Cardiac Tissue ◽

Parallel Implementation ◽

Cell Model ◽

Numerical Models ◽

Hybrid Approach ◽

Conduction Block ◽

General Purpose

Key aspects of cardiac electrophysiology, such as slow conduction, conduction block, and saltatory effects have been the research topic of many studies since they are strongly related to cardiac arrhythmia, reentry, fibrillation, or defibrillation. However, to reproduce these phenomena the numerical models need to use subcellular discretization for the solution of the PDEs and nonuniform, heterogeneous tissue electric conductivity. Due to the high computational costs of simulations that reproduce the fine microstructure of cardiac tissue, previous studies have considered tissue experiments of small or moderate sizes and used simple cardiac cell models. In this paper, we develop a cardiac electrophysiology model that captures the microstructure of cardiac tissue by using a very fine spatial discretization (8 μm) and uses a very modern and complex cell model based on Markov chains for the characterization of ion channel’s structure and dynamics. To cope with the computational challenges, the model was parallelized using a hybrid approach: cluster computing and GPGPUs (general-purpose computing on graphics processing units). Our parallel implementation of this model using a multi-GPU platform was able to reduce the execution times of the simulations from more than 6 days (on a single processor) to 21 minutes (on a small 8-node cluster equipped with 16 GPUs, i.e., 2 GPUs per node).
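
Converting both reported wall-clock times to minutes gives a rough lower bound on the overall speedup of the 16-GPU run over the single processor. This is a back-of-the-envelope figure derived here, not one stated in the abstract; the inequality reflects the "more than 6 days" baseline.

```latex
S \gtrsim \frac{6\ \text{days}}{21\ \text{min}}
  = \frac{6 \times 24 \times 60\ \text{min}}{21\ \text{min}}
  = \frac{8640}{21} \approx 411
```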

Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00124 ◽

2015 ◽

Vol 3 ◽

pp. 87-100

Author(s):

Hua He ◽

Jimmy Lin ◽

Adam Lopez

Keyword(s):

Hierarchical Models ◽

Graphics Processing Units ◽

General Purpose ◽

Practical Applications ◽

On Demand ◽

Mt Evaluation ◽

Order Of Magnitude ◽

Evaluation Dataset ◽

The Cost ◽

Graphics Processing

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.
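
In hierarchical grammars a source phrase such as `of X king` contains a gap `X` that can stand for a variable-length span of words, so a simple contiguous phrase lookup no longer suffices. The sketch below is a naive backtracking matcher that enumerates every span matching such a pattern in one tokenized sentence. It only illustrates what matching patterns with gaps means and is not the paper's GPU algorithm; the gap token `GAP`, the `max_gap` limit, and the function name `match_gapped` are assumptions.

```python
GAP = "X"  # hypothetical gap symbol inside a source-side pattern


def match_gapped(pattern, sentence, max_gap=3):
    """Return (start, end) spans where `pattern` matches `sentence`.

    Literal tokens must match exactly; each GAP must cover between
    1 and `max_gap` consecutive words.
    """
    matches = []

    def extend(p, s, start):
        if p == len(pattern):          # whole pattern consumed: record span
            matches.append((start, s))
            return
        if pattern[p] == GAP:          # try every allowed gap length
            for length in range(1, max_gap + 1):
                if s + length <= len(sentence):
                    extend(p + 1, s + length, start)
        elif s < len(sentence) and sentence[s] == pattern[p]:
            extend(p + 1, s + 1, start)

    for start in range(len(sentence)):
        extend(0, start, start)
    return matches
```

For the sentence `the cat of the house of the king`, the pattern `["of", GAP, "king"]` matches only the span `(5, 8)` with the default `max_gap=3`, while `max_gap=5` also admits `(2, 8)`. The number of candidate spans grows quickly with gap length and corpus size, which illustrates why on-demand lookup is expensive enough to motivate parallel hardware.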

Novel Dynamic Partial Reconfiguration Implementation of K-Means Clustering on FPGAs: Comparative Results with GPPs and GPUs

International Journal of Reconfigurable Computing ◽

10.1155/2012/135926 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15 ◽

Cited By ~ 10

Author(s):

Hanaa M. Hussain ◽

Khaled Benkrid ◽

Ali Ebrahim ◽

Ahmet T. Erdogan ◽

Huseyin Seker

Keyword(s):

Graphics Processing Units ◽

Clustering Algorithm ◽

General Purpose ◽

Large Datasets ◽

Fpga Implementation ◽

Partial Reconfiguration ◽

Dynamic Partial Reconfiguration ◽

Field Programmable ◽

Order Of Magnitude ◽

Comparative Results

K-means clustering has been widely used to process large datasets in many fields of study. Advances in data collection techniques have been generating enormous amounts of data, leaving scientists with the challenging task of processing them. Using General Purpose Processors (GPPs) to process large datasets may take a long time; therefore many acceleration methods have been proposed in the literature to speed up the processing of such datasets. In this work, a parameterized implementation of the K-means clustering algorithm on a Field Programmable Gate Array (FPGA) is presented and compared with previous FPGA implementations as well as recent implementations on Graphics Processing Units (GPUs) and GPPs. The proposed FPGA implementation achieves higher performance, with speedups of two orders and one order of magnitude over previous GPP and GPU implementations, respectively. In addition, the FPGA implementation is more energy efficient than the GPP and GPU implementations (615x and 31x, respectively). Furthermore, three novel implementations of K-means clustering based on dynamic partial reconfiguration (DPR) are presented, offering a high degree of flexibility to dynamically reconfigure the FPGA. The DPR implementations achieved speedups in reconfiguration time of between 4x and 15x.
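
For reference, the computation being accelerated is the standard K-means iteration (Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop changing. The plain-Python sketch below shows that loop as a software baseline; it is not the paper's FPGA design, and the function names `sq_dist`, `assign`, and `kmeans` are hypothetical. The assignment step is independent for every point, which makes it the natural target for parallel hardware.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(c)
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, assign(points, centroids)
```

With the points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` and initial centroids `(0, 0)` and `(10, 10)`, the loop converges after two passes to centroids of about `(0.33, 0.33)` and `(10.33, 10.33)`.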

Non-Linear Interaction of MHD Waves with Small-Scale Ionospheric Irregularities

Telecommunications and Radio Engineering ◽

10.1615/telecomradeng.v61.i12.60 ◽

2004 ◽

Vol 61 (7-12) ◽

pp. 1055-1071

Author(s):

N. N. Gerasimova ◽

V. G. Sinitsin ◽

Yu. M. Yampolski

Keyword(s):

Ionospheric Irregularities ◽

Mhd Waves ◽

Small Scale ◽

Linear Interaction ◽

Non Linear

Massively Parallel Implementation of Steered Molecular Dynamics in Tinker-HP: Comparisons of Polarizable and Non-Polarizable Simulations of Realistic Systems

10.26434/chemrxiv.7771112.v2 ◽

2019 ◽

Author(s):

Frédéric Célerse ◽

Louis Lagardere ◽

Étienne Derat ◽

Jean-Philip Piquemal

Keyword(s):

Molecular Dynamics ◽

Parallel Implementation ◽

Steered Molecular Dynamics ◽

Massively Parallel

This paper is dedicated to the massively parallel implementation of Steered Molecular Dynamics in the Tinker-HP software. It allows for direct comparisons of polarizable and non-polarizable simulations of realistic systems.

Massively Parallel Implementation of Steered Molecular Dynamics in Tinker-HP: Comparisons of Polarizable and Non-Polarizable Simulations of Realistic Systems

10.26434/chemrxiv.7771112 ◽

2019 ◽

Author(s):

Frédéric Célerse ◽

Louis Lagardere ◽

Étienne Derat ◽

Jean-Philip Piquemal

Keyword(s):

Molecular Dynamics ◽

Parallel Implementation ◽

Steered Molecular Dynamics ◽

Massively Parallel

Large-eddy simulation of wall-bounded turbulent flow with high-order discrete unified gas-kinetic scheme

Advances in Aerodynamics ◽

10.1186/s42774-020-00051-w ◽

2020 ◽

Vol 2 (1) ◽

Author(s):

Rui Zhang ◽

Chengwen Zhong ◽

Sha Liu ◽

Congshan Zhuo

Keyword(s):

Turbulent Flow ◽

Turbulent Flows ◽

Parallel Implementation ◽

Kinetic Scheme ◽

Cavity Flow ◽

Equilibrium Distribution Function ◽

Third Order ◽

Two Stage ◽

Large Eddy ◽

Unified Gas Kinetic Scheme

In this paper, we introduce the discrete Maxwellian equilibrium distribution function for incompressible flow and a force term into the two-stage third-order Discrete Unified Gas-Kinetic Scheme (DUGKS) for simulating low-speed turbulent flows. The Wall-Adapting Local Eddy-viscosity (WALE) and Vreman sub-grid models for Large-Eddy Simulation (LES) of turbulent flows are coupled within the present framework. In addition, implicit LES is presented to verify the effect of the LES models. A parallel implementation strategy for the framework is developed, and three canonical wall-bounded turbulent flows are investigated: fully developed turbulent channel flow at a friction Reynolds number (Re) of about 180, turbulent plane Couette flow at a friction Re of about 93, and lid-driven cubical cavity flow at Re = 12000. Turbulence statistics, including the mean velocity, r.m.s. velocity fluctuations, Reynolds stress, etc., are computed with the present approach. Their predictions match each other precisely, and both are in reasonable agreement with the DNS benchmark data. In particular, the predicted flow physics of the three-dimensional lid-driven cavity flow are consistent with the descriptions in the extensive literature. These numerical results verify that the two-stage third-order DUGKS-based LES method is capable of simulating inhomogeneous wall-bounded turbulent flows and obtaining reliable results on relatively coarse grids.
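
The Vreman sub-grid model computes an eddy viscosity directly from the resolved velocity-gradient tensor. In Vreman's formulation, with α_ij = ∂u_j/∂x_i, β_ij = Δ² α_mi α_mj (summed over m) and B_β = β11 β22 - β12² + β11 β33 - β13² + β22 β33 - β23², the eddy viscosity is ν_t = c √(B_β / (α_ij α_ij)). The sketch below evaluates this at a single point, assuming a uniform filter width Δ in all directions; the default constant c = 0.07 is an assumed value (roughly 2.5 C_s² for a Smagorinsky constant C_s near 0.17), since the abstract does not state the constant used, and the function name `vreman_nu_t` is hypothetical. The WALE model is not shown.

```python
import math


def vreman_nu_t(grad_u, delta, c=0.07):
    """Vreman eddy viscosity at one point.

    grad_u[i][j] = d u_j / d x_i  (alpha_ij in Vreman's notation).
    delta: filter width, assumed equal in all three directions.
    c: model constant; 0.07 is an assumed default, not taken from the paper.
    """
    a = grad_u
    aa = sum(a[i][j] ** 2 for i in range(3) for j in range(3))
    if aa == 0.0:  # no resolved gradients: no sub-grid viscosity
        return 0.0
    # beta_ij = delta^2 * sum_m alpha_mi * alpha_mj
    b = [[delta ** 2 * sum(a[m][i] * a[m][j] for m in range(3))
          for j in range(3)] for i in range(3)]
    big_b = (b[0][0] * b[1][1] - b[0][1] ** 2
             + b[0][0] * b[2][2] - b[0][2] ** 2
             + b[1][1] * b[2][2] - b[1][2] ** 2)
    # B_beta is non-negative in exact arithmetic; clamp tiny rounding errors.
    return c * math.sqrt(max(big_b, 0.0) / aa)
```

For a pure shear such as u_x = S·y, only β11 is non-zero, so B_β = 0 and the model returns zero viscosity, whereas an axisymmetric strain with α = diag(1, 1, -2) and Δ = 1 gives ν_t = c·√1.5.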
