Simulation of Fire with a Gas Kinetic Scheme on Distributed GPGPU Architectures

Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 50
Author(s):  
Stephan Lenz ◽  
Martin Geier ◽  
Manfred Krafczyk

The simulation of fire is a challenging task due to its occurrence on multiple space-time scales and the non-linear interaction of multiple physical processes. Current state-of-the-art software such as the Fire Dynamics Simulator (FDS) implements most of the required physics, yet a significant drawback of this implementation is its limited scalability on modern massively parallel hardware. The current paper presents a massively parallel implementation of a Gas Kinetic Scheme (GKS) on General Purpose Graphics Processing Units (GPGPUs) as a potential alternative modeling and simulation approach. The implementation is validated for turbulent natural convection against experimental data. Subsequently, it is validated for two simulations of fire plumes, including a small-scale table top setup and a fire on the scale of a few meters. We show that the present GKS achieves comparable accuracy to the results obtained by FDS. Yet, due to the parallel efficiency on dedicated hardware, our GKS implementation delivers a reduction of wall-clock times of more than an order of magnitude. This paper demonstrates the potential of explicit local schemes in massively parallel environments for the simulation of fire.

2013 ◽  
Vol 13 (3) ◽  
pp. 867-879 ◽  
Author(s):  
Stuart D. C. Walsh ◽  
Martin O. Saar

Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package, LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user generated lattice-Boltzmann models and subroutines.


Author(s):  
Athanasios Iliopoulos ◽  
John G. Michopoulos

The need for more efficient, more abstract and easier to use parallel programming interfaces has recently intensified with the introduction and remarkable evolution of technologies such as General Purpose Graphics Processing Units (GPGPUs) and multi-core Central Processing Units (CPUs). In this paper we introduce the uBlasCL system, a Domain Specific Embedded Language within C++ that implements a Basic Linear Algebra Interface for OpenCL. The system is architecture agnostic, in the sense that it can be programmed independently of the targeted architecture, is massively parallel, and achieves efficiency that tracks advances in hardware performance well. Our effort is based on template metaprogramming and the fundamentals of domain specific languages, and aims at a system that has the syntactic flexibility of a symbolic term processing system for expressing mathematics, together with the semantic and executional power to exploit the parallelism offered by the hardware in a manner that is automated, transparent to the user, and efficiently mapped onto the hardware. We also describe its relation to C++, template programming, domain specific languages and OpenCL. In the effort to develop uBlasCL we also developed a middleware library named CL++, as a convenient C++ interface to OpenCL. After the architectural and implementation descriptions of the system, we present performance testing results demonstrating its potential power.


2012 ◽  
Vol 2012 ◽  
pp. 1-13 ◽  
Author(s):  
Bruno Gouvêa de Barros ◽  
Rafael Sachetto Oliveira ◽  
Wagner Meira ◽  
Marcelo Lobosco ◽  
Rodrigo Weber dos Santos

Key aspects of cardiac electrophysiology, such as slow conduction, conduction block, and saltatory effects have been the research topic of many studies since they are strongly related to cardiac arrhythmia, reentry, fibrillation, or defibrillation. However, to reproduce these phenomena the numerical models need to use subcellular discretization for the solution of the PDEs and nonuniform, heterogeneous tissue electric conductivity. Due to the high computational costs of simulations that reproduce the fine microstructure of cardiac tissue, previous studies have considered tissue experiments of small or moderate sizes and used simple cardiac cell models. In this paper, we develop a cardiac electrophysiology model that captures the microstructure of cardiac tissue by using a very fine spatial discretization (8 μm) and uses a very modern and complex cell model based on Markov chains for the characterization of ion channel’s structure and dynamics. To cope with the computational challenges, the model was parallelized using a hybrid approach: cluster computing and GPGPUs (general-purpose computing on graphics processing units). Our parallel implementation of this model using a multi-GPU platform was able to reduce the execution times of the simulations from more than 6 days (on a single processor) to 21 minutes (on a small 8-node cluster equipped with 16 GPUs, i.e., 2 GPUs per node).
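As a rough check on the figures reported above, the two run times can be turned into an overall speedup. This is only back-of-the-envelope arithmetic on the numbers in the abstract. It is a lower bound, because the single-processor run is stated to take "more than" 6 days.

\[
S \;\ge\; \frac{6 \times 24 \times 60\ \text{min}}{21\ \text{min}} \;=\; \frac{8640}{21} \;\approx\; 411
\]

Dividing naively by the 16 GPUs gives roughly 26 per GPU. That per-GPU figure is an illustrative assumption rather than a reported result, since it ignores the contribution of the cluster's CPUs and any communication overhead.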


Author(s):  
Hua He ◽  
Jimmy Lin ◽  
Adam Lopez

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.
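The cost argument above can be made explicit with a small worked ratio. It takes the rounded figures at face value and assumes, for illustration only, that throughput and server cost are the only quantities that matter.

\[
\frac{\text{throughput per unit cost (with GPU)}}{\text{throughput per unit cost (CPU only)}}
\;\approx\; \frac{1 + \tfrac{2}{3}}{1 + \tfrac{1}{3}}
\;=\; \frac{5/3}{4/3}
\;=\; 1.25
\]

Under these assumptions, adding the GPU yields about 25% more throughput per unit of server cost.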


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Hanaa M. Hussain ◽  
Khaled Benkrid ◽  
Ali Ebrahim ◽  
Ahmet T. Erdogan ◽  
Huseyin Seker

K-means clustering has been widely used in processing large datasets in many fields of study. Advances in data collection techniques have been generating enormous amounts of data, leaving scientists with the challenging task of processing them. Using General Purpose Processors (GPPs) to process large datasets may take a long time; therefore many acceleration methods have been proposed in the literature to speed up the processing of such large datasets. In this work, a parameterized implementation of the K-means clustering algorithm in Field Programmable Gate Array (FPGA) is presented and compared with previous FPGA implementations as well as recent implementations on Graphics Processing Units (GPUs) and GPPs. The proposed FPGA implementation has higher performance in terms of speedup over previous GPP and GPU implementations (two orders and one order of magnitude, resp.). In addition, the FPGA implementation is more energy efficient than GPP and GPU (615x and 31x, resp.). Furthermore, three novel implementations of K-means clustering based on dynamic partial reconfiguration (DPR) are presented, offering a high degree of flexibility to dynamically reconfigure the FPGA. The DPR implementations achieved speedups in reconfiguration time of between 4x and 15x.
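For readers unfamiliar with the algorithm being accelerated, the following is a minimal sketch of standard K-means (Lloyd's algorithm) in plain Python. It is not the authors' FPGA design. The function name `kmeans`, its parameters, and the toy data are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


# Toy data: two well-separated groups of three 2-D points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

The assignment step is independent for every point, which is the part that hardware implementations such as the ones compared above can parallelise.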


2004 ◽  
Vol 61 (7-12) ◽  
pp. 1055-1071
Author(s):  
N. N. Gerasimova ◽  
V. G. Sinitsin ◽  
Yu. M. Yampolski

2019 ◽  
Author(s):  
Frédéric Célerse ◽  
Louis Lagardere ◽  
Étienne Derat ◽  
Jean-Philip Piquemal

This paper is dedicated to the massively parallel implementation of Steered Molecular Dynamics in the Tinker-HP software. It allows for direct comparisons of polarizable and non-polarizable simulations of realistic systems.


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Rui Zhang ◽  
Chengwen Zhong ◽  
Sha Liu ◽  
Congshan Zhuo

In this paper, we introduce the discrete Maxwellian equilibrium distribution function for incompressible flow and a force term into the two-stage third-order Discrete Unified Gas-Kinetic Scheme (DUGKS) for simulating low-speed turbulent flows. The Wall-Adapting Local Eddy-viscosity (WALE) and Vreman sub-grid models for Large-Eddy Simulations (LES) of turbulent flows are coupled within the present framework. Implicit LES results are also presented to verify the effect of the LES models. A parallel implementation strategy for the present framework is developed, and three canonical wall-bounded turbulent flow cases are investigated: the fully developed turbulent channel flow at a friction Reynolds number (Re) of about 180, the turbulent plane Couette flow at a friction Re of about 93, and the lid-driven cubical cavity flow at a Re of 12000. The turbulence statistics, including mean velocity, r.m.s. velocity fluctuations, Reynolds stress, etc., are computed by the present approach. Their predictions match each other precisely, and both are in reasonable agreement with the DNS benchmark data. In particular, the predicted flow physics of the three-dimensional lid-driven cavity flow are consistent with descriptions in the literature. The present numerical results verify that the two-stage third-order DUGKS-based LES method is capable of simulating inhomogeneous wall-bounded turbulent flows and obtaining reliable results with relatively coarse grids.

