An Accelerated 3D Navier–Stokes Solver for Flows in Turbomachines

Tobias Brandvik; Graham Pullan

doi:10.1115/1.4001192

An Accelerated 3D Navier–Stokes Solver for Flows in Turbomachines

Journal of Turbomachinery ◽

10.1115/1.4001192 ◽

2010 ◽

Vol 133 (2) ◽

Cited By ~ 43

Author(s):

Tobias Brandvik ◽

Graham Pullan

Keyword(s):

Graphics Processing Units ◽

Three Dimensional ◽

Navier Stokes ◽

Linear Scaling ◽

Test Case ◽

Processing Unit ◽

Central Processing ◽

Order Of Magnitude ◽

Graphics Processing ◽

Good Agreement

A new three-dimensional Navier–Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes but has been implemented to run on graphics processing units (GPUs) instead of the traditional central processing unit. The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. The scaling results for a 16 node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 min on a cluster with four GPUs.

Download Full-text

An Accelerated 3D Navier-Stokes Solver for Flows in Turbomachines

Volume 7: Turbomachinery, Parts A and B ◽

10.1115/gt2009-60052 ◽

2009 ◽

Cited By ~ 20

Author(s):

Tobias Brandvik ◽

Graham Pullan

Keyword(s):

Graphics Processing Units ◽

Three Dimensional ◽

Navier Stokes ◽

Linear Scaling ◽

Test Case ◽

Processing Unit ◽

Central Processing ◽

Order Of Magnitude ◽

Graphics Processing ◽

Good Agreement

A new three-dimensional Navier-Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes, but has been implemented to run on Graphics Processing Units (GPUs) instead of the traditional Central Processing Unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16 node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 minutes on a cluster with four GPUs.

Download Full-text

Implementation of an Edge-Based Navier-Stokes Solver for Unstructured Grids in Graphics Processing Units

Volume 7: Turbomachinery, Parts A, B, and C ◽

10.1115/gt2011-46224 ◽

2011 ◽

Cited By ~ 2

Author(s):

Fernando Gisbert ◽

Roque Corral ◽

Guillermo Pastor

Keyword(s):

Graphics Processing Units ◽

Unstructured Grids ◽

Three Dimensional ◽

Navier Stokes ◽

Computational Time ◽

Central Processing ◽

Order Of Magnitude ◽

Edge Based ◽

Graphics Processing ◽

Gpu Implementation

The implementation of an edge-based three-dimensional RANS equations solver for unstructured grids that runs on both central processing units (CPUs) and graphics processing units (GPUs) is presented. This CPU/GPU duality is kept without double-writing the code, reducing programming and maintenance costs. The GPU implementation is based on the standard OpenCL language. The code has been parallelized using MPI. Some turbomachinery benchmark cases are presented. For all cases, an order of magnitude reduction in computational time is achieved when the code is executed on GPUs instead of CPUs.

Download Full-text

High-performance computing in water resources hydrodynamics

Journal of Hydroinformatics ◽

10.2166/hydro.2020.163 ◽

2020 ◽

Vol 22 (5) ◽

pp. 1217-1235 ◽

Cited By ~ 3

Author(s):

M. Morales-Hernández ◽

M. B. Sharif ◽

S. Gangrade ◽

T. T. Dullo ◽

S.-C. Kao ◽

...

Keyword(s):

Water Resources ◽

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Test Case ◽

Processing Unit ◽

Central Processing ◽

Graphics Processing ◽

Performance Computing

Abstract This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). The advances to computing power, formerly driven by the improvement of central processing unit processors, now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data as well as changing even the nature of the algorithm that solves the system of equations. These concepts along with other features such as the precision for the computations, dry regions management, and input/output data are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and ascertain the new challenges for the next-generation parallel water resources codes.

Download Full-text

Numerical simulation of flattened heat pipe with double heat sources for CPU and GPU cooling application in laptop computers

Journal of Computational Design and Engineering ◽

10.1093/jcde/qwaa091 ◽

2020 ◽

Author(s):

Wisoot Sanhan ◽

Kambiz Vafai ◽

Niti Kammuang-Lue ◽

Pradit Terdtoon ◽

Phrut Sakulchangsatjatai

Keyword(s):

Experimental Data ◽

Heat Pipe ◽

Graphics Processing Unit ◽

Processing Unit ◽

Heat Sources ◽

Final Thickness ◽

Laptop Computers ◽

Central Processing ◽

Graphics Processing ◽

Good Agreement

Abstract An investigation of the effect of the thermal performance of the flattened heat pipe on its double heat sources acting as central processing unit and graphics processing unit in laptop computers is presented in this work. A finite element method is used for predicting the flattening effect of the heat pipe. The cylindrical heat pipe with a diameter of 6 mm and the total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed under a horizontal configuration and heated with heater 1 and heater 2, 40 W in combination. The numerical model shows good agreement compared with the experimental data with the standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7 and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness or the best design final thickness for the heat pipe is found to be 2.5 mm.

Download Full-text

PI-FLAME: A parallel immune system simulator using the FLAME graphic processing unit environment

SIMULATION ◽

10.1177/0037549716673724 ◽

2016 ◽

Vol 93 (1) ◽

pp. 69-84 ◽

Cited By ~ 6

Author(s):

Shailesh Tamrakar ◽

Paul Richmond ◽

Roshan M D’Souza

Keyword(s):

Immune System ◽

Graphics Processing Units ◽

Processing Unit ◽

Human Immune System ◽

Innate And Adaptive Immunity ◽

Agent Based ◽

Central Processing ◽

Agent Simulation ◽

Study Population ◽

Graphics Processing

Agent-based models (ABMs) are increasingly being used to study population dynamics in complex systems, such as the human immune system. Previously, Folcik et al. (The basic immune simulator: an agent-based model to study the interactions between innate and adaptive immunity. Theor Biol Med Model 2007; 4: 39) developed a Basic Immune Simulator (BIS) and implemented it using the Recursive Porous Agent Simulation Toolkit (RePast) ABM simulation framework. However, frameworks such as RePast are designed to execute serially on central processing units and therefore cannot efficiently handle large model sizes. In this paper, we report on our implementation of the BIS using FLAME GPU, a parallel computing ABM simulator designed to execute on graphics processing units. To benchmark our implementation, we simulate the response of the immune system to a viral infection of generic tissue cells. We compared our results with those obtained from the original RePast implementation for statistical accuracy. We observe that our implementation has a 13× performance advantage over the original RePast implementation.

Download Full-text

Controllers: An abstraction to ease the use of hardware accelerators

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017702962 ◽

2017 ◽

Vol 32 (6) ◽

pp. 838-853 ◽

Cited By ~ 4

Author(s):

Ana Moreton–Fernandez ◽

Hector Ortega–Arranz ◽

Arturo Gonzalez–Escribano

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Abstract Entity ◽

Hardware Accelerators ◽

Processing Unit ◽

Central Processing ◽

Computing Platforms ◽

Graphics Processing ◽

Performance Computing ◽

Selection Of

Nowadays the use of hardware accelerators, such as the graphics processing units or XeonPhi coprocessors, is key in solving computationally costly problems that require high performance computing. However, programming solutions for an efficient deployment for these kind of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that needs to be computed at each moment, across different computing platforms, also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels in multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model also allows the programmer to simplify the proper selection of values for several configuration parameters that can be selected when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in the development and porting costs, with significantly low overheads in the execution times when compared to manually programmed and optimized solutions which directly use CUDA and OpenMP.

Download Full-text

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

Gpu Implementation

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17:2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

Download Full-text

Accelerating 3D Medical Image Segmentation by Adaptive Small-Scale Target Localization

Journal of Imaging ◽

10.3390/jimaging7020035 ◽

2021 ◽

Vol 7 (2) ◽

pp. 35

Author(s):

Boris Shirokikh ◽

Alexey Shevtsov ◽

Alexandra Dalechina ◽

Egor Krivov ◽

Valery Kostjuchenko ◽

...

Keyword(s):

Image Segmentation ◽

Graphics Processing Units ◽

Medical Image ◽

State Of The Art ◽

Three Dimensional ◽

Medical Image Segmentation ◽

Small Scale ◽

Processing Unit ◽

Central Processing ◽

Lung Cancer Diagnosis

The prevailing approach for three-dimensional (3D) medical image segmentation is to use convolutional networks. Recently, deep learning methods have achieved human-level performance in several important applied problems, such as volumetry for lung-cancer diagnosis or delineation for radiation therapy planning. However, state-of-the-art architectures, such as U-Net and DeepMedic, are computationally heavy and require workstations accelerated with graphics processing units for fast inference. However, scarce research has been conducted concerning enabling fast central processing unit computations for such networks. Our paper fills this gap. We propose a new segmentation method with a human-like technique to segment a 3D study. First, we analyze the image at a small scale to identify areas of interest and then process only relevant feature-map patches. Our method not only reduces the inference time from 10 min to 15 s but also preserves state-of-the-art segmentation quality, as we illustrate in the set of experiments with two large datasets.

Download Full-text

Accelerating the RTTOV-7 IASI and AMSU-A radiative transfer models on graphics processing units: evaluating central processing unit/graphics processing unit-hybrid and pure-graphics processing unit approaches

Journal of Applied Remote Sensing ◽

10.1117/1.3658028 ◽

2011 ◽

Vol 5 (1) ◽

pp. 051503 ◽

Cited By ~ 4

Author(s):

Jarno Mielikainen

Keyword(s):

Radiative Transfer ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Radiative Transfer Models ◽

Graphics Processing ◽

Transfer Models

Download Full-text

Ray-based modeling and imaging in viscoelastic media using graphics processing units

Geophysics ◽

10.1190/geo2018-0510.1 ◽

2019 ◽

Vol 84 (5) ◽

pp. S425-S436

Author(s):

Martin Sarajaervi ◽

Henk Keers

Keyword(s):

Seismic Data ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Parallel Implementation ◽

Processing Unit ◽

Central Processing ◽

Imaging Results ◽

Viscoelastic Modeling ◽

Graphics Processing ◽

Complex Valued

In seismic data processing, the amplitude loss caused by attenuation should be taken into account. The basis for this is provided by a 3D attenuation model described by the quality factor [Formula: see text], which is used in viscoelastic modeling and imaging. We have accomplished viscoelastic modeling and imaging using ray theory and the ray-Born approximation. This makes it possible to take [Formula: see text] into account using complex-valued and frequency-dependent traveltimes. We have developed a unified parallel implementation for modeling and imaging in the frequency domain and carried out the numerical integration on a graphics processing unit. A central part of the implementation is an efficient technique for computing large integrals. We applied the integration method to the 3D SEG/EAGE overthrust model to generate synthetic seismograms and imaging results. The attenuation effects are accurately modeled in the seismograms and compensated for in the imaging algorithm. The results indicate a significant improvement in computational efficiency compared to a parallel central processing unit baseline.

Download Full-text