Adding GPU Acceleration to an Industrial CPU-Based Simulator: Development Strategy and Results

2021 ◽  
Author(s):  
Hui Cao ◽  
Rustem Zaydullin ◽  
Terrence Liao ◽  
Neil Gohaud ◽  
Eguono Obi ◽  
...  

Abstract Running multi-million-cell simulation problems in minutes has been a dream of reservoir engineers for decades. Today, with the advancement of the Graphics Processing Unit (GPU), we have a real chance to make this dream a reality. Here we present our experience in the step-by-step transformation of a fully developed industrial CPU-based simulator into a fully functional GPU-based simulator, and we demonstrate the significant accelerations achieved through the use of GPU technology. To achieve the best possible performance, we chose to use CUDA (the native language of NVIDIA GPUs) and to offload as much computation to the GPU as possible. Our CUDA implementation covers all reservoir computations, including property calculation, linearization, and the linear solver. The well and field-management modules still reside on the CPU and need only minor changes for their interaction with the GPU-based reservoir. Importantly, there is no change to the nonlinear logic. The GPU and CPU parts are overlapped, fully exploiting the asynchronous nature of GPU operations. Each reservoir computation can be run in three modes: CPU_only (the existing path), GPU_only, or CPU followed by GPU; the last mode is used only for result checking and debugging. In early 2019, we prototyped two reservoir linearization operations (mass accumulation and mass flux) in CUDA; both showed very strong runtime speed-ups of several hundred times, comparing one NVIDIA P100 GPU against one core of an IBM POWER8NVL CPU rated at 2.8 GHz. Encouraged by this success, we moved into linear-solver development and managed to move the entire linear-solver module onto the GPU; again, a strong speed-up of roughly 50 times was achieved (one GPU vs. one CPU core). The focus for 2019 was on standard black-oil cases. Our implementation was tested with multiple models in the million-cell range (SPE10 and other real field cases). In early 2020, we managed to run SPE10 fully on the GPU, finishing the entire 2,000-day time-stepping in about 35 seconds on a single P100 card.
After that, our effort switched to compositional AIM (Adaptive Implicit Method), focusing on compositional flash and the AIM implementation of reservoir linearization and the linear solver; both show promising early results. GPU-based reservoir simulation is a future trend in HPC. The development of a reservoir simulator is complex, multidisciplinary, and time-consuming work. Our paper demonstrates a clear strategy for adding tremendous GPU acceleration to an existing CPU-based simulator. Our approach fully utilizes the strengths of the existing CPU simulator and minimizes the GPU development effort. This paper is also the first publication targeting GPU acceleration for compositional AIM models.
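The three execution modes described in the abstract lend themselves to a thin dispatch layer. The following is an illustrative Python sketch (all names are ours, not from the paper) of how a per-kernel mode switch with CPU-vs-GPU result checking can be organized:

```python
from enum import Enum

class Mode(Enum):
    CPU_ONLY = 1      # existing, trusted path
    GPU_ONLY = 2      # production path after porting
    CPU_THEN_GPU = 3  # debug: run both, compare results

def run_kernel(cpu_fn, gpu_fn, data, mode, tol=1e-12):
    """Dispatch one reservoir computation (e.g. mass accumulation)."""
    if mode is Mode.CPU_ONLY:
        return cpu_fn(data)
    if mode is Mode.GPU_ONLY:
        return gpu_fn(data)
    # CPU followed by GPU: keep the CPU answer, flag any mismatch.
    ref = cpu_fn(data)
    out = gpu_fn(data)
    max_err = max(abs(a - b) for a, b in zip(ref, out))
    if max_err > tol:
        raise RuntimeError(f"GPU kernel diverged from CPU by {max_err}")
    return ref

# Stand-in "kernels": both just scale cell mass by a porosity of 0.3.
cpu_accum = lambda cells: [0.3 * c for c in cells]
gpu_accum = lambda cells: [0.3 * c for c in cells]

print(run_kernel(cpu_accum, gpu_accum, [1.0, 2.0], Mode.CPU_THEN_GPU))
```

The comparison mode returns the trusted CPU result, so a mismatch can never silently contaminate the run during development.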

SPE Journal ◽  
2013 ◽  
Vol 18 (02) ◽  
pp. 207-213 ◽  
Author(s):  
Choongyong Han ◽  
John Wallis ◽  
Pallav Sarma ◽  
Gary Li ◽  
Mark L. Schrader ◽  
...  

Summary It is well known that the adjoint approach is the most efficient approach for gradient calculation, and it can be used with gradient-based optimization techniques to solve various optimization problems, such as the production-optimization problem and the history-matching problem. The adjoint equation to be solved in this approach is a linear equation formed with the “transpose” of the Jacobian matrix from a fully implicit reservoir simulator. For a large and/or complex reservoir model, generalized preconditioners often prove impractical for solving the adjoint equation. Preconditioners specialized for reservoir simulation, such as the constrained pressure residual (CPR) preconditioner, exploit specific properties of the Jacobian matrix to accelerate convergence and therefore cannot be applied directly to its transpose in the adjoint equation. To overcome this challenge, we have developed a new two-stage preconditioner for efficient solution of the adjoint equation by adapting the CPR preconditioner (named CPRA: CPR preconditioner for the adjoint equation). The CPRA preconditioner has been coupled with an algebraic multigrid (AMG) linear solver and implemented in Chevron's extended applications reservoir simulator (CHEARS®). The AMG solver is well known for its outstanding capability to solve the pressure equation of complex reservoir models; solving the linear system with the “transpose” of the pressure matrix is one of the two stages of the CPRA preconditioner. Through test cases, we have confirmed that the CPRA/AMG solver with generalized minimal residual (GMRES) acceleration solves the adjoint equation very efficiently with a reasonable number of linear-solver iterations. Adjoint simulations to calculate the gradients with the CPRA/AMG solver take at most approximately the same amount of time as the corresponding CPR/AMG forward simulations. Accuracy of the solutions has also been confirmed by verifying the gradients against solutions with a direct solver.
A production-optimization case study for a real field using the CPRA/AMG solver has further validated its accuracy, efficiency, and the capability to perform long-term optimization for large, complex reservoir models at low computational cost.
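The core idea of the adjoint approach, one linear solve with the transposed Jacobian yielding the whole gradient, can be illustrated on a toy problem. This is our own pure-Python sketch with made-up matrices, not the CHEARS implementation:

```python
def solve2(M, r):
    """Solve a 2x2 linear system M x = r by Cramer's rule."""
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(r[0]*M[1][1] - M[0][1]*r[1]) / det,
            (M[0][0]*r[1] - r[0]*M[1][0]) / det]

A  = [[4.0, 1.0], [2.0, 3.0]]   # stand-in "Jacobian" (nonsymmetric)
At = [[4.0, 2.0], [1.0, 3.0]]   # its transpose, used by the adjoint solve
c  = [1.0, 2.0]                 # objective J = c . u

def J(p):
    """Forward simulation: A u = b(p) with b = [p, 1], then J = c . u."""
    u = solve2(A, [p, 1.0])
    return c[0]*u[0] + c[1]*u[1]

lam  = solve2(At, c)            # one solve with the transposed Jacobian
grad = lam[0]                   # dJ/dp = lam . db/dp, and db/dp = [1, 0]

# Finite-difference check of the adjoint gradient
h  = 1e-6
fd = (J(2.0 + h) - J(2.0 - h)) / (2*h)
assert abs(grad - fd) < 1e-8
print(grad)
```

The same single adjoint solve would serve any number of parameters p, which is exactly why the approach is attractive for history matching and production optimization.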


Liquidity ◽  
2018 ◽  
Vol 1 (2) ◽  
pp. 125-134
Author(s):  
Asriyal Asriyal ◽  
Sutia Budi

The purpose of this study is to: (1) review and analyze the strategies implemented over the years by ten young entrepreneurs in the IbK program of STIEAD Jakarta to develop their businesses; (2) identify and analyze the strategies they plan to pursue next; and (3) formulate proposals for business-development strategies relevant to these young entrepreneurs. The results show that the strategies they employ are still largely conventional, with only limited application of modern business practices. However, they do have development plans that are beginning to take shape. The targets that have been set should be reassessed and rationalized so that the strategy can actually be implemented. The recommendations are that all tenants should have self-determination for entrepreneurship, be able to instill confidence, and always look for a way out when facing a deadlock.


Author(s):  
Franz Pichler ◽  
Gundolf Haase

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the Thrust and PARALUTION libraries. The code focuses on the simulation of transient problems, where repeated computations at every time step create the computational cost. It is used to solve the partial and ordinary differential equations that arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single-core and multi-threaded solutions that are also supported by the chosen libraries. In this way, a high total speed-up on the graphics processing unit is achieved without the need to program a single classical Compute Unified Device Architecture (CUDA) kernel.
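The cost structure described, i.e. the same expensive kernels recurring at every time step, can be sketched as follows. This is an illustrative Python toy with assumed sizes; the paper itself runs this kind of inner loop on the GPU via Thrust/PARALUTION:

```python
def jacobi_solve(diag, off, rhs, iters=200):
    """Approximately solve (I - dt*A) u = rhs on a 1D chain by Jacobi."""
    u = rhs[:]
    n = len(rhs)
    for _ in range(iters):  # the hot loop a GPU would take over
        u = [(rhs[i] + off * ((u[i-1] if i > 0 else 0.0) +
                              (u[i+1] if i < n-1 else 0.0))) / diag
             for i in range(n)]
    return u

dt, k, n = 0.01, 1.0, 8        # time step, conductivity, grid nodes
diag = 1.0 + 2.0 * dt * k      # diagonal of (I - dt*A), A = k * Laplacian
off  = dt * k                  # off-diagonal coupling strength
u = [1.0] * n                  # initial temperature field

for step in range(100):        # time loop stays on the host (CPU)
    u = jacobi_solve(diag, off, u)   # the per-step solve dominates cost

print(round(sum(u), 4))
```

Because the inner solve repeats identically at every step, porting only that kernel (and the assembly feeding it) captures nearly all of the runtime, which is the strategy the abstract describes.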


Author(s):  
Aaron F. Shinn ◽  
S. P. Vanka

A semi-implicit pressure based multigrid algorithm for solving the incompressible Navier-Stokes equations was implemented on a Graphics Processing Unit (GPU) using CUDA (Compute Unified Device Architecture). The multigrid method employed was the Full Approximation Scheme (FAS), which is used for solving nonlinear equations. This algorithm is applied to the 2D driven cavity problem and compared to the CPU version of the code (written in Fortran) to assess computational speed-up.
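For readers unfamiliar with FAS: unlike linear multigrid, FAS restricts both the residual and the current solution, and solves a modified nonlinear problem on the coarse grid. The following is our own minimal two-grid FAS cycle for a 1D model problem (-u'' + u³ = f), purely illustrative rather than the authors' Navier-Stokes code:

```python
def nonlinear_op(u, h):
    """N(u) = -u'' + u^3 on a uniform 1D grid, zero Dirichlet boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        ul = u[i-1] if i > 0 else 0.0
        ur = u[i+1] if i < n - 1 else 0.0
        out.append((2.0*u[i] - ul - ur) / h**2 + u[i]**3)
    return out

def smooth(u, f, h, sweeps):
    """Nonlinear Gauss-Seidel: one scalar Newton step per grid point."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            ul = u[i-1] if i > 0 else 0.0
            ur = u[i+1] if i < n - 1 else 0.0
            r = f[i] - ((2.0*u[i] - ul - ur) / h**2 + u[i]**3)
            u[i] += r / (2.0/h**2 + 3.0*u[i]**2)
    return u

def fas_cycle(u, f, h):
    """One two-grid FAS cycle for N(u) = f."""
    u = smooth(u, f, h, 3)                         # pre-smooth
    res = [fi - ni for fi, ni in zip(f, nonlinear_op(u, h))]
    m = len(u) // 2
    uc = [u[2*j + 1] for j in range(m)]            # restrict solution (injection)
    nc = nonlinear_op(uc, 2.0*h)
    fc = [res[2*j + 1] + nc[j] for j in range(m)]  # FAS coarse right-hand side
    vc = smooth(uc[:], fc, 2.0*h, 50)              # near-exact coarse solve
    ec = [a - b for a, b in zip(vc, uc)]           # coarse-grid correction
    for i in range(len(u)):                        # prolongate (linear interp.)
        if i % 2 == 1:
            u[i] += ec[i//2]
        else:
            left = ec[i//2 - 1] if i >= 2 else 0.0
            right = ec[i//2] if i//2 < m else 0.0
            u[i] += 0.5 * (left + right)
    return smooth(u, f, h, 3)                      # post-smooth

n, h = 15, 1.0/16.0
f = [1.0] * n                                      # forcing term
u = [0.0] * n                                      # initial guess

def resnorm(v):
    return max(abs(fi - ni) for fi, ni in zip(f, nonlinear_op(v, h)))

r0 = resnorm(u)
for _ in range(5):
    u = fas_cycle(u, f, h)
print(resnorm(u) / r0)
```

The structure (smooth, restrict solution and residual, solve the coarse nonlinear problem, correct, smooth) is the same whether the operator is this toy or the discretized Navier-Stokes system of the paper.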


2002 ◽  
Vol 5 (01) ◽  
pp. 11-23 ◽  
Author(s):  
A.H. Dogru ◽  
H.A. Sunaidi ◽  
L.S. Fung ◽  
W.A. Habiballah ◽  
N. Al-Zamel ◽  
...  

Summary A new parallel black-oil production reservoir simulator (Powers) has been developed and fully integrated into the pre- and post-processing graphical environment. Its primary use is to simulate the giant oil and gas reservoirs of the Middle East using millions of cells. The new simulator has been designed for parallelism and scalability, with the aim of making megacell simulation a day-to-day reservoir-management tool. Upon its completion, the parallel simulator was validated against published benchmark problems and other industrial simulators. Several giant oil-reservoir studies have been conducted with million-cell descriptions. This paper presents the model formulation, parallel linear solver, parallel locally refined grids, and parallel well management. The benefits of using megacell simulation models are illustrated by a real field example used to confirm bypassed oil zones and obtain a history match in a short time period. With the new technology, preprocessing, construction, running, and post-processing of megacell models is finally practical. A typical history-match run for a field with 30 to 50 years of production takes only a few hours. Introduction With the development of early parallel computers, the attractive speed of these machines caught the attention of oil-industry researchers. The initial questions were concentrated along these lines: Can one develop a truly parallel reservoir-simulator code? What type of hardware and programming languages should be chosen? Contrary to seismic processing, reservoir-simulator algorithms are well known not to be naturally parallel; they are more recursive, and variables display a strong dependency on each other (strong coupling and nonlinearity). This poses a big challenge for parallelization. On the other hand, if one could develop a parallel code, the speed of computations would increase by at least an order of magnitude; as a result, many large problems could be handled.
This capability would also aid our understanding of fluid flow in a complex reservoir. Additionally, the proper handling of reservoir heterogeneities should result in more realistic predictions. Another benefit of megacell description is the minimization of upscaling effects and numerical dispersion. Megacell simulation has a natural application in simulating the world's giant oil and gas reservoirs. For example, a grid size of 50 m or less is widely used for small and medium-size reservoirs around the world. In contrast, many giant reservoirs in the Middle East use a gridblock size of 250 m or larger; even so, this easily yields a model with more than 1 million cells. It is therefore of specific interest to have a megacell description and still be able to run fast; such capability is important for the day-to-day reservoir management of these fields. This paper is organized as follows: first, the relevant work in the petroleum-reservoir-simulation literature is reviewed. This is followed by a description of the new parallel simulator and a presentation of the numerical solution and parallelism strategies. (The details of the data structures, well handling, and parallel input/output operations are placed in the appendices.) The main text also contains brief descriptions of the parallel linear solver, locally refined grids, and well management, as well as of megacell pre- and post-processing. Next, we address performance and parallel scalability; this is a key section that demonstrates the degree of parallelization of the simulator. The last section presents four real field simulation examples. These cases exercise all stages of the simulator and provide actual central processing unit (CPU) execution time for each case. As a byproduct, the benefits of megacell simulation are demonstrated by two examples: locating bypassed oil zones and obtaining a quicker history match. Details of each section can be found in the appendices.
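To make the cell-count claim concrete, here is a back-of-the-envelope calculation; the field dimensions below are assumed purely for illustration, not taken from the paper:

```python
# All numbers below are assumed, purely to illustrate the scale argument.
block = 250           # areal gridblock size in metres
nx = 60_000 // block  # ~60 km field length -> 240 blocks
ny = 40_000 // block  # ~40 km field width  -> 160 blocks
nz = 30               # vertical layers over the pay zone
cells = nx * ny * nz  # 240 * 160 * 30 = 1,152,000 cells
print(cells)
```

Even at a coarse 250 m areal resolution, a field of this size is already past the million-cell mark, which is the point the abstract makes.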
Previous Work In the 1980s, research on parallel reservoir simulation was intensified by the further development of shared-memory and distributed-memory machines. In 1987, Scott et al. [1] presented a Multiple Instruction Multiple Data (MIMD) approach to reservoir simulation. Chien [2] investigated parallel processing on shared-memory computers. In early 1990, Li [3] presented a parallelized version of a commercial simulator on a shared-memory Cray computer. For distributed-memory machines, Wheeler [4] developed a black-oil simulator on a hypercube in 1989. In the early 1990s, Killough and Bhogeswara [5] presented a compositional simulator on an Intel iPSC/860, and Rutledge et al. [6] developed an Implicit Pressure Explicit Saturation (IMPES) black-oil reservoir simulator for the CM-2 machine. They showed that reservoir models of over 2 million cells could be run on this type of machine with 65,536 processors, and stated that computational speeds on the order of 1 gigaflop in matrix construction and solution were achievable. In the mid-1990s, more investigators published reservoir-simulation papers focused on distributed-memory machines. Kaarstad [7] presented a 2D oil/water research simulator running on a 16,384-processor MasPar MP-2 machine and showed that a model problem with 1 million gridpoints could be solved in a few minutes of computer time. Rame and Delshad [8] parallelized a chemical-flooding code (UTCHEM) and tested it for scalability on a variety of systems, including the Intel iPSC/860, CM-5, Kendall Square, and Cray T3D.


Author(s):  
Anita Theresa Panjaitan ◽  
Rachmat Sudibjo ◽  
Sri Fenny

<p>Y Field, located about 28 km southeast of Jakarta, was discovered in 1989. Three wells have been drilled and suspended. The initial gas in place (IGIP) of the field is 40.53 BSCF, and the field will be developed in 2011. In this study, a reservoir simulation model was built to determine the optimum development strategy for the field. The model consists of 1,575,064 grid cells built in a black-oil simulator. Two field-development scenarios were defined, each evaluated with and without a compressor. Simulation results show that the recovery factor at the end of the contract is 61.40% and 62.14% for Scenarios I and II without a compressor, respectively. When a compressor is applied, the recovery factor of Scenarios I and II rises to 68.78% and 74.58%, respectively. Based on the economic parameters, Scenario II with a compressor is the most <br />attractive case, with an IRR, POT, and NPV of 41%, 2.9 years, and 14,808 MUS$, respectively.</p>


Author(s):  
Yohsuke Tanaka ◽  
Hiroki Matsushi ◽  
Shigeru Murata

Abstract We introduce graphics processing unit (GPU) acceleration of hologram reconstruction in phase-retrieval holography to drastically reduce execution time. GPU acceleration was implemented with the CUFFT FFT library on an NVIDIA GeForce GTX 1050 (2 GB GDDR5). For comparison, the CPU platform was an Intel Xeon E5-2690 (2.90 GHz) with 24 GB of memory, running Ubuntu 16.04. Reconstructed volumes ranged from 256² × 128 voxels to 2048² × 1024 voxels for the execution-time comparison. The speed-up of the GPU over the CPU is consistently greater than 100 times, except for the smallest volumes. We also demonstrated the reduction on an observation of particles falling from a particle feeder, recorded in 40 frames; for this case, the execution time was reduced from 13 hours to 30 minutes.


2013 ◽  
Vol 61 (4) ◽  
pp. 949-954 ◽  
Author(s):  
J. Gołębiowski ◽  
J. Forenc

Abstract Using the models and algorithms presented in the first part of the article, the spatio-temporal distribution of the step response of a floor heater was determined. The results are presented in the form of heating curves and temperature profiles of the heater at selected time moments. The computation results were verified by comparing them with the solution obtained with a commercial program, NISA. Additionally, the distribution of the average time constant of thermal processes occurring in the heater was determined. We also analyzed the use of a graphics processing unit in numerical computations based on the conjugate gradient method. It was shown that using a graphics processing unit is profitable when solving linear systems of equations with dense coefficient matrices; in the case of a sparse matrix, the speed-up depends on the number of its non-zero elements.
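As background, the conjugate-gradient method referenced above iterates matrix-vector products; that product dominates the cost and is the natural candidate for GPU offloading, which is why dense matrices (regular memory access, high arithmetic intensity) profit most. A minimal pure-Python sketch of the algorithm, not the authors' code:

```python
def matvec(A, x):
    """Dense matrix-vector product: the step worth offloading to a GPU."""
    return [sum(a*b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a*b for a, b in zip(x, y))

def conjugate_gradient(A, b, iters=50, tol=1e-24):
    """Plain CG for a symmetric positive-definite system A x = b."""
    x = [0.0] * len(b)
    r = b[:]                   # residual for the zero initial guess
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        Ap = matvec(A, p)      # dominant cost per iteration
        alpha = rs / dot(p, Ap)
        x = [xi + alpha*pi for xi, pi in zip(x, p)]
        r = [ri - alpha*api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:       # squared residual norm small enough
            break
        p = [ri + (rs_new/rs)*pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # small SPD example system
x = conjugate_gradient(A, [1.0, 2.0])
print([round(v, 6) for v in x])
```

For a sparse matrix the same iteration applies, but the irregular memory access of the sparse mat-vec explains the article's observation that the speed-up then depends on the number of non-zero elements.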


Author(s):  
Mohammad Y Al-Shorman ◽  
Majd M Al-Kofahi

A fast, highly parallelized simulation of a unidirectional ultrasonic pulse propagating in a two-dimensional environment is presented. The pulse intensity versus time is recorded using an array of unidirectional ultrasonic receivers at known locations, arranged in a small circle around the transmitter. To speed up the simulation, the OpenCL 2.0 heterogeneous compute language is used on a graphics processing unit. The simulation result is then compared with experimental data to validate its accuracy. By comparing simulated and experimental data, the collected intensity-time profiles can be used to map an environment. Environments can be mapped using not only direct reflections but also higher-order reflections from objects that are not directly seen by the transmitter. With the help of this simulation, subtle characteristics of an environment, such as a slight tilt or curvature, can be measured. The front end of the simulation is written in C#, while the back end is written in C/C++ and OpenCL.
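The per-point update of such a 2D pulse propagation is embarrassingly parallel, which is what makes it a good fit for one OpenCL work-item per grid point. An illustrative finite-difference sketch (sizes and constants are ours, not from the article):

```python
def step(p, p_prev, c2):
    """One leapfrog update of the 2D wave equation (rigid boundaries)."""
    n = len(p)
    nxt = [row[:] for row in p]
    for i in range(1, n-1):           # every (i, j) is independent:
        for j in range(1, n-1):       # one OpenCL work-item per point
            lap = (p[i-1][j] + p[i+1][j] + p[i][j-1] + p[i][j+1]
                   - 4.0*p[i][j])
            nxt[i][j] = 2.0*p[i][j] - p_prev[i][j] + c2 * lap
    return nxt

n = 21
p_prev = [[0.0]*n for _ in range(n)]
p      = [[0.0]*n for _ in range(n)]
p[n//2][n//2] = 1.0                   # point "transmitter" fires a pulse

for _ in range(10):                   # march the pulse outward in time
    p, p_prev = step(p, p_prev, c2=0.25), p  # c2 = (c*dt/dx)^2, CFL-stable

peak = max(abs(v) for row in p for v in row)
print(round(peak, 6))
```

Recording the field values at a ring of receiver points over these time steps would produce exactly the intensity-time profiles the article compares against experiment.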

