Using GPU-Based Computing to Solve Large Sparse Systems of Linear Equations

Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B

10.1115/detc2011-48452

2011

Author(s):

Travis J. Carrigan

Jacob Watt

Brian H. Dennis

Keyword(s):

Finite Element

Domain Decomposition

Graphics Processing Units

Sparse Matrix

Low Cost

Linear Equations

Parallel Architecture

General Purpose

Matrix Vector Multiplication

Matrix Vector

Often thought of as tools for image rendering or data visualization, graphics processing units (GPUs) are becoming increasingly popular in scientific computing because of their low-cost, massively parallel architecture. With NVIDIA's introduction of CUDA C and CUDA-enabled GPUs, general-purpose computation is now possible without resorting to shading languages. One application that benefits from the capabilities of NVIDIA hardware is computational continuum mechanics (CCM). Solving sparse linear systems of equations is a common need in CCM when partial differential equations are discretized, and these systems are often solved iteratively using domain decomposition among distributed processors working in parallel. In this paper we explore the benefits of using GPUs to improve the performance of sparse matrix operations, specifically sparse matrix-vector multiplication. Our approach does not require domain decomposition, so it is simpler than the corresponding implementations for distributed-memory parallel computers. We demonstrate that for matrices produced by finite element discretizations on unstructured meshes, the matrix-vector multiplication is just under 13 times faster than a serial run on an Intel i5 system. Furthermore, we show that when used in conjunction with the biconjugate gradient stabilized method (BiCGSTAB), a gradient-based iterative linear solver, the GPU implementation is over 13 times faster than the serially executed C equivalent. Lastly, we demonstrate the application of the method to solving Poisson’s equation with the Galerkin finite element method, achieving over 10.5 times higher performance on the GPU than on the Intel i5 system.
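The abstract's serial baseline is a sparse matrix-vector product that is then used inside BiCGSTAB. The following is a minimal pure-Python sketch of both. It assumes compressed sparse row (CSR) storage, which the abstract does not name. The names `spmv`, `dot`, and `bicgstab` and the CSR tuple layout are hypothetical illustrations, not the authors' code. On a GPU, the outer row loop of `spmv` is the part that would be spread across threads.

```python
from typing import List, Tuple

# Hypothetical CSR container (row_ptr, col_idx, vals): row i's nonzeros are
# vals[row_ptr[i]:row_ptr[i + 1]] located at columns col_idx[row_ptr[i]:row_ptr[i + 1]].
CSR = Tuple[List[int], List[int], List[float]]


def spmv(a: CSR, x: List[float]) -> List[float]:
    """Serial y = A x. Rows are independent, so a GPU kernel can give each
    row (or each small group of rows) to its own thread or warp."""
    row_ptr, col_idx, vals = a
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(u: List[float], v: List[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def bicgstab(a: CSR, b: List[float], tol: float = 1e-10,
             max_iter: int = 500) -> List[float]:
    """Unpreconditioned BiCGSTAB from x0 = 0. Each iteration costs two SpMVs
    plus dot products and vector updates."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # r0 = b - A x0 with x0 = 0
    r_hat = list(r)        # shadow residual, fixed for the whole solve
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        if rho == 0.0:
            break          # breakdown (or b = 0); a production solver would restart
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pk - omega * vk) for ri, pk, vk in zip(r, p, v)]
        v = spmv(a, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vk for ri, vk in zip(r, v)]
        if dot(s, s) ** 0.5 / b_norm < tol:
            x = [xk + alpha * pk for xk, pk in zip(x, p)]
            break
        t = spmv(a, s)
        omega = dot(t, s) / dot(t, t)
        x = [xk + alpha * pk + omega * sk for xk, pk, sk in zip(x, p, s)]
        r = [sk - omega * tk for sk, tk in zip(s, t)]
        if dot(r, r) ** 0.5 / b_norm < tol:
            break
        rho_old = rho
    return x
```

Consider, for instance, the tridiagonal matrix tridiag(-1, 2, -1), which is the 1D linear-element stiffness matrix for Poisson's equation up to a 1/h scale. With n = 5 and a right-hand side of ones, `bicgstab` converges to approximately [2.5, 4.0, 4.5, 4.0, 2.5].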

GPU-Friendly Preconditioners for Efficient 3-D Finite Element Analysis of Thin Structures

Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B

10.1115/detc2011-47330

2011

Cited By ~ 1

Author(s):

Vikalp Mishra

Krishnan Suresh

Keyword(s):

Finite Element Analysis

Finite Element

Sparse Matrix

Grid Method

Double Precision

Thin Structures

Element Analysis

Dual Representation

Matrix Vector Multiplication

Matrix Vector

A serious computational bottleneck in finite element analysis today is the solution of the underlying system of equations. To alleviate this problem, researchers have proposed the use of graphics processing units (GPUs) for fast iterative solution of such equations. Indeed, researchers have shown that a GPU implementation of double-precision sparse matrix-vector multiplication (which underlies all iterative methods) is approximately an order of magnitude faster than an optimized CPU implementation. Unfortunately, fast matrix-vector multiplication alone is insufficient; a good preconditioner is necessary for rapid convergence. Furthermore, most modern preconditioners, such as incomplete Cholesky, are expensive to compute and cannot be easily ported to the GPU. In this paper, we propose a special class of preconditioners for the analysis of thin structures, such as beams and plates. The proposed preconditioners are developed by combining the multigrid method with the recently developed dual-representation method for thin structures. It is shown that these preconditioners are computationally inexpensive, perform better than standard preconditioners, and can be easily ported to the GPU.
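The abstract argues that fast SpMV is not enough on its own: the solver also needs a preconditioner that is cheap to apply on the GPU. Preconditioned conjugate gradient (PCG) illustrates the point. The sketch below swaps in a Jacobi (diagonal) preconditioner purely as an example of an elementwise, embarrassingly parallel preconditioner. It is not the multigrid/dual-representation preconditioner proposed in the paper. Matrices are stored densely only to keep the example short, and the names `pcg`, `jacobi`, and `identity` are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # dense rows, only to keep the sketch short


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(p * q for p, q in zip(u, v))


def pcg(a: Matrix, b: Vector, apply_m_inv: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vector, int]:
    """Preconditioned conjugate gradient for SPD A, from x0 = 0.
    Returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_m_inv(r)          # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 / b_norm < tol:
            return x, k
        z = apply_m_inv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi(a: Matrix) -> Callable[[Vector], Vector]:
    """Jacobi preconditioner M = diag(A): applying M^{-1} is one independent
    multiplication per entry, which maps directly onto GPU threads."""
    inv_diag = [1.0 / a[i][i] for i in range(len(a))]
    return lambda r: [di * ri for di, ri in zip(inv_diag, r)]


def identity(r: Vector) -> Vector:
    """No preconditioning (M = I), for comparison."""
    return list(r)
```

For a badly scaled diagonal matrix such as diag(1, 10, 100, 1000), Jacobi-preconditioned CG converges in a single iteration, whereas plain CG needs several. Real thin-structure stiffness matrices are far harder. This is one reason the paper proposes a more capable preconditioner that still ports easily to the GPU.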

Comparison of GPU-Based Parallel Assembly and Assembly-Free Sparse Matrix Vector Multiplication for Finite Element Analysis of Three-Dimensional Structures

Proceedings of the Fifteenth International Conference on Civil, Structural and Environmental Engineering Computing

10.4203/ccp.108.222

2015

Cited By ~ 1

Author(s):

A. Akbariyeh

B.H. Dennis

B.P. Wang

K.L. Lawrence

Keyword(s):

Finite Element Analysis

Finite Element

Sparse Matrix

Three Dimensional

Element Analysis

Matrix Vector Multiplication

Parallel Assembly

Matrix Vector

An efficient sparse matrix-vector multiplication on CUDA-enabled graphic processing units for finite element method simulations

International Journal for Numerical Methods in Engineering

10.1002/nme.5346

2016

Vol 110 (1)

pp. 57-78

Cited By ~ 2

Author(s):

Atakan Altinkaynak

Keyword(s):

Finite Element Method

Finite Element

Sparse Matrix

Graphic Processing Units

Matrix Vector Multiplication

Matrix Vector

Element Method

Graphic Processing

A Novel Multi-GPU Parallel Optimization Model for the Sparse Matrix-Vector Multiplication

Parallel Processing Letters

10.1142/s0129626416400016

2016

Vol 26 (04)

pp. 1640001

Author(s):

Jiaquan Gao

Yuanshen Zhou

Kesong Wu

Keyword(s):

Optimization Model

Graphics Processing Units

High Efficiency

Sparse Matrix

Performance Model

Parallel Optimization

Multiple GPUs

Matrix Vector Multiplication

Storage Format

Matrix Vector

Accelerating sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs) has recently attracted considerable attention. We observe that on a specific multi-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and a performance model, which is independent of the problem and dependent on the device resources, is proposed to accurately predict the execution time of SpMV kernels. Using these models, in the second stage we construct an optimal multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Because our SpMV model is general, independent of the problem, and dependent on the device resources, it needs to be constructed only once for each type of GPU. The experiments validate the high efficiency of the proposed model.
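The abstract does not state the "predetermined rule" used to divide the matrix among GPUs. As a hypothetical stand-in, the sketch below splits the rows of a CSR matrix into contiguous blocks holding roughly equal numbers of nonzeros, a common load-balancing heuristic. Each block's slice of y = Ax is then computed independently, as one GPU would compute it. The per-block storage-format choice and the performance model are omitted. All names here are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical CSR container: (row_ptr, col_idx, vals)
CSR = Tuple[List[int], List[int], List[float]]


def partition_rows_by_nnz(row_ptr: List[int], num_parts: int) -> List[int]:
    """Assumed splitting rule: return row boundaries [0, r1, ..., n] so that
    part j owns rows bounds[j]..bounds[j + 1] - 1 and roughly nnz / num_parts
    nonzeros."""
    n, nnz = len(row_ptr) - 1, row_ptr[-1]
    bounds = [0]
    for j in range(1, num_parts):
        target = j * nnz / num_parts
        # first row whose starting nonzero offset reaches the target
        bounds.append(bisect_left(row_ptr, target, bounds[-1], n))
    bounds.append(n)
    return bounds


def spmv_block(a: CSR, x: List[float], row_start: int, row_end: int) -> List[float]:
    """Rows row_start..row_end - 1 of A x: the work one device would do.
    Each device reads the entries of x that its block's columns touch."""
    row_ptr, col_idx, vals = a
    return [sum((vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1])), 0.0)
            for i in range(row_start, row_end)]


def multi_device_spmv(a: CSR, x: List[float], num_parts: int) -> List[float]:
    bounds = partition_rows_by_nnz(a[0], num_parts)
    y: List[float] = []
    for j in range(num_parts):  # on real hardware these blocks run concurrently
        y.extend(spmv_block(a, x, bounds[j], bounds[j + 1]))
    return y
```

Balancing by nonzeros rather than by row count matters when row lengths vary. Take a matrix whose first row holds half of all nonzeros: splitting it two ways gives that row a device of its own.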

A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems

International Journal for Numerical Methods in Engineering

10.1002/nme.4865

2015

Vol 102 (12)

pp. 1784-1814

Cited By ~ 15

Author(s):

J. Wong

E. Kuhl

E. Darve

Keyword(s):

Finite Element

Graphics Processing Unit

Sparse Matrix

Processing Unit

Matrix Vector Multiplication

Graphics Processing

Matrix Vector

Performance Evaluation of Domain Decomposition Method with Sparse Matrix Storage Schemes in Modern Supercomputer

International Journal of Computational Methods

10.1142/s0219876213440076

2014

Vol 11 (supp01)

pp. 1344007

Cited By ~ 1

Author(s):

Abul Mukid Mohammad Mukaddes

Masao Ogino

Ryuji Shioya

Keyword(s):

Performance Evaluation

Domain Decomposition

Data Structures

Decomposition Method

Sparse Matrix

Domain Decomposition Method

Storage Efficiency

Matrix Vector Multiplication

Performance Results

Matrix Vector

The use of proper data structures with corresponding algorithms is critical to achieving good performance in scientific computing. The need for sparse matrix-vector multiplication in each iteration of the iterative domain decomposition method has led to the implementation of a variety of sparse matrix storage formats, many of which have been proposed and integrated into the method. In this paper, the storage efficiency of these sparse matrix storage formats is evaluated and compared, and the performance of sparse matrix-vector multiplication within the domain decomposition method is considered. Based on our experiments on the FX10 supercomputer system, we draw some useful conclusions that can serve as guidelines for optimizing the domain decomposition method.
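The abstract does not name the storage formats it compares. Purely as an illustration, the sketch below assumes three widely used layouts: coordinate (COO), compressed sparse row (CSR), and ELLPACK (ELL). It counts the words each format stores for the same matrix and performs SpMV directly from the ELL arrays. The word counts treat every index and every value as one word, a simplification that ignores differing integer and floating-point widths. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Dense = List[List[float]]


def storage_words(a: Dense) -> Dict[str, int]:
    """Words stored by each format for an n-row matrix with nnz nonzeros
    (one word per index or value: a simplifying assumption)."""
    n = len(a)
    row_nnz = [sum(1 for v in row if v != 0.0) for row in a]
    nnz, width = sum(row_nnz), max(row_nnz, default=0)
    return {
        "COO": 3 * nnz,          # row index, column index, value per nonzero
        "CSR": 2 * nnz + n + 1,  # column index and value per nonzero, plus row pointers
        "ELL": 2 * n * width,    # every row padded to the longest row's length
    }


def to_ell(a: Dense) -> Tuple[List[List[int]], List[List[float]]]:
    """ELLPACK: n x width arrays of column indices and values, zero-padded."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in a]
    width = max((len(r) for r in rows), default=0)
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def ell_spmv(cols: List[List[int]], vals: List[List[float]], x: List[float]) -> List[float]:
    # Padding entries contribute 0.0 * x[0] (zero for finite x), so no branch is
    # needed; the fixed row width gives ELL its regular, GPU-friendly access pattern.
    return [sum((v * x[j] for j, v in zip(cr, vr)), 0.0) for cr, vr in zip(cols, vals)]
```

Take a 4 x 4 matrix whose first row is dense and whose remaining rows hold only a diagonal entry. It has 7 nonzeros, so COO stores 21 words and CSR 19, while ELL stores 32 because every row is padded to width 4. This is the kind of storage-versus-regularity trade-off that such comparisons measure.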

Fast sparse matrix-vector multiplication on graphics processing unit for finite element analysis

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems

10.1109/hpcc.2012.193

2012

Cited By ~ 11

Author(s):

Abal-Kassim Cheik Ahamed

Frederic Magoules

Keyword(s):

Finite Element Analysis

Finite Element

Graphics Processing Unit

Sparse Matrix

Processing Unit

Element Analysis

Matrix Vector Multiplication

Graphics Processing

Matrix Vector

CUDA GPU libraries and novel sparse matrix-vector multiplication - implementation and performance enhancement in unstructured finite element computations

International Journal of Computational Science and Engineering

10.1504/ijcse.2019.104436

2019

Vol 20 (4)

pp. 501

Author(s):

Richard Haney

Ram Mohan

Keyword(s):

Finite Element

Performance Enhancement

Sparse Matrix

Matrix Vector Multiplication

And Performance

Matrix Vector

Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

10.1109/fccm.2006.65

2006

Cited By ~ 13

Author(s):

Yousef El-Kurdi

Warren Gross

Dennis Giannacopoulos

Keyword(s):

Finite Element Method

Finite Element

Sparse Matrix

Matrix Vector Multiplication

Matrix Vector

Element Method

FPGA architecture and implementation of sparse matrix–vector multiplication for the finite element method

Computer Physics Communications

10.1016/j.cpc.2007.11.014

2008

Vol 178 (8)

pp. 558-570

Cited By ~ 16

Author(s):

Yousef Elkurdi

David Fernández

Evgueni Souleimanov

Dennis Giannacopoulos

Warren J. Gross

Keyword(s):

Finite Element Method

Finite Element

Sparse Matrix

The Finite Element Method

Matrix Vector Multiplication

FPGA Architecture

Matrix Vector

Element Method