Using GPU-Based Computing to Solve Large Sparse Systems of Linear Equations

Author(s):  
Travis J. Carrigan ◽  
Jacob Watt ◽  
Brian H. Dennis

Often thought of as tools for image rendering or data visualization, graphics processing units (GPU) are becoming increasingly popular in the areas of scientific computing due to their low cost massively parallel architecture. With the introduction of CUDA C by NVIDIA and CUDA enabled GPUs, the ability to perform general purpose computations without the need to utilize shading languages is now possible. One such application that benefits from the capabilities provided by NVIDIA hardware is computational continuum mechanics (CCM). The need to solve sparse linear systems of equations is common in CCM when partial differential equations are discretized. Often these systems are solved iteratively using domain decomposition among distributed processors working in parallel. In this paper we explore the benefits of using GPUs to improve the performance of sparse matrix operations, more specifically, sparse matrix-vector multiplication. Our approach does not require domain decomposition, so it is simpler than corresponding implementation for distributed memory parallel computers. We demonstrate that for matrices produced from finite element discretizations on unstructured meshes, the performance of the matrix-vector multiplication operation is just under 13 times faster than when run serially on an Intel i5 system. Furthermore, we show that when used in conjunction with the biconjugate gradient stabilized method (BiCGSTAB), a gradient based iterative linear solver, the method is over 13 times faster than the serially executed C equivalent. And lastly, we emphasize the application of such method for solving Poisson’s equation using the Galerkin finite element method, and demonstrate over 10.5 times higher performance on the GPU when compared with the Intel i5 system.

Author(s):  
Vikalp Mishra ◽  
Krishnan Suresh

A serious computational bottle-neck in finite element analysis today is the solution of the underlying system of equations. To alleviate this problem, researchers have proposed the use of graphics programmable units (GPU) for fast iterative solution of such equations. Indeed, researchers have shown that a GPU-implementation of a double-precision sparse-matrix-vector multiplication (that underlies all iterative methods) is approximately an order of magnitude faster than that of an optimized CPU implementation. Unfortunately, fast matrix-vector multiplication alone is insufficient… a good preconditioner is necessary for rapid convergence. Furthermore, most modern preconditioners, such as incomplete Cholesky, are expensive to compute, and cannot be easily ported to the GPU. In this paper, we propose a special class of preconditioners for the analysis of thin structures, such as beams and plates. The proposed preconditioners are developed by combining the multi-grid method, with recently developed dual-representation method for thin structures. It is shown, that these preconditioners are computationally inexpensive, perform better than standard pre-conditioners, and can be easily ported to the GPU.


2016 ◽  
Vol 26 (04) ◽  
pp. 1640001
Author(s):  
Jiaquan Gao ◽  
Yuanshen Zhou ◽  
Kesong Wu

Accelerating the sparse matrix-vector multiplication (SpMV) on the graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a specific multiple-GPU platform, the SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and then a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, we construct in the second stage an optimally multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, independent of the problems, and dependent on the resources of devices, this model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.


2014 ◽  
Vol 11 (supp01) ◽  
pp. 1344007 ◽  
Author(s):  
ABUL MUKID MOHAMMAD MUKADDES ◽  
MASAO OGINO ◽  
RYUJI SHIOYA

The use of proper data structures with corresponding algorithms is critical to achieve good performance in scientific computing. The need of sparse matrix vector multiplication in each iteration of the iterative domain decomposition method has led to implementation of a variety of sparse matrix storage formats. Many storage formats have been presented to represent sparse matrix and integrated in the method. In this paper, the storage efficiency of those sparse matrix storage formats are evaluated and compared. The performance results of sparse matrix vector multiplication used in the domain decomposition method is considered. Based on our experiments in the FX10 supercomputer system, some useful conclusions that can serve as guidelines for the optimization of domain decomposition method are extracted.


2008 ◽  
Vol 178 (8) ◽  
pp. 558-570 ◽  
Author(s):  
Yousef Elkurdi ◽  
David Fernández ◽  
Evgueni Souleimanov ◽  
Dennis Giannacopoulos ◽  
Warren J. Gross

Sign in / Sign up

Export Citation Format

Share Document