GPU Implementation of Image Convolution Using Sparse Model with Efficient Storage Format

Saira Banu Jamal Mohammed; M. Rajasekhara Babu; Sumithra Sriram

doi:10.4018/ijghpc.2018010104

International Journal of Grid and High Performance Computing ◽

2018 ◽

Vol 10 (1) ◽

pp. 54-70

Keyword(s):

Edge Detection ◽

GPU Computing ◽

Sparse Matrix ◽

Image Smoothing ◽

Matrix Vector Multiplication ◽

Research Fields ◽

Compressed Sparse Row ◽

Storage Format ◽

CSR Format ◽

GPU Implementation

With the growth of data-parallel computing, the role of GPU computing in non-graphics applications such as image processing has become a research focus. Convolution is an integral operation in filtering, smoothing, and edge detection. In this article, the convolution process is realized as a sparse linear system and solved using Sparse Matrix-Vector Multiplication (SpMV). On the CPU, SpMV with the Compressed Sparse Row (CSR) format performs better than normal convolution. To overcome the stalling of threads on short rows in the GPU implementation of CSR SpMV, a more efficient model is proposed that uses the Adaptive-Compressed Row Storage (A-CSR) format. Using CSR in the convolution process achieves speedups of 1.45x and 1.159x over normal convolution for image smoothing and edge detection, respectively. On the GPU platform, the adaptive CSR format achieves an average speedup of 2.05x for image smoothing and 1.58x for edge detection.
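To illustrate the idea in the abstract above, here is a minimal pure-Python sketch of expressing a 2D filter as a CSR matrix and applying it with SpMV. It is not the authors' implementation. The function names (`conv_to_csr`, `csr_spmv`, `direct_filter`), the zero-padded "same" output size, and the sliding-window (unflipped kernel) form are all assumptions made for illustration; for symmetric kernels such as a box blur, this form coincides with true convolution.

```python
def conv_to_csr(height, width, kernel):
    """Build CSR arrays (values, col_idx, row_ptr) for a matrix A such that
    A times the row-major flattened image equals the zero-padded, same-size
    sliding-window filter of the image with `kernel` (kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    values, col_idx, row_ptr = [], [], [0]
    for r in range(height):
        for c in range(width):
            # One matrix row per output pixel; only in-bounds, nonzero taps are stored.
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ch, c + j - cw
                    if 0 <= rr < height and 0 <= cc < width and kernel[i][j] != 0:
                        values.append(kernel[i][j])
                        col_idx.append(rr * width + cc)
            row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Sequential CSR sparse matrix-vector product y = A x."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def direct_filter(image, kernel):
    """Reference: the same zero-padded filter computed directly on the 2D image."""
    height, width = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = []
    for r in range(height):
        for c in range(width):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ch, c + j - cw
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += kernel[i][j] * image[rr][cc]
            out.append(acc)
    return out


# Usage: a 3x3 box sum over a 3x3 image.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
values, col_idx, row_ptr = conv_to_csr(3, 3, box)
flat = [p for row in image for p in row]
print(csr_spmv(values, col_idx, row_ptr, flat) == direct_filter(image, box))
```

Because border pixels have fewer in-bounds neighbours, their matrix rows are shorter than interior rows, which is one source of the uneven row lengths that a GPU kernel has to handle.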

Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON) ◽

10.1109/intercon.2019.8853624 ◽

2019 ◽

Author(s):

Herwin Alayn Huillcen Baca ◽

Flor de Luz Palomino Valdivia

Keyword(s):

Data Transfer ◽

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector ◽

CSR Format

A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Mathematical Problems in Engineering ◽

10.1155/2016/8471283 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Guixia He ◽

Jiaquan Gao

Keyword(s):

Sparse Matrix ◽

Sparse Matrices ◽

Poor Performance ◽

Test Results ◽

Graphic Processing Units ◽

Multiple GPUs ◽

Matrix Vector Multiplication ◽

Compressed Sparse Row ◽

Access Patterns ◽

Matrix Vector

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphic processing units (GPUs), for example, CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR on a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not the communication between GPUs is considered, PCSR on multiple GPUs achieves good performance and has high parallel efficiency.
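The two baseline mappings named in this abstract can be sketched with a sequential Python emulation. This is only an illustration of the thread-to-data assignment, not GPU code and not the PCSR algorithm: `csr_scalar` assigns one (emulated) thread to each row, while `csr_vector` assigns a group of `warp_size` lanes to each row, lets each lane stride through the row, and then combines the per-lane partial sums with a tree reduction. The function names and the `warp_size` parameter are assumptions; `warp_size` must be a power of two here.

```python
def csr_scalar(values, col_idx, row_ptr, x):
    """Emulates CSR-scalar: one thread walks one whole row.
    Neighbouring threads start at row_ptr[row] offsets that are far apart,
    so their simultaneous loads are rarely adjacent in memory."""
    y = []
    for row in range(len(row_ptr) - 1):  # each iteration stands in for one thread
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def csr_vector(values, col_idx, row_ptr, x, warp_size=32):
    """Emulates CSR-vector: warp_size lanes cooperate on one row.
    Lane `lane` handles entries start+lane, start+lane+warp_size, ...,
    so adjacent lanes read adjacent entries of values and col_idx."""
    y = []
    for row in range(len(row_ptr) - 1):  # each iteration stands in for one warp
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0] * warp_size
        for lane in range(warp_size):
            for k in range(start + lane, end, warp_size):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction of the per-lane partial sums into lane 0.
        stride = warp_size // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y.append(partial[0])
    return y


# Usage: a 4x4 matrix with rows of length 2, 1, 4 and 1.
values = [1, 2, 3, 4, 5, 6, 7, 8]
col_idx = [0, 2, 1, 0, 1, 2, 3, 3]
row_ptr = [0, 2, 3, 7, 8]
x = [1, 2, 3, 4]
print(csr_scalar(values, col_idx, row_ptr, x))
print(csr_vector(values, col_idx, row_ptr, x, warp_size=4))
```

In the emulated vector variant, a row with a single nonzero still occupies all `warp_size` lanes, most of which do no useful work. That idle-lane cost on short rows is the trade-off against the better access pattern on long rows.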

Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis ◽

10.1109/sc.2014.68 ◽

2014 ◽

Cited By ~ 76

Author(s):

Joseph L. Greathouse ◽

Mayank Daga

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector

Efficient sparse matrix–vector multiplication using cache oblivious extension quadtree storage format

Future Generation Computer Systems ◽

10.1016/j.future.2015.03.005 ◽

2016 ◽

Vol 54 ◽

pp. 490-500 ◽

Cited By ~ 9

Author(s):

Jilin Zhang ◽

Jian Wan ◽

Fangfang Li ◽

Jie Mao ◽

Li Zhuang ◽

...

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector ◽

Cache Oblivious

VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors

The Journal of Supercomputing ◽

10.1007/s11227-019-02835-4 ◽

2019 ◽

Vol 76 (3) ◽

pp. 2063-2081 ◽

Cited By ~ 1

Author(s):

Yishui Li ◽

Peizhen Xie ◽

Xinhai Chen ◽

Jie Liu ◽

Bo Yang ◽

...

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector

A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication

Parallel Processing Letters ◽

10.1142/s0129626416400016 ◽

2016 ◽

Vol 26 (04) ◽

pp. 1640001

Author(s):

Jiaquan Gao ◽

Yuanshen Zhou ◽

Kesong Wu

Keyword(s):

Optimization Model ◽

Graphics Processing Units ◽

High Efficiency ◽

Sparse Matrix ◽

Performance Model ◽

Parallel Optimization ◽

Multiple GPUs ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector

Accelerating the sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a specific multiple-GPU platform, the SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and then a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, we construct in the second stage an optimal multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, independent of the problems, and dependent on the resources of devices, this model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.
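The abstract does not state the partitioning rule, so the following is only a hypothetical stand-in: a sketch that splits a CSR matrix into contiguous row blocks with roughly equal numbers of nonzeros, one block per device, and checks that the per-block products concatenate to the full result. The names `partition_rows_by_nnz`, `csr_row_block`, and `csr_spmv` are invented for this example, and the devices are emulated by plain sequential calls.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, num_parts):
    """Hypothetical rule: cut at the first row boundary whose cumulative
    nonzero count reaches p/num_parts of the total, for p = 1..num_parts-1.
    Returns a list of (start_row, end_row) half-open ranges."""
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for p in range(1, num_parts):
        target = total * p / num_parts
        cut = bisect_left(row_ptr, target)
        cut = min(max(cut, bounds[-1]), n_rows)  # keep cuts ordered and in range
        bounds.append(cut)
    bounds.append(n_rows)
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


def csr_row_block(values, col_idx, row_ptr, start, end):
    """Extract rows [start, end) as a standalone CSR matrix with rebased row_ptr."""
    lo, hi = row_ptr[start], row_ptr[end]
    return values[lo:hi], col_idx[lo:hi], [p - lo for p in row_ptr[start:end + 1]]


def csr_spmv(values, col_idx, row_ptr, x):
    """Sequential CSR sparse matrix-vector product y = A x."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Usage: split a 4x4 matrix across two emulated devices.
values = [1, 2, 3, 4, 5, 6, 7, 8]
col_idx = [0, 2, 1, 0, 1, 2, 3, 3]
row_ptr = [0, 2, 3, 7, 8]
x = [1, 2, 3, 4]
y = []
for start, end in partition_rows_by_nnz(row_ptr, 2):
    block = csr_row_block(values, col_idx, row_ptr, start, end)
    y.extend(csr_spmv(*block, x))  # each block would run on its own device
print(y == csr_spmv(values, col_idx, row_ptr, x))
```

Row-block partitioning keeps each output entry on exactly one device, so the partial results only need to be concatenated. Every device still needs the full input vector `x`, which is where inter-GPU communication cost would enter a real implementation.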

Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16 ◽

10.1145/2851141.2851190 ◽

2016 ◽

Cited By ~ 19

Author(s):

Duane Merrill ◽

Michael Garland

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector

Implementation Procedures of Parallel Preconditioning with Sparse Matrix Based on FEM

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.166-169.3166 ◽

2012 ◽

Vol 166-169 ◽

pp. 3166-3173

Author(s):

Guo Liang Ji ◽

Yang De Feng ◽

Wen Kai Cui ◽

Liang Gang Lu

Keyword(s):

Stiffness Matrix ◽

Sparse Matrix ◽

Iterative Solvers ◽

Parallel Solvers ◽

Assembly Method ◽

Parallel Preconditioning ◽

Restarted GMRES ◽

Compressed Sparse Row ◽

Storage Format ◽

Global Stiffness Matrix

A technique to assemble the global stiffness matrix in a sparse storage format and two parallel solvers for sparse linear systems based on FEM are presented. The assembly method uses a data structure named associated node at intermediate stages to finally arrive at the Compressed Sparse Row (CSR) format. The associated nodes record the information about the connection of nodes in the mesh. The technique can substantially reduce memory use because it stores only the nonzero elements of the global stiffness matrix. This method is simple and effective. The solvers are Restarted GMRES iterative solvers with Jacobi and sparse approximate inverse (SPAI) preconditioning, respectively. Numerical experiments show that both preconditioners can improve the convergence of the iterative method, and that SPAI is more powerful than Jacobi in the sense of reducing the number of iterations and in parallel efficiency. Both solvers can be used to solve large sparse linear systems.
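As a rough illustration of assembling a stiffness matrix directly into CSR, the sketch below accumulates element contributions into one small dictionary per node and then flattens those dictionaries into CSR arrays. The per-node dictionary plays a role loosely similar to the connectivity record described in the abstract, but it is an assumption made for this example and not the paper's associated-node structure; the function name `assemble_csr` and the 1D two-node element are likewise hypothetical.

```python
def assemble_csr(num_nodes, elements, element_matrix):
    """Assemble a global matrix in CSR form from identical element matrices.

    elements: list of node-index tuples, one per element.
    element_matrix: dense local matrix, indexed by local node number.
    Only entries coupling nodes that share an element are ever stored."""
    # Intermediate stage: for each node, a map from connected node to summed entry.
    rows = [dict() for _ in range(num_nodes)]
    for nodes in elements:
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                rows[i][j] = rows[i].get(j, 0.0) + element_matrix[a][b]
    # Final stage: flatten to CSR with column indices sorted within each row.
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j in sorted(row):
            values.append(row[j])
            col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


# Usage: three 1D two-node elements on four nodes, unit element stiffness.
k_local = [[1.0, -1.0], [-1.0, 1.0]]
elements = [(0, 1), (1, 2), (2, 3)]
values, col_idx, row_ptr = assemble_csr(4, elements, k_local)
print(values)
print(col_idx)
print(row_ptr)
```

For this mesh the dense global matrix would hold 16 entries, while the CSR arrays store only the 10 nonzeros of the tridiagonal result; the saving grows with mesh size because each node couples to only a few neighbours.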

COSC: Combine Optimized Sparse Matrix-Vector Multiplication for CSR Format

2011 Sixth Annual Chinagrid Conference ◽

10.1109/chinagrid.2011.39 ◽

2011 ◽

Author(s):

Ji-Lin Zhang ◽

Li Zhuang ◽

Jian Wan ◽

Xiang-Hua Xu ◽

Cong-Feng Jiang ◽

...

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector ◽

CSR Format

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum ◽

10.1109/ipdpsw.2012.211 ◽

2012 ◽

Cited By ~ 19

Author(s):

Moritz Kreutzer ◽

Georg Hager ◽

Gerhard Wellein ◽

Holger Fehske ◽

Achim Basermann ◽

...

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Storage Format ◽

Matrix Vector
