GPU Implementation of Image Convolution Using Sparse Model with Efficient Storage Format

2018 ◽  
Vol 10 (1) ◽  
pp. 54-70
Author(s):  
Saira Banu Jamal Mohammed ◽  
M. Rajasekhara Babu ◽  
Sumithra Sriram

With the growth of data-parallel computing, the role of GPU computing in non-graphics applications such as image processing has become a research focus. Convolution is an integral operation in filtering, smoothing, and edge detection. In this article, convolution is formulated as a sparse linear system and solved using sparse matrix-vector multiplication (SpMV). On the CPU, SpMV with the Compressed Sparse Row (CSR) format performs better than normal convolution. To overcome the stalling of threads on short rows in the GPU implementation of CSR SpMV, a more efficient model is proposed that uses the Adaptive-Compressed Row Storage (A-CSR) format. Using CSR in the convolution process achieves speedups of 1.45x and 1.159x over normal convolution for image smoothing and edge detection, respectively. On the GPU platform, the adaptive CSR format achieves an average speedup of 2.05x for image smoothing and 1.58x for edge detection.
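
To make the "convolution as SpMV" idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a row-major flattened image, zero padding at the borders, and a sliding-window filter without kernel flip (identical to convolution for symmetric kernels such as the box and Laplacian kernels used here). The names `conv_to_csr` and `csr_spmv` are hypothetical.

```python
def conv_to_csr(height, width, kernel):
    """Build CSR arrays for a 'same'-size sliding-window filter.

    Row p of the sparse matrix holds the kernel weights that contribute
    to output pixel p; columns index the flattened (row-major) image.
    Assumption: zero padding, so out-of-range neighbours are dropped.
    """
    kh, kw = len(kernel), len(kernel[0])
    values, col_idx, row_ptr = [], [], [0]
    for r in range(height):
        for c in range(width):
            for i in range(kh):
                for j in range(kw):
                    rr = r + i - kh // 2
                    cc = c + j - kw // 2
                    inside = 0 <= rr < height and 0 <= cc < width
                    if inside and kernel[i][j] != 0:
                        values.append(kernel[i][j])
                        col_idx.append(rr * width + cc)
            row_ptr.append(len(values))  # end of this pixel's row
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


image = [1, 2, 3,
         4, 5, 6,
         7, 8, 9]                      # 3x3 image, flattened row-major
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]          # unnormalised smoothing
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # edge detection

smooth = csr_spmv(*conv_to_csr(3, 3, box), image)
edges = csr_spmv(*conv_to_csr(3, 3, laplacian), image)
```

Border pixels produce shorter rows than interior pixels (4 or 6 nonzeros versus 9 for the box kernel). That uneven row length is the kind of imbalance the abstract describes for a row-per-thread GPU kernel.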

2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Guixia He ◽  
Jiaquan Gao

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually perform poorly due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not communication between GPUs is considered, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
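
The two-kernel structure with a middle array can be illustrated with a sequential Python sketch. This is only an assumption about the general shape of such a scheme, not the actual PCSR kernels, whose details the abstract does not give. The function names are hypothetical. Phase 1 touches `values` and `col_idx` strictly in storage order (the access pattern that coalesces well on a GPU), and phase 2 reduces each row's segment of the middle array.

```python
def spmv_csr_scalar(values, col_idx, row_ptr, x):
    """Reference: one 'thread' per row walks its own nonzeros."""
    return [
        sum(values[k] * x[col_idx[k]]
            for k in range(row_ptr[r], row_ptr[r + 1]))
        for r in range(len(row_ptr) - 1)
    ]


def spmv_two_phase(values, col_idx, row_ptr, x):
    """Two-phase SpMV using an intermediate ('middle') array."""
    # Phase 1: one independent product per nonzero, in storage order.
    middle = [v * x[c] for v, c in zip(values, col_idx)]
    # Phase 2: segmented sum; row r owns middle[row_ptr[r]:row_ptr[r+1]].
    return [
        sum(middle[row_ptr[r]:row_ptr[r + 1]])
        for r in range(len(row_ptr) - 1)
    ]


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [3, 4, 5]]
values = [1, 2, 3, 4, 5]
col_idx = [0, 2, 0, 1, 2]
row_ptr = [0, 2, 2, 5]
x = [1, 2, 3]

y_scalar = spmv_csr_scalar(values, col_idx, row_ptr, x)
y_two_phase = spmv_two_phase(values, col_idx, row_ptr, x)
```

Both functions return the same vector. The difference is that phase 1 has no dependence on row boundaries, so its work is uniform per nonzero regardless of how row lengths vary.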


2016 ◽  
Vol 54 ◽  
pp. 490-500 ◽  
Author(s):  
Jilin Zhang ◽  
Jian Wan ◽  
Fangfang Li ◽  
Jie Mao ◽  
Li Zhuang ◽  
...  

2019 ◽  
Vol 76 (3) ◽  
pp. 2063-2081 ◽  
Author(s):  
Yishui Li ◽  
Peizhen Xie ◽  
Xinhai Chen ◽  
Jie Liu ◽  
Bo Yang ◽  
...  

2016 ◽  
Vol 26 (04) ◽  
pp. 1640001
Author(s):  
Jiaquan Gao ◽  
Yuanshen Zhou ◽  
Kesong Wu

Accelerating sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a specific multiple-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and then a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, we construct in the second stage an optimal multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, independent of the problems, and dependent on the resources of devices, this model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.
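
The abstract does not state the partitioning rule, so the following Python sketch uses an assumed one purely for illustration: split a CSR matrix into contiguous row blocks with roughly equal numbers of nonzeros, one block per device. The function names are hypothetical, and the paper's actual rule and performance model may differ.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, num_devices):
    """Split rows into contiguous blocks with similar nonzero counts.

    Assumed rule (not taken from the paper): the d-th cut is placed at
    the first row boundary whose cumulative nonzero count reaches
    d/num_devices of the total. Returns (start_row, end_row) pairs,
    with end_row exclusive.
    """
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for d in range(1, num_devices):
        target = total * d / num_devices
        cut = bisect_left(row_ptr, target)
        # Keep cuts ordered and within the valid row range.
        cut = min(max(cut, bounds[-1]), n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return [(bounds[i], bounds[i + 1]) for i in range(num_devices)]


def block_nnz(row_ptr, blocks):
    """Number of nonzeros that falls in each row block."""
    return [row_ptr[end] - row_ptr[start] for start, end in blocks]


# Five rows holding 4, 1, 1, 2 and 4 nonzeros respectively.
row_ptr = [0, 4, 5, 6, 8, 12]

two_way = partition_rows_by_nnz(row_ptr, 2)
three_way = partition_rows_by_nnz(row_ptr, 3)
```

Each block could then be converted to whichever storage format a performance model predicts to be fastest on its device. That selection step is not sketched here.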


2012 ◽  
Vol 166-169 ◽  
pp. 3166-3173
Author(s):  
Guo Liang Ji ◽  
Yang De Feng ◽  
Wen Kai Cui ◽  
Liang Gang Lu

A technique for assembling the global stiffness matrix in a sparse storage format, and two parallel solvers for sparse linear systems based on FEM, are presented. The assembly method uses a data structure named associated node at intermediate stages to finally arrive at the Compressed Sparse Row (CSR) format. The associated nodes record the connectivity of nodes in the mesh. The technique substantially reduces memory use because it stores only the nonzero elements of the global stiffness matrix. This method is simple and effective. The solvers are restarted GMRES iterative solvers with Jacobi and sparse approximate inverse (SPAI) preconditioning, respectively. Numerical experiments show that both preconditioners improve the convergence of the iterative method, and that SPAI is more powerful than Jacobi in terms of iteration count and parallel efficiency. Both solvers can be used to solve large sparse linear systems.
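
One way such an assembly could work is sketched below in Python. It rests on assumptions not stated in the abstract: one degree of freedom per node, the same element matrix for every element, and an "associated node" structure modelled as a per-node set of neighbouring nodes. The function name `assemble_csr` is hypothetical.

```python
def assemble_csr(num_nodes, elements, element_matrix):
    """Assemble a global stiffness matrix directly into CSR arrays.

    elements: list of node-index tuples, one tuple per element.
    element_matrix: local stiffness matrix shared by all elements
    (an assumption made to keep the sketch short).
    """
    # Step 1: associated nodes. For each node, collect every node that
    # shares at least one element with it (including the node itself).
    associated = [set() for _ in range(num_nodes)]
    for elem in elements:
        for a in elem:
            associated[a].update(elem)

    # Step 2: the sparsity pattern follows directly from the sets.
    row_ptr, col_idx = [0], []
    for nodes in associated:
        col_idx.extend(sorted(nodes))
        row_ptr.append(len(col_idx))

    # Step 3: scatter element contributions into the nonzero slots only.
    values = [0.0] * len(col_idx)
    for elem in elements:
        for i, a in enumerate(elem):
            row_cols = col_idx[row_ptr[a]:row_ptr[a + 1]]
            for j, b in enumerate(elem):
                values[row_ptr[a] + row_cols.index(b)] += element_matrix[i][j]
    return values, col_idx, row_ptr


# Three nodes joined by two 1D linear elements of unit stiffness.
elements = [(0, 1), (1, 2)]
k_local = [[1.0, -1.0], [-1.0, 1.0]]
values, col_idx, row_ptr = assemble_csr(3, elements, k_local)
```

The dense equivalent here is `[[1, -1, 0], [-1, 2, -1], [0, -1, 1]]`. Only its seven nonzeros are ever allocated, which is the source of the memory saving the abstract mentions.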


Author(s):  
Ji-Lin Zhang ◽  
Li Zhuang ◽  
Jian Wan ◽  
Xiang-Hua Xu ◽  
Cong-Feng Jiang ◽  
...  
