Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU

Mathematical Problems in Engineering ◽

10.1155/2016/4596943 ◽

2016 ◽

Vol 2016 ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Jiaquan Gao ◽

Panpan Qi ◽

Guixia He

Keyword(s):

Iterative Methods ◽

Shared Memory ◽

Eigenvalue Problems ◽

Sparse Matrix ◽

Computational Science ◽

Test Results ◽

Thread Block ◽

Matrix Vector Multiplication ◽

Compressed Sparse Row ◽

Matrix Vector

Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) format on the GPU. Our method dynamically assigns a different number of rows to each thread block and selects an optimized implementation for each block according to the number of rows it covers. Accesses to the CSR arrays are fully coalesced, and the GPU’s DRAM bandwidth is used efficiently by loading data into shared memory, which alleviates the bottleneck of many existing CSR-based algorithms (e.g., CSR-scalar and CSR-vector). Test results on C2050 and K20c GPUs show that our method outperforms the perfect-CSR algorithm that inspired our work, the vendor-tuned CUSPARSE V6.5 and CUSP V0.5.1, and three popular algorithms: clSpMV, CSR5, and CSR-Adaptive.
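
The abstract above does not say how rows are distributed among thread blocks. As a rough illustration only, the following Python sketch shows one plausible host-side strategy: greedily grouping consecutive rows so that each group holds at most a fixed number of nonzeros, for example the amount that fits in a block's shared memory. The function name `assign_rows_to_blocks` and the parameter `max_nnz_per_block` are hypothetical and are not taken from the paper.

```python
def assign_rows_to_blocks(row_ptr, max_nnz_per_block):
    """Greedily group consecutive CSR rows into blocks.

    Block b covers rows [bounds[b], bounds[b + 1]). A block is closed as soon
    as adding the next row would exceed max_nnz_per_block nonzeros. A single
    row longer than the limit is kept in a block of its own.
    This is an illustrative assumption, not the authors' published scheme.
    """
    n_rows = len(row_ptr) - 1
    bounds = [0]
    start = 0  # first row of the block currently being filled
    for row in range(n_rows):
        nnz_if_added = row_ptr[row + 1] - row_ptr[start]
        if nnz_if_added > max_nnz_per_block and row > start:
            bounds.append(row)  # close the current block before this row
            start = row
    bounds.append(n_rows)
    return bounds


# Five rows holding 2, 1, 2, 7, and 1 nonzeros respectively.
row_ptr = [0, 2, 3, 5, 12, 13]
bounds = assign_rows_to_blocks(row_ptr, max_nnz_per_block=4)
```

With these inputs the short rows are grouped together where they fit, while the 7-nonzero row ends up in a block by itself, which is the kind of case where a different per-block kernel could be chosen.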

A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Mathematical Problems in Engineering ◽

10.1155/2016/8471283 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Guixia He ◽

Jiaquan Gao

Keyword(s):

Sparse Matrix ◽

Sparse Matrices ◽

Poor Performance ◽

Test Results ◽

Graphic Processing Units ◽

Multiple GPUs ◽

Matrix Vector Multiplication ◽

Compressed Sparse Row ◽

Access Patterns ◽

Matrix Vector

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually perform poorly because of irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not communication between GPUs is considered, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
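
For readers unfamiliar with the CSR layout mentioned in the abstract, here is a minimal sequential Python sketch of CSR-based SpMV. The array names `row_ptr`, `col_idx`, and `values` are conventional but chosen here for illustration. The outer loop corresponds to the work that CSR-scalar would give to one GPU thread per row. This is a reference computation only and does not reproduce PCSR or its middle array.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # CSR-scalar: one GPU thread per iteration
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The indirect read x[col_idx[k]] is the irregular access pattern
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)  # y == [7.0, 6.0, 19.0]
```

When one thread handles each row, neighbouring threads read `values` and `col_idx` at positions separated by whole row lengths, which is why coalescing is rare in CSR-scalar.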

High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2013.31 ◽

2014 ◽

Vol 25 (1) ◽

pp. 116-125 ◽

Cited By ~ 26

Author(s):

Albert-Jan Nicholas Yzelman ◽

Dirk Roose

Keyword(s):

Shared Memory ◽

Sparse Matrix ◽

Matrix Vector Multiplication ◽

High Level ◽

Matrix Vector

Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU

2011 International Conference on Parallel Processing ◽

10.1109/icpp.2011.82 ◽

2011 ◽

Cited By ~ 18

Author(s):

Kiran Kumar Matam ◽

Kishore Kothapalli

Keyword(s):

Iterative Methods ◽

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

Optimization of Block Sparse Matrix-Vector Multiplication on Shared-Memory Parallel Architectures

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽

10.1109/ipdpsw.2016.42 ◽

2016 ◽

Cited By ~ 1

Author(s):

Ryan Eberhardt ◽

Mark Hoemmen

Keyword(s):

Shared Memory ◽

Sparse Matrix ◽

Parallel Architectures ◽

Matrix Vector Multiplication ◽

Matrix Vector

SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator

2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) ◽

10.1109/hpca51647.2021.00055 ◽

2021 ◽

Author(s):

Xinfeng Xie ◽

Zheng Liang ◽

Peng Gu ◽

Abanti Basak ◽

Lei Deng ◽

...

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis ◽

10.1145/3295500.3356148 ◽

2019 ◽

Cited By ~ 1

Author(s):

Athena Elafrou ◽

Georgios Goumas ◽

Nectarios Koziris

Keyword(s):

Sparse Matrix ◽

Multicore Architectures ◽

Matrix Vector Multiplication ◽

Matrix Vector

Sparse Matrix-Vector Multiplication on GPGPUs

ACM Transactions on Mathematical Software ◽

10.1145/3017994 ◽

2017 ◽

Vol 43 (4) ◽

pp. 1-49 ◽

Cited By ~ 34

Author(s):

Salvatore Filippone ◽

Valeria Cardellini ◽

Davide Barbieri ◽

Alessandro Fanfarillo

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

Multicoloring for Fast Sparse Matrix-Vector Multiplication in Solving PDE Problems

1993 International Conference on Parallel Processing - ICPP'93 Vol1 ◽

10.1109/icpp.1993.119 ◽

1993 ◽

Cited By ~ 1

Author(s):

H.C. Wang ◽

Kai Hwang

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

Modeling contention of sparse-matrix-vector multiplication (SMV) in three parallel programming paradigms

Proceedings of the 6th international workshop on Software and performance - WOSP '07 ◽

10.1145/1216993.1217003 ◽

2007 ◽

Author(s):

Ahmed Sameh ◽

Tarek El-Ghazawi ◽

Yesha Yacoov

Keyword(s):

Parallel Programming ◽

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Programming Paradigms ◽

Matrix Vector

On the Memory Wall and Performance of Symmetric Sparse Matrix Vector Multiplications In Different Data Structures on Shared Memory Machines

2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom) ◽

10.1109/uic-atc-scalcom-cbdcom-iop.2015.259 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tongxiang Gu ◽

Xingping Liu ◽

Zeyao Mo ◽

Xiaowen Xu ◽

Shengxin Zhu

Keyword(s):

Shared Memory ◽

Data Structures ◽

Sparse Matrix ◽

Memory Wall ◽

Performance ◽

Matrix Vector
