Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations

Author(s):  
Qing Yi ◽  
Qian Wang ◽  
Huimin Cui
2009 ◽  
Vol 21 (18) ◽  
pp. 2457-2477 ◽  
Author(s):  
Sergio Barrachina ◽  
Maribel Castillo ◽  
Francisco D. Igual ◽  
Rafael Mayo ◽  
Enrique S. Quintana-Ortí ◽  
...  

2016 ◽  
Vol 26 (02) ◽  
pp. 1650007 ◽  
Author(s):  
Jing Wu ◽  
Joseph Jaja

In this paper, we illustrate the possibility of developing strategies for carrying out matrix computations on heterogeneous platforms that achieve native GPU performance on very large data sizes, up to the capacity of the CPU memory. More specifically, we present a dense matrix multiplication strategy on a heterogeneous platform, tailored for the case when the input is too large to fit in device memory, that achieves near-peak GPU performance. Our strategy involves the development of CUDA stream-based software pipelines that effectively overlap PCIe data transfers with kernel executions. As a result, we achieve over 1 and 2 TFLOPS of performance on a single node using 1 and 2 GPUs, respectively.
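The stream-pipelining idea described in this abstract can be sketched in host code roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes column-major storage, that the A operand fits on the device while B and C are streamed in column panels, and that the panel width, the two-stream double buffering, and the helper name streamed_sgemm are all illustrative choices.

    // Sketch: out-of-core SGEMM C = A*B with CUDA streams overlapping
    // PCIe transfers and cuBLAS kernels. Column-major storage assumed;
    // A is resident on the GPU, B and C are streamed in column panels.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    void streamed_sgemm(const float* hA, const float* hB, float* hC,
                        int M, int K, int N, int TILE) {
        const int NSTREAMS = 2;                    // double buffering
        cublasHandle_t handle;  cublasCreate(&handle);
        float *dA, *dB[NSTREAMS], *dC[NSTREAMS];
        cudaMalloc(&dA, sizeof(float) * M * K);    // A stays on the device
        cudaStream_t s[NSTREAMS];
        for (int i = 0; i < NSTREAMS; ++i) {
            cudaStreamCreate(&s[i]);
            cudaMalloc(&dB[i], sizeof(float) * K * TILE);
            cudaMalloc(&dC[i], sizeof(float) * M * TILE);
        }
        cudaMemcpy(dA, hA, sizeof(float) * M * K, cudaMemcpyHostToDevice);

        const float alpha = 1.f, beta = 0.f;
        for (int j = 0, t = 0; j < N; j += TILE, t = (t + 1) % NSTREAMS) {
            int cols = (j + TILE <= N) ? TILE : N - j;
            // Stage 1: copy the next column panel of B while other panels compute.
            cudaMemcpyAsync(dB[t], hB + (size_t)j * K, sizeof(float) * K * cols,
                            cudaMemcpyHostToDevice, s[t]);
            // Stage 2: GEMM on this panel, issued into the same stream.
            cublasSetStream(handle, s[t]);
            cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, cols, K,
                        &alpha, dA, M, dB[t], K, &beta, dC[t], M);
            // Stage 3: copy the finished panel of C back to the host.
            cudaMemcpyAsync(hC + (size_t)j * M, dC[t], sizeof(float) * M * cols,
                            cudaMemcpyDeviceToHost, s[t]);
        }
        for (int i = 0; i < NSTREAMS; ++i) cudaStreamSynchronize(s[i]);
        cublasDestroy(handle);
        // Device buffers and streams would be freed here in complete code.
    }

For the asynchronous copies to actually overlap with the cuBLAS kernels, the host buffers hB and hC would need to be pinned (for example allocated with cudaMallocHost); operations issued into the same stream serialize, which is what keeps each double-buffered panel consistent.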


2011 ◽  
Vol 24 (12) ◽  
pp. 1317-1333 ◽  
Author(s):  
Bryan Marker ◽  
Ernie Chan ◽  
Jack Poulson ◽  
Robert Geijn ◽  
Rob F. Van der Wijngaart ◽  
...  

Author(s):  
Rob H. Bisseling

This chapter discusses parallel dense matrix computations, in particular the solution of linear systems by LU decomposition with partial row pivoting. It first presents a general Cartesian scheme for the distribution of matrices. Based on BSP cost analysis, the square cyclic distribution is proposed as particularly suitable for matrix computations such as LU decomposition and Gaussian elimination. The chapter introduces two-phase broadcasting of vectors, which is a useful collective-communication method for sending copies of matrix rows or columns to a group of processors. It also discusses how to achieve high performance by delaying rank-1 matrix updates to create a multiple-rank update, which can be carried out by multiplying tall-and-skinny matrices in a cache-friendly manner. The high-performance parallel LU decomposition is tested on a top-ranking supercomputer, and its performance is analysed with respect to computation, communication, and synchronization.
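The delayed-update idea in this chapter can be illustrated with a small sketch. The code below is hypothetical and not the chapter's BSP program: it shows the square cyclic owner mapping and a multiple-rank update in which k postponed rank-1 updates are applied in a single pass over the trailing matrix, as a product of a tall-and-skinny block with a short-and-wide block.

    #include <vector>
    using Matrix = std::vector<std::vector<double>>;

    // Square cyclic distribution: entry (i, j) is owned by processor
    // (i mod q, j mod q) on a q x q grid, q = sqrt(p); returned as a rank.
    int owner(int i, int j, int q) { return (i % q) * q + (j % q); }

    // Multiple-rank update: k delayed rank-1 updates A22 -= l_s * u_s^T
    // applied at once as A22 -= L21 * U12, where L21 is m x k (tall and
    // skinny) and U12 is k x n (short and wide). A22 is m x n.
    void multiple_rank_update(Matrix& A22, const Matrix& L21, const Matrix& U12) {
        int m = A22.size(), n = A22[0].size(), k = U12.size();
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int s = 0; s < k; ++s)     // accumulate all k rank-1 contributions
                    sum += L21[i][s] * U12[s][j];
                A22[i][j] -= sum;                // one write per trailing-matrix entry
            }
    }

Reading and writing the trailing submatrix once instead of k times is what makes the multiple-rank variant cache-friendly; in practice the inner products would be carried out by an optimized matrix-matrix multiplication routine rather than the plain loops shown here.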


2001 ◽  
Vol 9 (1) ◽  
pp. 51-60 ◽  
Author(s):  
Jack Dongarra ◽  
Victor Eijkhout ◽  
Piotr Łuszczek

This paper describes a recursive method for the LU factorization of sparse matrices. The recursive formulation of common linear algebra codes has proven very successful in dense matrix computations. An extension of the recursive technique to sparse matrices is presented. Performance results given here show that the recursive approach may perform comparably to leading software packages for sparse matrix factorization in terms of execution time, memory usage, and error estimates of the solution.
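The dense recursive formulation that the paper builds on can be sketched as follows. This is an illustrative version only, without pivoting and with plain loops in place of BLAS calls; it is not the paper's sparse code.

    // Recursive dense LU without pivoting: factor the leading block,
    // form the off-diagonal blocks by triangular solves, update the
    // Schur complement, and recurse on it. A is n x n, row-major with
    // leading dimension lda; the unit-lower L and upper U overwrite A.
    void recursive_lu(double* A, int n, int lda) {
        if (n == 1) return;                      // 1x1 base case: nothing to do
        int n1 = n / 2, n2 = n - n1;
        double* A11 = A;
        double* A12 = A + n1;
        double* A21 = A + n1 * lda;
        double* A22 = A + n1 * lda + n1;

        recursive_lu(A11, n1, lda);              // A11 = L11 * U11

        // U12 = L11^{-1} * A12 (forward substitution, L11 has unit diagonal)
        for (int j = 0; j < n2; ++j)
            for (int i = 1; i < n1; ++i)
                for (int k = 0; k < i; ++k)
                    A12[i * lda + j] -= A11[i * lda + k] * A12[k * lda + j];

        // L21 = A21 * U11^{-1} (solve against the upper triangular U11)
        for (int i = 0; i < n2; ++i)
            for (int j = 0; j < n1; ++j) {
                for (int k = 0; k < j; ++k)
                    A21[i * lda + j] -= A21[i * lda + k] * A11[k * lda + j];
                A21[i * lda + j] /= A11[j * lda + j];
            }

        // Schur complement: A22 -= L21 * U12, then factor it recursively
        for (int i = 0; i < n2; ++i)
            for (int j = 0; j < n2; ++j)
                for (int k = 0; k < n1; ++k)
                    A22[i * lda + j] -= A21[i * lda + k] * A12[k * lda + j];

        recursive_lu(A22, n2, lda);
    }

The recursion automatically turns most of the work into ever-larger matrix-matrix operations, which is where the dense recursive codes obtain their performance and what makes the formulation attractive to extend to the sparse case.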

