Communication lower bounds for distributed-memory matrix multiplication

2004 ◽  
Vol 64 (9) ◽  
pp. 1017-1026 ◽  
Author(s):  
Dror Irony ◽  
Sivan Toledo ◽  
Alexander Tiskin
Acta Numerica ◽  
2014 ◽  
Vol 23 ◽  
pp. 1-155 ◽  
Author(s):  
G. Ballard ◽  
E. Carson ◽  
J. Demmel ◽  
M. Hoemmen ◽  
N. Knight ◽  
...  

The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical (O(n3)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.


2015 ◽  
Vol 44 (5) ◽  
pp. 1173-1201 ◽  
Author(s):  
Hervé Fournier ◽  
Nutan Limaye ◽  
Guillaume Malod ◽  
Srikanth Srinivasan

Sign in / Sign up

Export Citation Format

Share Document