🤖 AI Summary
This paper studies the round complexity of distributed matrix multiplication in the Massively Parallel Computation (MPC) model, where memory-constrained processors communicate in synchronous rounds. It covers three input classes: dense square matrices, rectangular matrices, and sparse square matrices. For the dense and rectangular cases it proves matching upper and lower bounds; for the sparse case it gives an upper bound. Methodologically, the work operates over semirings and combines block decomposition and load balancing in constructive algorithms with information-theoretic lower-bound arguments. Key contributions are: (1) a tight round complexity of Θ(n^{α/2}) for multiplying two n×n dense matrices with n^α processors, each holding O(n^{2−α}) memory; (2) tight bounds for rectangular products, Θ(d/√n) or Θ(√d + log_d n) rounds depending on the matrix shapes and on the processor and memory configuration; and (3) an O(d^{0.9})-round upper bound for multiplying d-sparse matrices, which is sublinear in d.
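To make the phrases "semiring" and "block decomposition" concrete, the following is a minimal single-machine sketch, not the paper's algorithm. The names `Semiring`, `matmul`, and `block_matmul` are hypothetical and chosen only for illustration. It shows that a product computed tile by tile agrees with the direct product while using only the semiring's addition and multiplication (no subtraction), which is the restriction "in semirings" refers to.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Matrix = List[List[Any]]


@dataclass(frozen=True)
class Semiring:
    """Hypothetical helper: an addition, a multiplication, and the additive identity."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any


PLUS_TIMES = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0)
MIN_PLUS = Semiring(add=min, mul=lambda a, b: a + b, zero=float("inf"))


def matmul(A: Matrix, B: Matrix, sr: Semiring) -> Matrix:
    """Direct product using only sr.add and sr.mul."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[sr.zero] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sr.zero
            for t in range(inner):
                acc = sr.add(acc, sr.mul(A[i][t], B[t][j]))
            C[i][j] = acc
    return C


def block_matmul(A: Matrix, B: Matrix, sr: Semiring, b: int) -> Matrix:
    """Same product for square n x n inputs, accumulated over b x b tiles.

    Each (I, J, K) triple is one tile product; in a distributed setting such
    tile products are the natural unit of work to spread across processors
    (an illustrative assumption, not a claim about the paper's scheme).
    """
    n = len(A)
    if n % b != 0:
        raise ValueError("tile size must divide the matrix dimension")
    C = [[sr.zero] * n for _ in range(n)]
    for I in range(0, n, b):
        for J in range(0, n, b):
            for K in range(0, n, b):
                for i in range(I, I + b):
                    for j in range(J, J + b):
                        acc = C[i][j]
                        for t in range(K, K + b):
                            acc = sr.add(acc, sr.mul(A[i][t], B[t][j]))
                        C[i][j] = acc
    return C
```

With `MIN_PLUS`, the same routine computes one step of shortest-path distances, which is a common reason to restrict attention to semiring operations.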
📝 Abstract
In this paper, we study the matrix multiplication problem in the MPC model. We are given two matrices, and the task is to compute their product. The matrices are evenly distributed over $P$ processors. Each processor has $M$ memory such that $P \cdot M \geq$ (size of the matrices). The computation proceeds in synchronous rounds. In a communication round, a processor can send messages to and receive messages from any other processor, with the total size of messages sent or received being $O(M)$. We give an almost complete characterisation of the problem in various settings. We prove tight upper and lower bounds for the problem in three different settings: when the given input matrices are (i) general square matrices, (ii) rectangular matrices, and (iii) sparse square matrices (that is, each row and column contains a bounded number of nonzero elements). In particular, we prove the following results:
1. Multiplication of two $n \times n$ matrices in the MPC model with $n^{\alpha}$ processors, each with $O(n^{2-\alpha})$ memory, requires $\Theta(n^{\frac{\alpha}{2}})$ rounds in semirings.
2. Multiplication of two rectangular matrices of size $n \times d$ and $d \times n$ (where $d \leq n$) respectively, with $n$ processors of $O(n)$ memory, requires $\Theta(\frac{d}{\sqrt{n}})$ rounds in semirings.
3. Multiplication of two rectangular matrices of size $d \times n$ and $n \times d$ (where $d \leq n$) respectively requires
   i. $\Theta(\sqrt{d} + \log_d n)$ rounds with $n$ processors and $O(d)$ memory per processor in semirings;
   ii. $\Theta(\frac{d}{\sqrt{n}})$ rounds with $d$ processors and $O(n)$ memory per processor in semirings.
4. Multiplication of two $d$-sparse matrices (each row and column of the matrices contains at most $d$ nonzero elements) with $n$ processors and $O(d)$ memory per processor can be done in $O(d^{0.9})$ rounds in semirings.
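To get a feel for how the stated bounds scale, the sketch below evaluates the leading term of each expression for concrete parameters. It is illustrative only: the function names are hypothetical, and the constant factors hidden by $\Theta(\cdot)$ and $O(\cdot)$ are ignored, so the outputs indicate growth rates rather than actual round counts.

```python
import math


def fits_in_memory(P: int, M: int, input_size: int) -> bool:
    """Feasibility condition from the model: P * M >= size of the matrices."""
    return P * M >= input_size


def dense_square_rounds(n: float, alpha: float) -> float:
    """Leading term n^(alpha/2): n x n inputs, n^alpha processors, O(n^(2-alpha)) memory."""
    return n ** (alpha / 2)


def rect_rounds_large_memory(n: float, d: float) -> float:
    """Leading term d / sqrt(n): the rectangular settings with O(n) memory per processor."""
    return d / math.sqrt(n)


def rect_rounds_small_memory(n: float, d: float) -> float:
    """Leading term sqrt(d) + log_d(n): (d x n) times (n x d), n processors, O(d) memory."""
    return math.sqrt(d) + math.log(n, d)


def sparse_rounds_upper(d: float) -> float:
    """Leading term d^0.9: upper bound for d-sparse inputs, n processors, O(d) memory."""
    return d ** 0.9


if __name__ == "__main__":
    n, d = 10_000, 400
    # Two n x n matrices hold 2 * n^2 entries; with alpha = 1 there are n processors.
    print("feasible with M = n:", fits_in_memory(P=n, M=n, input_size=2 * n * n))
    print("feasible with M = 2n:", fits_in_memory(P=n, M=2 * n, input_size=2 * n * n))
    print("dense, alpha = 1:", dense_square_rounds(n, 1.0))
    print("rectangular, O(n) memory:", rect_rounds_large_memory(n, d))
    print("rectangular, O(d) memory:", rect_rounds_small_memory(n, d))
    print("sparse upper bound:", sparse_rounds_upper(d))
```

The feasibility check shows why memory is stated as $O(n^{2-\alpha})$ rather than exactly $n^{2-\alpha}$: holding both input matrices already needs a constant factor above $n^2 / P$ per processor.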