🤖 AI Summary
This work investigates fast matrix multiplication algorithms by seeking low-rank decompositions of the matrix multiplication tensor. To this end, the authors propose StrassenNet, a novel neural network architecture that automatically learns low-rank decompositions through end-to-end training. In the 2×2 case, the model consistently converges to a rank-7 solution, accurately recovering Strassen's algorithm. For the 3×3 case, the approach identifies a rank-23 decomposition that significantly outperforms all models of rank ≤ 22, providing numerical evidence that 23 is the minimal effective rank. Furthermore, the study introduces an ε-parameterization technique to model border-rank decompositions, thereby extending the applicability of neural networks to more general tensor decomposition problems.
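As a concrete reference point for what a rank-7 decomposition of the 2×2 multiplication tensor means, the sketch below verifies the classical Strassen algorithm (the known algorithm the network recovers, not the authors' learned parameters) against ordinary matrix multiplication:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar products (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    # The seven bilinear products -- one per rank-one term of the decomposition.
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Each output entry is a fixed linear combination of the seven products.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```

The three index sets (the linear forms in A, the linear forms in B, and the output combinations) are exactly the three factor matrices of the rank-7 tensor decomposition that the network learns.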
📝 Abstract
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ may be the smallest effective rank of the $3\times 3$ matrix multiplication tensor.
We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$-parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix multiplication tensor.
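To make the border-rank idea concrete, here is a minimal numerical sketch (a textbook example, not the authors' architecture): the tensor $T = x{\otimes}x{\otimes}y + x{\otimes}y{\otimes}x + y{\otimes}x{\otimes}x$ has rank 3, yet it is the limit of rank-2 tensors parameterized by $\varepsilon$, so its border rank is 2. An $\varepsilon$-parametrisation of the kind mentioned above optimizes over such families:

```python
import numpy as np

def outer3(u, v, w):
    # rank-one order-3 tensor u ⊗ v ⊗ w
    return np.einsum('i,j,k->ijk', u, v, w)

x, y = np.array([1., 0.]), np.array([0., 1.])

# Target tensor: rank 3, but border rank 2.
T = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

def rank2_approx(eps):
    """Rank-2 tensor that converges to T as eps -> 0:
    ((x+eps*y)^{⊗3} - x^{⊗3}) / eps = T + O(eps)."""
    xe = x + eps * y
    return (outer3(xe, xe, xe) - outer3(x, x, x)) / eps

for eps in [1e-1, 1e-2, 1e-3]:
    print(eps, np.linalg.norm(T - rank2_approx(eps)))  # error shrinks linearly in eps
```

The approximation error decays like $O(\varepsilon)$, which is the signature of a genuine border-rank (rather than exact-rank) decomposition.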