Towards Faster Feasible Matrix Multiplication by Trilinear Aggregation

📅 2025-08-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the impracticality of the asymptotically fastest matrix multiplication algorithms, many of which apply only to enormous matrices ($n > 10^{100}$). It targets algorithms with feasible base cases ($n_0 < 1000$), where Pan's $O(n^{2.773372})$ algorithm (1982) has been the asymptotically fastest. The method combines trilinear aggregation with the de Groote equivalence: parts of the algorithm that are equivalent to matrix multiplication with a smaller base case are identified and replaced in a way that allows further optimization. A sparse decomposition then lowers the additive complexity and the leading coefficient. The resulting algorithm has arithmetic complexity $O(n^{2.773203})$, making it the asymptotically fastest matrix multiplication algorithm with a base case smaller than $1000$. The method also gives the best asymptotic complexity for many small base cases, starting at $n_0 = 28$, and better exponents for larger base cases. The authors present these results as a step towards outperforming existing fast matrix multiplication algorithms in practice.


📝 Abstract

Matrix multiplication is a fundamental kernel in high performance computing. Many algorithms for fast matrix multiplication can only be applied to enormous matrices ($n>10^{100}$) and thus cannot be used in practice. Of all algorithms applicable to feasible input, Pan's $O(n^{2.773372})$ algorithm (1982) is asymptotically the fastest. We obtain an $O(n^{2.773203})$ algorithm applicable to the same input sizes as Pan's algorithm. This algorithm is the fastest matrix multiplication algorithm with base case smaller than $1000$. Further, our method obtains the best asymptotic complexity for many small base cases, starting at $n_0=28$. We also obtain better exponents for larger base cases. To construct our algorithm, we use the trilinear aggregation method. We find parts of the algorithms that are equivalent to matrix multiplication with smaller base case, and use the de Groote equivalence to replace these parts in a way that allows further optimization of our algorithms. Finally, we improve the additive complexity of our algorithms by finding a sparse decomposition and reducing the leading coefficient. These mark a fundamental step towards outperforming existing fast matrix multiplication algorithms in practice.
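To make the terms "exponent", "base case", and "leading coefficient" concrete, here is a minimal sketch using the standard cost model for recursive bilinear algorithms. It assumes an algorithm that multiplies $n_0 \times n_0$ block matrices with $t$ block multiplications and $q$ block additions per recursion step, so that $F(n) = t\,F(n/n_0) + q\,(n/n_0)^2$ with $F(1) = 1$. The function names are illustrative and do not come from the paper, and the numbers are those of Strassen's well-known $\langle 2,2,2;7 \rangle$ algorithm, not of the paper's new algorithm.

```python
import math


def exponent(n0: int, t: int) -> float:
    """Exponent log_{n0}(t) of a recursive algorithm with base case n0
    that uses t block multiplications per recursion step."""
    return math.log(t) / math.log(n0)


def leading_coefficient(n0: int, t: int, q: int) -> float:
    """Coefficient of n^{log_{n0} t} in the closed form of
    F(n) = t*F(n/n0) + q*(n/n0)^2, F(1) = 1 (requires t > n0^2)."""
    return 1 + q / (t - n0 * n0)


def arithmetic_ops(n: int, n0: int, t: int, q: int) -> int:
    """Evaluate the recurrence directly; n must be a power of n0."""
    if n == 1:
        return 1  # a single scalar multiplication
    m = n // n0
    return t * arithmetic_ops(m, n0, t, q) + q * m * m


# Strassen: base case 2, 7 multiplications, 18 block additions.
print(exponent(2, 7))                 # about 2.8074
print(leading_coefficient(2, 7, 18))  # 7.0
# Winograd's variant uses 15 additions, lowering the coefficient.
print(leading_coefficient(2, 7, 15))  # 6.0
```

In this model the exponent depends only on $n_0$ and $t$, while reducing the number of additions $q$ (for example through a sparser decomposition) lowers the leading coefficient without changing the exponent.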

Problem

Research questions and friction points this paper is trying to address.

Develop a faster matrix multiplication algorithm that applies to feasible input sizes

Improve asymptotic complexity for small base cases

Optimize algorithms using the trilinear aggregation method

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trilinear aggregation method for optimization

De Groote equivalence for base case replacement

Sparse decomposition to reduce additive complexity
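The de Groote equivalence rests on a symmetry of matrix multiplication: for invertible $P$, $Q$, $R$, we have $(P A Q^{-1})(Q B R^{-1}) = P (A B) R^{-1}$, so any correct multiplication routine stays correct when its inputs and output are transformed this way. The sketch below checks only this sandwiching identity on $2 \times 2$ matrices. It is not the paper's replacement procedure, and all names (`sandwiched_product`, `inverse2`, the example matrices) are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Classical product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse2(M):
    """Exact inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def sandwiched_product(A, B, P, Q, R, multiply=matmul):
    """Compute A*B by running `multiply` on transformed inputs.

    A' = P A Q^{-1} and B' = Q B R^{-1} give A' B' = P (A B) R^{-1},
    so A B is recovered as P^{-1} (A' B') R.
    """
    A_t = matmul(matmul(P, A), inverse2(Q))
    B_t = matmul(matmul(Q, B), inverse2(R))
    C_t = multiply(A_t, B_t)
    return matmul(matmul(inverse2(P), C_t), R)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
P = [[1, 1], [0, 1]]
Q = [[2, 1], [1, 1]]
R = [[1, 0], [3, 1]]

# The transformed routine agrees with the direct product.
print(sandwiched_product(A, B, P, Q, R) == matmul(A, B))  # True
```

Because `multiply` can be any correct routine, the same wrapper turns one algorithm into an equivalent one with different (possibly sparser) linear combinations, which is the kind of freedom the equivalence provides.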

🔎 Similar Papers

A fast Multiplicative Updates algorithm for Non-negative Matrix Factorization