🤖 AI Summary
To address the scalability bottleneck of classical optimal transport (OT) algorithms, whose $O(n^2)$ memory requirement hinders application to large-scale problems, this paper proposes the Dual-space Extragradient method (DXG), the first algorithm to achieve $O(n)$ linear memory complexity while maintaining an $O(n^2\varepsilon^{-1})$ convergence rate. DXG operates exclusively in the dual space, reformulating the primal-dual extragradient method into a purely dual variant. The authors establish a theoretical equivalence between its regularized dual update and both entropic regularization and its natural generalizations. Compared to state-of-the-art methods, DXG significantly improves computational efficiency and solution accuracy, particularly in unregularized and weakly regularized regimes, thereby greatly enhancing the scalability of OT in machine learning and statistical inference. Extensive experiments validate its strong empirical performance and practical deployability.
📝 Abstract
Optimal transport (OT) and its entropy-regularized form (EOT) have become increasingly prominent computational problems, with applications in machine learning and statistics. Recent years have seen a commensurate surge in first-order methods aiming to improve the complexity of large-scale (E)OT. However, there has been a consistent tradeoff: attaining state-of-the-art rates requires $\mathcal{O}(n^2)$ storage to enable ergodic primal averaging. In this work, we demonstrate that recently proposed primal-dual extragradient methods (PDXG) can be implemented entirely in the dual with $\mathcal{O}(n)$ storage. Additionally, we prove that regularizing the reformulated OT problem is equivalent to EOT, with extensions to entropy-regularized barycenter problems, further widening the applications of the proposed method. The proposed dual-only extragradient method (DXG) is the first algorithm to achieve $\mathcal{O}(n^2\varepsilon^{-1})$ complexity for $\varepsilon$-approximate OT with $\mathcal{O}(n)$ memory. Numerical experiments demonstrate that the dual extragradient method scales favorably in non/weakly-regularized regimes compared to existing algorithms, though future work is needed to improve performance in certain problem classes.
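For context, the standard entropic OT problem and its smooth dual (textbook formulations, not taken from this paper) are

$$\min_{P \ge 0,\; P\mathbf{1} = a,\; P^\top\mathbf{1} = b} \ \langle C, P \rangle + \gamma \sum_{ij} P_{ij}(\log P_{ij} - 1),$$

$$\max_{f, g} \ \langle f, a \rangle + \langle g, b \rangle - \gamma \sum_{ij} \exp\!\Big(\frac{f_i + g_j - C_{ij}}{\gamma}\Big).$$

Because the dual variables are just $f \in \mathbb{R}^n$ and $g \in \mathbb{R}^n$, any first-order method that stays in the dual needs only $\mathcal{O}(n)$ storage provided the entries of $C$ are generated on demand, which is the structural property the abstract's memory claim rests on.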