AI Summary
Existing smoothness definitions and convergence guarantees for nonsmooth machine learning objectives are inadequate in non-Euclidean spaces, where classical Euclidean norms fail to capture intrinsic geometric structure.
Method: We introduce ℓ*-smoothness, a novel smoothness notion defined with respect to a general norm and its dual (not only the Euclidean norm), to characterize the local curvature of objective functions, and propose a generalized self-bounding property.
Contribution/Results: Building on this framework, we establish convergence guarantees for mirror-descent-type algorithms in both deterministic and stochastic settings: (i) under ℓ*-smoothness, deterministic mirror descent achieves convergence rates matching those under classical smoothness; (ii) under a new bounded noise condition that encompasses the widely used bounded and affine noise assumptions, stochastic mirror descent attains an anytime convergence guarantee. The work extends generalized-smoothness theory beyond Euclidean geometry to general norms, providing a rigorous foundation for structured learning problems in non-Euclidean spaces.
Abstract
Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by allowing the Lipschitz constant of the gradient to grow with the gradient norm, which accommodates a broad range of objectives in practice. Despite this progress, existing generalizations of smoothness are restricted to Euclidean geometry with the $\ell_2$-norm and only have theoretical guarantees for optimization in Euclidean space. In this paper, we address this limitation by introducing a new $\ell*$-smoothness concept that measures the norm of the Hessian in terms of a general norm and its dual, and we establish convergence for mirror-descent-type algorithms, matching the rates under classical smoothness. Notably, we propose a generalized self-bounding property that facilitates bounding the gradients by controlling suboptimality gaps, serving as a principal component of the convergence analysis. Beyond deterministic optimization, we establish anytime convergence for stochastic mirror descent based on a new bounded noise condition that encompasses the widely adopted bounded and affine noise assumptions.
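For readers unfamiliar with mirror descent, the following is a minimal sketch of the textbook update on the probability simplex with the negative-entropy mirror map (the exponentiated-gradient form), a standard non-Euclidean setting where the natural norm pair is $\ell_1$ and its dual $\ell_\infty$. It is not the paper's algorithm or step-size rule: the function name `entropic_mirror_descent`, the constant step size, and the toy quadratic objective are all assumptions chosen for illustration.

```python
import math


def entropic_mirror_descent(grad, x0, step_size, num_steps):
    """Textbook mirror descent on the simplex with the negative-entropy mirror map.

    Each step multiplies the iterate by exp(-step_size * gradient) and
    renormalizes, which is the closed form of the KL-divergence proximal step.
    Assumes x0 has strictly positive entries summing to 1.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Work in log-space; subtracting the max keeps exp() well-scaled,
        # and the shift cancels in the normalization below.
        logits = [math.log(xi) - step_size * gi for xi, gi in zip(x, g)]
        shift = max(logits)
        weights = [math.exp(v - shift) for v in logits]
        total = sum(weights)
        x = [w / total for w in weights]
    return x


# Hypothetical toy objective (an assumption, not from the paper):
# f(x) = 0.5 * ||x - target||_2^2, minimized over the simplex at x = target.
target = [0.2, 0.3, 0.5]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = entropic_mirror_descent(grad, [1 / 3] * 3, step_size=0.5, num_steps=500)
print([round(v, 4) for v in x_final])
```

Because the update is multiplicative and renormalized, every iterate stays strictly inside the simplex without an explicit projection, which is the practical appeal of choosing a mirror map matched to the constraint geometry. On this toy problem the iterates should approach `target`.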