🤖 AI Summary
This work studies how mirror flow in homogeneous neural networks converges to max-margin solutions, offering a unified perspective on sparse and dense feature learning. Using convex duality, the authors derive a balance equation for mirror flow that characterizes the horizon function governing the induced margin, and they analyze how different mirror maps shape both the optimization dynamics and the geometry of the learned classifiers. On the theory side, the paper extends classical gradient-flow results to max-margin characterizations of mirror flow in homogeneous networks, together with convergence rates and norm growth estimates. Supported by experiments on synthetic datasets and standard vision tasks, the paper shows that distinct non-homogeneous mirror maps can induce the same max-margin solution, that convergence can be extremely slow, including exponentially slow regimes, and that the learned representations differ markedly across mirror maps, ranging from sparse to dense neuron activations.
📝 Abstract
We study the max-margin solutions reached by mirror flow in deep neural networks with homogeneous activation functions. Extending classical results on gradient flow, we derive a novel balance equation for mirror flow from convex duality, enabling a characterization of the horizon function governing the induced margin. We further establish max-margin characterizations together with convergence rates and norm growth estimates. Finally, we support our theory through experiments on synthetic datasets and standard vision tasks. Concretely, we show that: (1) distinct non-homogeneous mirror maps can induce the same max-margin solution; (2) convergence can be extremely slow, including exponentially slow regimes; and (3) although all considered mirror maps exhibit feature learning, they can produce markedly different representations, ranging from sparse to dense neuron activations. Together, these results provide a unified perspective on sparse and dense feature learning in homogeneous neural networks, highlighting how mirror maps shape both optimization dynamics and the geometry of the learned classifiers.
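To make the object of study concrete, the following is a minimal sketch of mirror descent, the discrete-time counterpart of mirror flow, on a toy problem. Everything in it is an illustrative assumption rather than the paper's setup: the model is a linear classifier (the simplest homogeneous model), the loss is the logistic loss, the potential is the homogeneous choice φ(w) = (1/p)‖w‖_p^p with p > 1 (the abstract also considers non-homogeneous mirror maps, which this sketch does not cover), and all function names are hypothetical. With p = 2 the update reduces to plain gradient descent.

```python
import numpy as np

def grad_potential(w, p):
    # Mirror map: gradient of phi(w) = (1/p) * sum_i |w_i|^p (assumed potential, p > 1).
    return np.sign(w) * np.abs(w) ** (p - 1)

def grad_potential_inverse(z, p):
    # Inverse mirror map, mapping dual variables back to parameters.
    return np.sign(z) * np.abs(z) ** (1.0 / (p - 1))

def logistic_loss_grad(w, X, y):
    # Gradient of mean(log(1 + exp(-y_i * <x_i, w>))) with respect to w.
    margins = y * (X @ w)
    s = np.exp(-np.logaddexp(0.0, margins))  # numerically stable sigmoid(-margin)
    return -(X * (s * y)[:, None]).mean(axis=0)

def mirror_descent(X, y, p=2.0, lr=0.1, steps=2000):
    # Small nonzero initialization (an arbitrary illustrative choice).
    w = np.full(X.shape[1], 1e-3)
    z = grad_potential(w, p)
    for _ in range(steps):
        # The gradient step is taken in the dual (mirror) space ...
        z = z - lr * logistic_loss_grad(w, X, y)
        # ... and then mapped back to parameter space.
        w = grad_potential_inverse(z, p)
    return w

def normalized_margin(w, X, y, p):
    # Smallest margin divided by the l_p norm of w (illustrative normalization).
    return np.min(y * (X @ w)) / np.linalg.norm(w, ord=p)

if __name__ == "__main__":
    # Tiny linearly separable toy dataset (made up for this sketch).
    X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    for p in (2.0, 3.0):
        w = mirror_descent(X, y, p=p)
        print(p, w, normalized_margin(w, X, y, p))
```

The design point the sketch tries to convey is that the mirror map only changes where the gradient step is taken: the loss gradient is evaluated at the parameters `w`, but it is accumulated in the dual variable `z`. Tracking the normalized margin and the norm of `w` over many steps is one simple way to observe the kind of slow margin convergence and norm growth that the abstract refers to, although this toy example makes no claim about reproducing the paper's results.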