🤖 AI Summary
Gradient-based sampling algorithms on the space of probability measures, such as the Unadjusted Langevin Algorithm (ULA), Stochastic Gradient Langevin Dynamics (SGLD), Mean-Field Langevin Dynamics (MFLD), Stein Variational Gradient Descent (SVGD), and Variational Gradient Descent (VGD), typically require manual step-size tuning, which hinders practical deployment.
Method: This paper proposes a parameter-free, adaptive step-size strategy grounded in time-discretizations of Wasserstein gradient flows, eliminating the need for hand-tuned step sizes. Under mild assumptions such as geodesic convexity and locally bounded stochastic gradients, the method attains the convergence rates of optimally tuned counterparts up to logarithmic factors.
Contribution/Results: It is the first work to unify multiple classical sampling algorithms into step-size-free variants with theoretical convergence guarantees. Empirically, the proposed variants match the performance of their optimally tuned versions across Bayesian inference and generative modeling tasks, substantially improving usability, robustness, and ease of deployment.
📝 Abstract
We introduce adaptive, tuning-free step size schedules for gradient-based sampling algorithms obtained as time-discretizations of Wasserstein gradient flows. The result is a suite of tuning-free sampling algorithms, including tuning-free variants of the unadjusted Langevin algorithm (ULA), stochastic gradient Langevin dynamics (SGLD), mean-field Langevin dynamics (MFLD), Stein variational gradient descent (SVGD), and variational gradient descent (VGD). More widely, our approach yields tuning-free algorithms for solving a broad class of stochastic optimization problems over the space of probability measures. Under mild assumptions (e.g., geodesic convexity and locally bounded stochastic gradients), we establish strong theoretical guarantees for our approach. In particular, we recover the convergence rate of optimally tuned versions of these algorithms up to logarithmic factors, in both nonsmooth and smooth settings. We then benchmark the performance of our methods against comparable existing approaches. Across a variety of tasks, our algorithms achieve similar performance to the optimal performance of existing algorithms, with no need to tune a step size parameter.
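To make the idea concrete, here is a minimal sketch of a tuning-free ULA variant in the spirit the abstract describes. It is not the paper's actual schedule: it uses a distance-over-gradients (DoG-style) adaptive step size, `eta_t = max_dist / sqrt(sum of squared gradient norms)`, as an illustrative stand-in, and `grad_log_target` (a standard Gaussian target) and all constants are assumptions for the example.

```python
import numpy as np

def grad_log_target(x):
    # Illustrative target: standard Gaussian N(0, I), so grad log pi(x) = -x.
    return -x

def tuning_free_ula(x0, n_steps=5000, eps=1e-8, seed=0):
    """ULA with a DoG-style adaptive step size (a hedged sketch,
    not the paper's exact schedule).

    eta_t = (max distance travelled from x0) / sqrt(sum of squared
    gradient norms), so no step size needs to be hand-tuned.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    max_dist = eps        # running max of ||x_t - x_0||, seeded at eps > 0
    grad_sq_sum = eps     # running sum of ||g_t||^2
    samples = []
    for _ in range(n_steps):
        g = grad_log_target(x)
        grad_sq_sum += float(g @ g)
        eta = max_dist / np.sqrt(grad_sq_sum)
        # Langevin step: gradient move plus sqrt(2 * eta) Gaussian noise.
        x = x + eta * g + np.sqrt(2.0 * eta) * rng.standard_normal(x.shape)
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
        samples.append(x.copy())
    return np.array(samples)

samples = tuning_free_ula(np.full(2, 5.0))
# Late iterates should drift from the start point toward the target.
late_mean = samples[len(samples) // 2:].mean(axis=0)
```

The step size starts near zero and grows automatically as the chain travels, a common mechanism in parameter-free optimization; the same wrapper pattern applies to the SGLD, MFLD, SVGD, and VGD updates mentioned above.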