Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing finite-particle convergence guarantees for Stein variational gradient descent (SVGD) are weak, particularly with respect to kernelized Stein discrepancy (KSD) and Wasserstein-2 distance. Method: We conduct a rigorous theoretical analysis of SVGD's convergence behavior, leveraging relative entropy methods, Matérn kernel constructions, propagation-of-chaos theory, and particle-system dynamics modeling. We further propose an extended SVGD framework based on bilinear kernels to enable continuous-time Wasserstein-2 analysis. Results: We establish an explicit $O(1/\sqrt{N})$ convergence rate in KSD, a double-exponential improvement over prior bounds. Moreover, we provide the first Wasserstein-2 convergence guarantee for the continuous-time dynamics of SVGD. Our analysis shows polynomial dependence of the KSD rate on the dimension and rigorously proves long-time convergence of marginal distributions and propagation of chaos in the mean-field limit.
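For orientation, a minimal SVGD sketch (not code from the paper): the update below uses the standard RBF kernel and a toy Gaussian target, whereas the paper's rates are derived for Matérn and 'bilinear + Matérn' kernels; the step size, bandwidth, and the `svgd_step` helper are illustrative choices.

```python
import numpy as np

def svgd_step(x, score, h=1.0, eps=0.1):
    """One SVGD update for particles x of shape (N, d).

    score: callable returning grad log pi at each particle, shape (N, d).
    Uses the standard RBF kernel for brevity; the paper's guarantees
    are stated for Matern and 'bilinear + Matern' kernels.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # x_i - x_j, shape (N, N, d)
    k = np.exp(-np.sum(diff**2, -1) / (2 * h**2))   # RBF Gram matrix
    drive = k @ score(x)                            # kernel-weighted scores: pull toward mass
    repulse = (k[..., None] * diff).sum(axis=1) / h**2  # kernel gradient: push particles apart
    return x + eps * (drive + repulse) / n

# Toy run: approximate a standard Gaussian (score(x) = -x) from a bad start.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2)) + 5.0
for _ in range(500):
    x = svgd_step(x, lambda p: -p)
```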

📝 Abstract
We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant 'negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller 'positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of 'bilinear + Matérn' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
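To spell out the key step described in the abstract (our notation; the exact form of the remainder is an assumption for illustration): writing $\mu^N_t$ for the joint law of the $N$ particles, $\bar{\mu}^N_t$ for their empirical measure, and $H$ for relative entropy, the claimed splitting reads schematically as

$$\frac{\mathrm{d}}{\mathrm{d}t}\, H\big(\mu^N_t \,\big\|\, \pi^{\otimes N}\big) = -\,N\, \mathbb{E}\big[\mathsf{KSD}^2\big(\bar{\mu}^N_t,\, \pi\big)\big] + R^N_t,$$

with $R^N_t$ the smaller positive part. Integrating in time and using $H \ge 0$ bounds the time-averaged expected $\mathsf{KSD}^2$ by $H(\mu^N_0 \,\|\, \pi^{\otimes N})/N$ plus the averaged remainder; when the initial entropy grows linearly in $N$, this is what yields the $O(1/\sqrt{N})$ rate for $\mathsf{KSD}$.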
Problem

Research questions and friction points this paper is trying to address.

Establishing finite-particle convergence rates for SVGD in the $\mathsf{KSD}$ and Wasserstein-2 metrics
Improving $\mathsf{KSD}$ rates to order $1/\sqrt{N}$, in both continuous and discrete time (a standard empirical $\mathsf{KSD}^2$ estimator is sketched after this list for reference)
Obtaining Wasserstein-2 convergence for 'bilinear + Matérn' kernels, with a curse of dimensionality comparable to the i.i.d. setting
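The sketch below is a standard V-statistic estimator of $\mathsf{KSD}^2$ between the particle empirical measure and a target with score $s = \nabla \log \pi$, using the Langevin Stein kernel on an RBF base kernel; this is generic background rather than code from the paper, and the bandwidth choice is illustrative.

```python
import numpy as np

def ksd_squared(x, score, h=1.0):
    """V-statistic estimate of KSD^2 between the empirical measure of
    x (shape (N, d)) and a target pi, via the Langevin Stein kernel
    built on an RBF base kernel (illustrative choice)."""
    n, d = x.shape
    s = score(x)                                  # scores s(x_i), shape (N, d)
    diff = x[:, None, :] - x[None, :, :]          # x_i - x_j
    sq = np.sum(diff**2, -1)
    k = np.exp(-sq / (2 * h**2))                  # base kernel Gram matrix
    ss = s @ s.T                                  # s(x_i)^T s(x_j)
    s_dy = np.einsum('id,ijd->ij', s, diff) / h**2    # s(x_i)^T grad_y k, divided by k
    s_dx = -np.einsum('jd,ijd->ij', s, diff) / h**2   # s(x_j)^T grad_x k, divided by k
    tr = d / h**2 - sq / h**4                     # trace(grad_x grad_y k), divided by k
    return float(np.sum(k * (ss + s_dy + s_dx + tr)) / n**2)
```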
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-dissipation argument: the relative entropy to the product target dissipates at a rate proportional to $N$ times the expected $\mathsf{KSD}^2$
Near-optimal $1/\sqrt{N}$ $\mathsf{KSD}$ rates with polynomial dependence on the dimension $d$
Bilinear kernel component enabling Wasserstein-2 convergence in continuous time (a kernel sketch follows below)
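A sketch of the kernel construction named above, under our own choices of smoothness and constants (the paper states its Wasserstein-2 rates for 'bilinear + Matérn' kernels; the Matérn order and the weights `c` and `ell` below are illustrative):

```python
import numpy as np

def matern32(x, y, ell=1.0):
    """Matern kernel of order 3/2 (the order here is our choice)."""
    a = np.sqrt(3.0) * np.linalg.norm(x - y) / ell
    return (1.0 + a) * np.exp(-a)

def bilinear_plus_matern(x, y, c=1.0, ell=1.0):
    """'Bilinear + Matern' kernel: c * x^T y + Matern(x, y)."""
    return c * np.dot(x, y) + matern32(x, y, ell)
```

Intuitively, the bilinear component puts linear functions in the associated RKHS, giving the Stein discrepancy control over moments of the particle system; per the abstract, this is what extends the entropy argument to Wasserstein-2 convergence in continuous time.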
Krishnakumar Balasubramanian
University of California, Davis
Statistics · Optimization · Machine learning

Sayan Banerjee
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill

Promit Ghosal
Department of Statistics, University of Chicago