Geometric Convergence Analysis of Variational Inference via Bregman Divergences

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Variational inference (VI) is difficult to analyze theoretically because the ELBO objective is nonconvex and nonsmooth. Method: Leveraging the structural properties of the log-partition function for exponential-family distributions, we reformulate the negative ELBO as a Bregman divergence, thereby establishing a unified information-geometric framework for VI analysis. Contribution/Results: We reveal a weak monotonicity property of the ELBO optimization landscape and, by exploiting spectral characteristics of the Fisher information matrix, derive the first non-asymptotic convergence rate bounds for gradient descent under both constant and diminishing step sizes. Crucially, our analysis dispenses with the standard strong-convexity and Lipschitz-gradient assumptions, significantly broadening the applicability of VI convergence theory. This yields a geometrically grounded convergence theory for Bayesian variational learning with explicit, non-asymptotic rate guarantees.
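
For context, the reformulation rests on a standard exponential-family identity, sketched here in general form (the paper's specific reduction of the negative ELBO is not reproduced): for a family p_λ(x) = h(x) exp(⟨λ, T(x)⟩ − A(λ)) with log-partition function A,

```latex
% Bregman divergence generated by the log-partition function A:
D_A(\lambda_1, \lambda_2) = A(\lambda_1) - A(\lambda_2)
  - \langle \nabla A(\lambda_2),\, \lambda_1 - \lambda_2 \rangle
% The KL divergence between family members is this divergence with swapped
% arguments, and the Hessian of A is the Fisher information matrix:
\mathrm{KL}\left( p_{\lambda_1} \,\|\, p_{\lambda_2} \right) = D_A(\lambda_2, \lambda_1),
\qquad
\nabla^2 A(\lambda) = \mathcal{I}(\lambda) = \mathrm{Cov}_{p_\lambda}\big( T(x) \big).
```

Since A is convex, D_A is nonnegative, which is the structure the geometric analysis builds on even though the objective is nonconvex in Euclidean coordinates.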

📝 Abstract
Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO), but convergence analysis remains challenging due to the objective's non-convexity and non-smoothness in Euclidean space. We establish a novel theoretical framework for analyzing VI convergence by exploiting the exponential family structure of distributions. We express the negative ELBO as a Bregman divergence with respect to the log-partition function, enabling a geometric analysis of the optimization landscape. We show that this Bregman representation admits a weak monotonicity property that, while weaker than convexity, provides sufficient structure for rigorous convergence analysis. By deriving bounds on the objective function along rays in parameter space, we establish properties governed by the spectral characteristics of the Fisher information matrix. Under this geometric framework, we prove non-asymptotic convergence rates for gradient descent algorithms with both constant and diminishing step sizes.
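
To illustrate the kind of iteration these rates cover, here is a minimal numerical sketch (not the paper's algorithm or code): gradient descent on the Bregman objective D_A(·, λ*) for a Bernoulli family in natural parameters, run with both a constant step size and a diminishing O(1/√t) schedule. The family, step sizes, and all function names are illustrative assumptions.

```python
import numpy as np

# Bernoulli family in natural (logit) parameterization: a simple
# exponential family where the log-partition A is available in closed form.
def A(lam):
    return np.logaddexp(0.0, lam)          # A(lambda) = log(1 + e^lambda)

def grad_A(lam):
    return 1.0 / (1.0 + np.exp(-lam))      # mean parameter: sigmoid(lambda)

def bregman(lam, lam_star):
    """D_A(lam, lam_star) = A(lam) - A(lam_star) - <grad A(lam_star), lam - lam_star>."""
    return A(lam) - A(lam_star) - grad_A(lam_star) * (lam - lam_star)

def gradient_descent(lam0, lam_star, steps=200, eta=0.5, diminishing=False):
    """Minimize D_A(., lam_star); its gradient is grad_A(lam) - grad_A(lam_star)."""
    lam = lam0
    for t in range(1, steps + 1):
        step = eta / np.sqrt(t) if diminishing else eta   # constant vs. O(1/sqrt(t))
        lam = lam - step * (grad_A(lam) - grad_A(lam_star))
    return lam

lam_star = 1.5
for dim in (False, True):
    lam_hat = gradient_descent(-4.0, lam_star, diminishing=dim)
    print(f"diminishing={dim}: lam={lam_hat:.4f}  D_A={bregman(lam_hat, lam_star):.3e}")
```

Both schedules drive the Bregman objective toward zero here; the constant step converges faster on this toy problem, which is the qualitative behavior the non-asymptotic bounds quantify.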
Problem

Research questions and friction points this paper is trying to address.

Analyzing convergence of variational inference via Bregman divergences
Establishing a geometric framework using exponential-family distributions
Proving non-asymptotic convergence rates for gradient descent algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bregman divergence representation of the negative ELBO
Geometric analysis using exponential family structure
Non-asymptotic convergence rates via Fisher information (see the sketch below)
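
To make the Fisher-spectrum point concrete, a small hypothetical example: for a mean-field product of Bernoullis, the Fisher information is the diagonal Hessian of the log-partition function, and its extreme eigenvalues are the curvature constants that typically enter step-size and rate bounds in smooth-optimization arguments. The paper's exact constants and conditions are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Mean-field product of Bernoullis: A(lam) = sum_i log(1 + e^{lam_i}),
# so the Fisher information (Hessian of A) is diagonal with entries
# Var[T_i] = sigmoid(lam_i) * (1 - sigmoid(lam_i)).
def fisher(lam):
    v = sigmoid(lam) * (1.0 - sigmoid(lam))
    return np.diag(v)

lam = np.array([0.0, 2.0, -3.0])
eigs = np.linalg.eigvalsh(fisher(lam))
print("lambda_min =", eigs.min(), "  lambda_max =", eigs.max())
# In standard smooth-optimization arguments, step sizes on the order of
# 1/lambda_max are the safe constant choices, while lambda_min governs how
# quickly the Bregman objective contracts near the optimum.
```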