🤖 AI Summary
This work addresses efficient sampling from a target distribution π by formulating sampling as a KL-divergence minimization problem over the space of probability measures. Leveraging the mixed Wasserstein–Fisher–Rao (WFR) geometry, we derive the corresponding gradient flow partial differential equation and design a novel sequential Monte Carlo (SMC) algorithm that approximates the WFR gradient flow. Our method integrates WFR geometry, importance sampling, and a tempered path from an initial distribution to the target, overcoming modeling limitations inherent in purely Wasserstein or purely Fisher–Rao frameworks and improving the robustness and stability of the resulting variational dynamics. We also prove that tempering does not accelerate convergence of the WFR flow, clarifying the limits of this kind of geometric hybridization. Empirically, our algorithm outperforms state-of-the-art methods across multiple benchmark tasks.
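For context, a hedged sketch of the PDE in question: in the gradient-flow literature the WFR gradient flow of KL(μ‖π) is commonly written in the form below, up to a constant balancing the two geometries; the paper's exact normalization may differ.

```latex
% WFR gradient flow of F(mu) = KL(mu || pi): a transport (Wasserstein) term
% plus a reaction (Fisher-Rao) term; the relative weighting constant is omitted.
\partial_t \mu_t
  = \underbrace{\nabla \cdot \bigl( \mu_t \, \nabla \log(\mu_t / \pi) \bigr)}_{\text{Wasserstein (transport)}}
  \;-\; \underbrace{\mu_t \Bigl( \log(\mu_t / \pi) - \mathrm{KL}(\mu_t \,\|\, \pi) \Bigr)}_{\text{Fisher--Rao (reaction)}}
```

The transport term is the Fokker–Planck equation of Langevin dynamics, while the mean-centered reaction term creates or removes mass according to log(μ_t/π); this is what makes importance reweighting and resampling natural discretizations.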
📝 Abstract
We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback--Leibler divergence from $\pi$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $\pi$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence which empirically outperforms the current state of the art. We study tempered versions of these PDEs, obtained by replacing the target distribution with a geometric mixture of the initial and target distributions, and show that these do not lead to a convergence speed-up.
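To illustrate how such a flow connects to importance sampling and SMC, here is a minimal, hedged sketch of a tempered SMC sampler: Langevin moves play the role of the Wasserstein (transport) part and importance reweighting with resampling plays the role of the Fisher--Rao (reaction) part along the geometric path $\pi_\lambda \propto \mu_0^{1-\lambda}\pi^\lambda$. This is not the authors' algorithm; the Gaussian target, all function names, and all parameter choices are illustrative assumptions.

```python
import numpy as np

SCALE0 = 3.0  # std of the assumed initial distribution mu_0 = N(0, SCALE0^2 I)

def log_target(x):
    """Assumed target pi = N(0, I); log-density up to an additive constant."""
    return -0.5 * np.sum(x**2, axis=-1)

def grad_log_target(x):
    return -x

def log_initial(x):
    """log mu_0 up to an additive constant."""
    return -0.5 * np.sum(x**2, axis=-1) / SCALE0**2

def grad_log_initial(x):
    return -x / SCALE0**2

def tempered_smc(n_particles=1000, dim=2, n_temps=50, n_langevin=5,
                 step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = SCALE0 * rng.standard_normal((n_particles, dim))  # exact draws from mu_0
    lambdas = np.linspace(0.0, 1.0, n_temps + 1)
    for lam_prev, lam in zip(lambdas[:-1], lambdas[1:]):
        # Fisher-Rao / reaction part: reweight along the geometric path
        # pi_lam proportional to mu_0^(1-lam) * pi^lam, then resample.
        log_w = (lam - lam_prev) * (log_target(x) - log_initial(x))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
        # Wasserstein / transport part: unadjusted Langevin moves targeting pi_lam.
        for _ in range(n_langevin):
            grad = (1.0 - lam) * grad_log_initial(x) + lam * grad_log_target(x)
            x = x + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

if __name__ == "__main__":
    samples = tempered_smc()
    print("mean ~ 0:", samples.mean(axis=0))
    print("var  ~ 1:", samples.var(axis=0))
```

Dropping the reweighting step leaves pure Langevin dynamics (a Wasserstein-type flow), while dropping the Langevin moves leaves pure importance tempering (a Fisher--Rao-type flow); a WFR-flow approximation combines both ingredients, in line with the abstract's connection to well-known Monte Carlo algorithms.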