🤖 AI Summary
This study addresses the problem of efficiently learning an unknown probability distribution from limited samples and provides non-asymptotic, computable error bounds in the Wasserstein distance. To this end, the authors propose a novel framework that integrates optimal transport theory, concentration inequalities, and mixed-integer linear programming. By leveraging intelligent clustering to optimize the selection of support points, the method constructs a tractable optimization problem that depends only on the size of the support set of the empirical distribution. This approach achieves, for the first time, high-confidence Wasserstein error bounds without requiring prior knowledge of the true distribution. Experimental results demonstrate that the method significantly reduces the support set size across multiple benchmarks while yielding substantially tighter error bounds.
📝 Abstract
The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic and easy-to-compute error bounds in Wasserstein distance has become a fundamental problem in many fields. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by an approximate discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even if $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed integer linear program) of a size that only depends on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop intelligent clustering algorithms to optimally find the support of $\widehat{\mathbb{P}}$ while minimizing the Wasserstein distance error. On a set of benchmarks, we demonstrate that our approach outperforms state-of-the-art comparable methods by generally returning approximating distributions with substantially smaller support and tighter error bounds.