🤖 AI Summary
This study investigates finite-sample convergence rates for the central limit theorem (CLT) in Wasserstein-$p$ ($W_p$) distance for multivariate dependent data, focusing on two canonical dependence structures: locally dependent sequences and geometrically ergodic Markov chains. It rests on two technical tools: a tractable auxiliary bound on the $W_1$ Gaussian approximation error that is tailored to dependent data, and a proof that the regeneration time of the split chain associated with a geometrically ergodic Markov chain has a geometric tail, without requiring strong aperiodicity. With these tools, the work achieves, for the first time, the optimal $O(n^{-1/2})$ convergence rate in $W_1$ distance in both settings and, under mild moment conditions, the first $W_p$-CLT rates for $p \geq 2$, substantially improving the best previously known Wasserstein CLT bounds for dependent data. As an application of the locally dependent result, the study derives the first optimal $W_1$-CLT rate for multivariate $U$-statistics.
📝 Abstract
Finite-time central limit theorem (CLT) rates play a central role in modern machine learning. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p \geq 1$. We focus on two fundamental dependence structures that commonly arise in machine learning: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interest; they enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.
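For reference, the quantities named above can be written out in standard notation. This is a sketch, not the paper's own statement: the symbols $S_n$, $\Sigma$, $\Gamma(\mu,\nu)$, and $C$ are assumptions introduced here for illustration, as is the $n^{-1/2}$ normalization of the centered partial sum. The Wasserstein-$p$ distance between probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$ is

$$
W_p(\mu,\nu) = \Big( \inf_{\gamma \in \Gamma(\mu,\nu)} \int \|x-y\|^p \, d\gamma(x,y) \Big)^{1/p}, \qquad p \ge 1,
$$

where $\Gamma(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$. Writing $S_n = n^{-1/2}\sum_{i=1}^{n} (X_i - \mathbb{E}X_i)$ for the normalized sum of the data $X_1,\dots,X_n$, an optimal $W_1$ rate is a bound of the form

$$
W_1\big(\mathcal{L}(S_n),\, \mathcal{N}(0,\Sigma)\big) \le C\, n^{-1/2},
$$

with $C$ independent of $n$ and $\Sigma$ an appropriate covariance matrix (for example the covariance of $S_n$ or its limit; the abstract does not say which one the paper uses).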