AI Summary
This paper addresses statistical inference for the averaged estimator in temporal-difference (TD) learning under a Markov chain setting, establishing the first non-asymptotic central limit theorem (CLT). Methodologically, it pioneers the integration of Stein's method with the Poisson equation to derive a non-asymptotic CLT for vector-valued martingale difference sequences, extended to functionals of ergodic Markov chains. Key contributions include: (1) an $O(1/\sqrt{n})$ convergence rate for the TD averaged estimator with explicit, non-asymptotic error bounds; (2) the first non-asymptotic characterization of normality for TD estimators; and (3) a rigorous statistical foundation for constructing confidence intervals and conducting hypothesis tests in reinforcement learning. By unifying stochastic approximation, Markov ergodic theory, and Stein's method, the work significantly advances the interpretability and reliability analysis of RL algorithms.
Abstract
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use the Poisson equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
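The averaged TD estimator the abstract refers to can be sketched in a few lines: tabular TD(0) run on a toy ergodic Markov reward process, with the Polyak-Ruppert running average `theta_bar` playing the role of the estimator whose $1/\sqrt{n}$-rate normality the paper characterizes. The chain `P`, rewards `r`, and step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state ergodic Markov reward process (illustrative only).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])   # transition matrix
r = np.array([1.0, 0.0, -1.0])   # reward observed in each state
gamma = 0.9                      # discount factor
n = 50_000                       # number of TD iterations

theta = np.zeros(3)              # tabular value-function estimate
theta_bar = np.zeros(3)          # Polyak-Ruppert average of the iterates
s = 0
for t in range(n):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update with a decaying step size alpha_t = 1 / sqrt(t + 1).
    td_error = r[s] + gamma * theta[s_next] - theta[s]
    theta[s] += td_error / np.sqrt(t + 1)
    # Running average; the non-asymptotic CLT concerns this quantity.
    theta_bar += (theta - theta_bar) / (t + 1)
    s = s_next

# Exact fixed point of the Bellman equation, for comparison.
v_star = np.linalg.solve(np.eye(3) - gamma * P, r)
err = np.max(np.abs(theta_bar - v_star))
print(f"max |theta_bar - v*| after {n} steps: {err:.3f}")
```

Under the paper's result, centering and scaling `theta_bar - v_star` by $\sqrt{n}$ yields a quantity that is approximately Gaussian, with an explicit bound on the distance to normality; that is what licenses confidence intervals around `theta_bar`.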