Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

πŸ“… 2024-01-28
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 4
✨ Influential: 1
πŸ€– AI Summary
This paper addresses statistical inference for the averaged estimator in temporal-difference (TD) learning under a Markov chain setting, establishing the first non-asymptotic central limit theorem (CLT). Methodologically, it pioneers the integration of Stein’s method with the Poisson equation to derive a non-asymptotic CLT for vector-valued martingale difference sequences, extended to functionals of ergodic Markov chains. Key contributions include: (1) an $O(1/\sqrt{n})$ convergence rate for the TD averaged estimator with explicit, non-asymptotic error bounds; (2) the first non-asymptotic characterization of normality for TD estimators; and (3) a rigorous statistical foundation for constructing confidence intervals and conducting hypothesis tests in reinforcement learning. By unifying stochastic approximation, Markov ergodic theory, and Stein’s method, the work significantly advances the interpretability and reliability analysis of RL algorithms.
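A non-asymptotic CLT for an averaged estimator typically takes the following shape (a schematic of the form of such results, not the paper's exact statement; here $\theta^\star$ denotes the TD fixed point, $\Lambda$ an asymptotic covariance matrix, and $\mathcal{C}$ a class of convex sets):

```latex
\bar{\theta}_n = \frac{1}{n}\sum_{t=1}^{n}\theta_t, \qquad
\sup_{A \in \mathcal{C}}
\left| \mathbb{P}\!\left(\sqrt{n}\,(\bar{\theta}_n - \theta^\star) \in A\right)
     - \mathbb{P}\!\left(Z \in A\right) \right|
\le \frac{C}{\sqrt{n}}, \qquad Z \sim \mathcal{N}(0, \Lambda).
```

The point of the "non-asymptotic" qualifier is that the constant $C$ is explicit and the bound holds for every finite $n$, rather than only in the $n \to \infty$ limit.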

πŸ“ Abstract
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
Problem

Research questions and friction points this paper is trying to address.

Non-asymptotic central limit theorem
Vector-valued martingale differences
Temporal Difference (TD) learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stein's method for martingale differences
Poisson's equation for Markov Chains
Non-asymptotic CLT for TD learning
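To make the object of study concrete, the following is a minimal, hypothetical sketch (not code from the paper) of TD(0) with Polyak–Ruppert averaging on a small Markov reward process: the running average `theta_bar` of the iterates is the "averaged estimator" whose distribution the paper's CLT characterizes. The transition matrix `P`, rewards `r`, and step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state Markov reward process (not from the paper).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Tabular (one-hot) linear features, so theta estimates V directly.
Phi = np.eye(3)

def td0_with_averaging(n_steps, alpha=0.05):
    """Run TD(0) along one Markov trajectory; return the last
    iterate and the Polyak-Ruppert average of all iterates."""
    theta = np.zeros(3)
    theta_bar = np.zeros(3)
    s = 0
    for t in range(1, n_steps + 1):
        s_next = rng.choice(3, p=P[s])
        # TD(0) semi-gradient update on the temporal-difference error.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        # Running average of the iterates: the averaged estimator.
        theta_bar += (theta - theta_bar) / t
        s = s_next
    return theta, theta_bar

theta_n, theta_bar_n = td0_with_averaging(20000)

# The true value function solves the Bellman equation V = r + gamma * P V.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print("averaged estimate:", theta_bar_n)
print("true values:      ", V_true)
```

The paper's result says that, properly scaled, the error `theta_bar_n - V_true` is approximately Gaussian, with an explicit bound on the distance to normality decaying at rate $O(1/\sqrt{n})$; this is what licenses confidence intervals around the averaged iterate.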