AI Summary
This paper addresses statistical inference for the averaged estimator in temporal-difference (TD) learning under a Markov chain setting, establishing the first non-asymptotic central limit theorem (CLT). Methodologically, it pioneers the integration of Stein's method with the Poisson equation to derive a non-asymptotic CLT for vector-valued martingale difference sequences, extended to functionals of ergodic Markov chains. Key contributions include: (1) an $O(1/\sqrt{n})$ convergence rate for the TD averaged estimator with explicit, non-asymptotic error bounds; (2) the first non-asymptotic characterization of normality for TD estimators; and (3) a rigorous statistical foundation for constructing confidence intervals and conducting hypothesis tests in reinforcement learning. By unifying stochastic approximation, Markov ergodic theory, and Stein's method, the work significantly advances the interpretability and reliability analysis of RL algorithms.
Abstract
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use the Poisson equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
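The averaged TD estimator the abstract refers to can be sketched in a few lines: tabular TD(0) run on a toy ergodic Markov reward process, with the Polyak-Ruppert running average `theta_bar` playing the role of the estimator whose $1/\sqrt{n}$-rate normality the paper characterizes. The chain `P`, rewards `r`, and step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state ergodic Markov reward process (illustrative only).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])   # transition matrix
r = np.array([1.0, 0.0, -1.0])   # reward observed in each state
gamma = 0.9                      # discount factor
n = 50_000                       # number of TD iterations

theta = np.zeros(3)              # tabular value-function estimate
theta_bar = np.zeros(3)          # Polyak-Ruppert average of the iterates
s = 0
for t in range(n):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update with a decaying step size alpha_t = 1 / sqrt(t + 1).
    td_error = r[s] + gamma * theta[s_next] - theta[s]
    theta[s] += td_error / np.sqrt(t + 1)
    # Running average; the non-asymptotic CLT concerns this quantity.
    theta_bar += (theta - theta_bar) / (t + 1)
    s = s_next

# Exact fixed point of the Bellman equation, for comparison.
v_star = np.linalg.solve(np.eye(3) - gamma * P, r)
err = np.max(np.abs(theta_bar - v_star))
print(f"max |theta_bar - v*| after {n} steps: {err:.3f}")
```

Under the paper's result, centering and scaling `theta_bar - v_star` by $\sqrt{n}$ yields a quantity that is approximately Gaussian, with an explicit bound on the distance to normality; that is what licenses confidence intervals around `theta_bar`.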