🤖 AI Summary
Experience replay in reinforcement learning lacks statistical foundations, leading to high-variance, sample-inefficient policy evaluation in small-sample regimes.
Method: This paper establishes the first statistical modeling framework for experience replay, rigorously characterizing it as a variance-reduction mechanism grounded in U- and V-statistics. We extend this framework to model-free policy evaluation algorithms—including Least-Squares Temporal Difference (LSTD) and PDE-based methods—and integrate it with kernel ridge regression.
Contribution/Results: Theoretical analysis provides rigorous statistical guarantees on bias–variance trade-offs and convergence. Empirically, the approach significantly improves estimation stability and reduces the computational complexity of kernel ridge regression from $O(n^3)$ to as low as $O(n^2)$, while simultaneously lowering variance. Experiments validate the theoretical claims and demonstrate strong cross-task generalization. This work introduces a new paradigm for small-sample reinforcement learning and nonparametric regression—combining statistical rigor with computational feasibility.
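To make the core idea concrete, here is a minimal illustrative sketch (not the paper's exact estimator) of viewing experience replay as a resampled U-/V-statistic: repeatedly drawing index pairs from a replay buffer and averaging a symmetric kernel over the replayed pairs approximates the complete (and expensive) U-statistic. The kernel `h(x, y) = (x - y)^2 / 2` below is a standard order-two kernel for the variance; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Replay buffer: n observations (e.g. returns collected by a policy).
n = 500
buffer = rng.normal(loc=1.0, scale=2.0, size=n)

# Symmetric kernel of order two: h(x, y) = (x - y)^2 / 2 is an
# unbiased kernel for Var(X), so the complete U-statistic
#   U_n = (n choose 2)^{-1} * sum_{i < j} h(x_i, x_j)
# estimates the variance but requires O(n^2) kernel evaluations.
def h(x, y):
    return 0.5 * (x - y) ** 2

# Experience replay viewed as a resampled (incomplete) statistic:
# draw B index pairs with replacement from the buffer and average
# the kernel over the replayed pairs. Allowing i == j makes this a
# resampled V-statistic; averaging over many replays drives the
# Monte Carlo variance down toward that of the complete statistic.
B = 20_000  # number of replayed pairs
i = rng.integers(0, n, size=B)
j = rng.integers(0, n, size=B)
replay_estimate = np.mean(h(buffer[i], buffer[j]))

print(replay_estimate)  # close to the buffer's sample variance
```

The same resample-and-average pattern underlies the policy-evaluation estimators: replayed mini-batches play the role of the resampled index sets.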
📝 Abstract
Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical properties remain underexplored. In this paper, we present a theoretical framework that models experience replay using resampled $U$- and $V$-statistics, providing rigorous variance reduction guarantees. We apply this framework to policy evaluation tasks using the Least-Squares Temporal Difference (LSTD) algorithm and a Partial Differential Equation (PDE)-based model-free algorithm, demonstrating significant improvements in stability and efficiency, particularly in data-scarce scenarios. Beyond policy evaluation, we extend the framework to kernel ridge regression, showing that the experience replay-based method reduces the computational cost from the traditional $O(n^3)$ to as low as $O(n^2)$ while simultaneously reducing variance. Extensive numerical experiments validate our theoretical findings, demonstrating the broad applicability and effectiveness of experience replay in diverse machine learning tasks.
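The claimed $O(n^3) \to O(n^2)$ reduction for kernel ridge regression can be illustrated with a simple subsample-and-average scheme, sketched below under stated assumptions (it is a generic replay-style estimator, not necessarily the authors' exact construction): each replay fits KRR on a subset of size $m \approx \sqrt{n}$ at cost $O(m^3)$, and averaging $B = n/m$ such fits costs $O(n m^2) = O(n^2)$ total, versus $O(n^3)$ for one full-sample solve. All function names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(Xtr, ytr, Xte, lam=1e-2):
    """Standard kernel ridge regression: one O(m^3) linear solve
    for m training points, then prediction on the test inputs."""
    m = len(Xtr)
    K = rbf(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * m * np.eye(m), ytr)
    return rbf(Xte, Xtr) @ alpha

# Synthetic regression data standing in for the "replay buffer".
n = 400
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
Xte = np.linspace(-3, 3, 50)[:, None]

# Replay-style KRR: repeatedly sample subsets of size m ~ sqrt(n),
# fit on each subset (O(m^3) apiece), and average the predictions.
# Total cost with B = n/m replays is O(n * m^2) = O(n^2),
# and averaging across replays also reduces variance.
m = int(np.sqrt(n))  # subset size per replay
B = n // m           # number of replayed subsets
preds = np.zeros(len(Xte))
for _ in range(B):
    idx = rng.choice(n, size=m, replace=False)
    preds += krr_fit_predict(X[idx], y[idx], Xte)
preds /= B

full = krr_fit_predict(X, y, Xte)  # the O(n^3) full-sample baseline
```

The averaged predictor tracks the full-sample fit closely on this smooth target while never solving a linear system larger than $m \times m$, which is the source of the complexity gain.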