🤖 AI Summary
In offline reinforcement learning, policy drift toward out-of-distribution actions, driven by extrapolation error, remains a critical challenge. To address this, we propose a novel constraint mechanism inspired by static friction in classical mechanics: action-space deviations are modeled as having to overcome a "friction threshold," which explicitly restricts both the direction and magnitude of policy updates while preserving the simplicity of batch-constrained learning. Technically, our approach combines a distance metric on the orthonormal action manifold with behavior-cloning regularization, yielding a physically interpretable and computationally efficient constraint framework. Evaluated on multiple continuous-control benchmarks, our method significantly improves training stability and policy robustness, consistently outperforming state-of-the-art batch-constrained algorithms. These results provide empirical evidence that incorporating physical priors, specifically friction-inspired constraints, effectively mitigates extrapolation error in offline RL.
📝 Abstract
We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy RL, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control that extends batch-constrained reinforcement learning. Our algorithm constrains the agent's action space to encourage behavior similar to that in the replay buffer, while maintaining a distance from the manifold of the orthonormal action space. The constraint preserves the simplicity of batch-constrained learning and provides an intuitive physical interpretation of extrapolation error. Empirically, we further demonstrate that our algorithm trains robustly and achieves competitive performance on standard continuous-control benchmarks.
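To make the static-friction analogy concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm): a proposed action whose deviation from the behavior (replay-buffer) action falls below a friction threshold is held in place, while larger deviations are allowed but bounded. The function name, threshold, and deviation cap are all hypothetical choices for illustration.

```python
import numpy as np

def friction_constrained_action(proposed, behavior, threshold=0.5, max_dev=1.0):
    """Friction-inspired action constraint (illustrative sketch only).

    Deviations from the behavior action with norm below `threshold`
    are suppressed entirely, mimicking static friction holding an
    object in place; larger deviations are rescaled so their norm
    never exceeds `max_dev`, bounding policy drift.
    """
    deviation = proposed - behavior
    norm = np.linalg.norm(deviation)
    if norm < threshold:
        # Below the friction threshold: the policy stays on the
        # behavior action, preventing drift toward unsupported actions.
        return behavior.copy()
    # Above the threshold: deviation is permitted but clipped in magnitude.
    scale = min(1.0, max_dev / norm)
    return behavior + scale * deviation

behavior = np.zeros(2)
small = friction_constrained_action(np.array([0.1, 0.1]), behavior)  # held at behavior
large = friction_constrained_action(np.array([3.0, 0.0]), behavior)  # clipped to max_dev
```

In this toy version the threshold plays the role of the static-friction coefficient: small update "forces" produce no motion in action space, which is the intuition the abstract attributes to the constraint.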