🤖 AI Summary
In offline reinforcement learning, policy drift toward out-of-distribution actions, driven by extrapolation error, remains a critical challenge. To address this, we propose a novel constraint mechanism inspired by static friction in classical mechanics: action-space deviations are modeled as having to overcome a "friction threshold," which explicitly restricts both the direction and magnitude of policy updates while preserving the simplicity of batch-constrained learning. Technically, our approach combines a distance metric on the orthonormal action manifold with behavior-cloning regularization, yielding a physically interpretable and computationally efficient constraint framework. Evaluated on multiple continuous-control benchmarks, our method significantly improves training stability and policy robustness, consistently outperforming state-of-the-art batch-constrained algorithms. These results provide empirical evidence that incorporating physical priors, specifically friction-inspired constraints, effectively mitigates extrapolation error in offline RL.
📝 Abstract
We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy RL, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control that extends batch-constrained reinforcement learning. Our algorithm constrains the agent's action space to encourage behavior similar to that in the replay buffer, while maintaining a distance from the manifold of the orthonormal action space. The constraint preserves the simplicity of batch-constrained learning and provides an intuitive physical interpretation of extrapolation error. Empirically, we further demonstrate that our algorithm trains robustly and achieves competitive performance on standard continuous-control benchmarks.
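To make the static-friction analogy concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm): a proposed action whose deviation from the behavior (replay-buffer) action falls below a friction threshold is held in place, while larger deviations are allowed but bounded. The function name, threshold, and deviation cap are all hypothetical choices for illustration.

```python
import numpy as np

def friction_constrained_action(proposed, behavior, threshold=0.5, max_dev=1.0):
    """Friction-inspired action constraint (illustrative sketch only).

    Deviations from the behavior action with norm below `threshold`
    are suppressed entirely, mimicking static friction holding an
    object in place; larger deviations are rescaled so their norm
    never exceeds `max_dev`, bounding policy drift.
    """
    deviation = proposed - behavior
    norm = np.linalg.norm(deviation)
    if norm < threshold:
        # Below the friction threshold: the policy stays on the
        # behavior action, preventing drift toward unsupported actions.
        return behavior.copy()
    # Above the threshold: deviation is permitted but clipped in magnitude.
    scale = min(1.0, max_dev / norm)
    return behavior + scale * deviation

behavior = np.zeros(2)
small = friction_constrained_action(np.array([0.1, 0.1]), behavior)  # held at behavior
large = friction_constrained_action(np.array([3.0, 0.0]), behavior)  # clipped to max_dev
```

In this toy version the threshold plays the role of the static-friction coefficient: small update "forces" produce no motion in action space, which is the intuition the abstract attributes to the constraint.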