Drift Q-Learning

📅 2026-05-29

🤖 AI Summary

Offline reinforcement learning suffers from unreliable value estimates for out-of-distribution actions. This work proposes DriftQL, which integrates behavior regularization with critic-driven policy improvement in a single network trained with a unified objective, so that actions are generated in a single forward pass. An attraction-repulsion mechanism keeps the policy within the data-supported region while preventing mode collapse, with the aim of combining the performance benefits of diffusion and flow models with the simplicity and efficiency of deterministic methods. On the D4RL and OGBench benchmarks, the approach consistently outperforms existing diffusion- and flow-based methods, and on degraded, lower-quality datasets it stays close to its clean-data performance.
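The attraction-repulsion idea can be illustrated with kernel mean-shift fields. The sketch below is an assumption for illustration, not DriftQL's published formulation: it uses a Gaussian kernel with a hypothetical `bandwidth`, and the helper names `kernel_mean_shift` and `drift_field` are made up here. Attraction pulls each generated action toward nearby dataset actions; repulsion is computed the same way over the other generated actions and subtracted, which pushes samples apart so they do not collapse onto a single mode.

```python
import numpy as np


def kernel_mean_shift(queries, refs, bandwidth, exclude_self=False):
    """Gaussian-kernel mean shift: kernel-weighted mean of refs minus each query.

    queries: (n, d) float array; refs: (m, d) float array.
    If exclude_self is True, queries and refs must be the same array
    (n >= 2) and each point ignores its own contribution.
    """
    sq_dist = ((queries[:, None, :] - refs[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist / (2.0 * bandwidth ** 2)
    if exclude_self:
        np.fill_diagonal(logits, -np.inf)  # a point does not repel itself
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-normalize
    return weights @ refs - queries


def drift_field(generated, data, bandwidth=1.0):
    """Hypothetical attraction-repulsion drift for a batch of generated actions.

    attraction: shift toward nearby dataset actions (stay on data support).
    repulsion: shift toward nearby generated actions, subtracted (anti-collapse).
    """
    attraction = kernel_mean_shift(generated, data, bandwidth)
    repulsion = kernel_mean_shift(generated, generated, bandwidth, exclude_self=True)
    return attraction - repulsion


if __name__ == "__main__":
    data = np.array([[-1.0], [1.0]])      # two dataset actions (1-D)
    generated = np.array([[0.0], [0.1]])  # two nearly collapsed samples
    # first entry is negative and second positive: the samples are pushed apart
    print(drift_field(generated, data))
```

With two nearly collapsed samples sitting between two dataset actions, repulsion dominates and the drift separates them; with samples far from the data, attraction dominates and the drift points back toward the data.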

📝 Abstract

Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but at inference they require iterative denoising, solver integration, and, in more efficient variants, distillation or other approximations. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion- and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/
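One way to write a single objective that couples a drift-based behavioral regularizer with critic-driven improvement is sketched here. This is an assumed form for illustration, not the loss stated in the paper: $f_\theta$ is a hypothetical one-step generator mapping a state $s$ and noise $\epsilon$ to an action, $k$ is a kernel, $\mathcal{D}_s$ denotes dataset actions associated with $s$ (how such sets are formed is an assumption), $\pi_\theta(\cdot \mid s)$ is the distribution of $f_\theta(s, \epsilon)$, $Q_\phi$ is the learned critic, $\operatorname{sg}$ is stop-gradient, and $\alpha$ is a hypothetical trade-off weight.

```latex
a = f_\theta(s, \epsilon), \qquad \epsilon \sim \mathcal{N}(0, I)

M_P(a) = \frac{\mathbb{E}_{b \sim P}\big[k(a, b)\,(b - a)\big]}{\mathbb{E}_{b \sim P}\big[k(a, b)\big]}

V(a \mid s) = M_{\mathcal{D}_s}(a) - M_{\pi_\theta(\cdot \mid s)}(a)

\mathcal{L}(\theta) = \mathbb{E}_{s \sim \mathcal{D},\, \epsilon}\Big[\big\| a - \operatorname{sg}\big(a + V(a \mid s)\big) \big\|^{2} - \alpha\, Q_\phi(s, a)\Big]
```

Because the regression target is stop-gradiented, the first term's gradient with respect to $a$ is $2\big(a - (a + V)\big) = -2V$, so a descent step moves each action along the drift $V$ (toward the data, away from other samples), while the second term moves it along $\nabla_a Q_\phi(s, a)$ toward higher value. Sampling then needs only one evaluation of $f_\theta$.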

Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning

out-of-distribution actions

behavioral regularization

policy improvement

value estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

DriftQL

offline reinforcement learning

behavioral regularization