Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

📅 2026-06-09

🤖 AI Summary

This work addresses the high computational cost and training instability of diffusion-based Q-learning in offline reinforcement learning, both of which stem from multi-step denoising. The authors propose Bootstrapped Flow Q-Learning (BFQ), a framework grounded in flow matching. BFQ takes a divide-and-conquer approach: it first estimates short-horizon displacement vectors along the flow path, then uses these estimates as bootstrap targets to directly learn a single-step mapping from noise to action. The approach requires no auxiliary networks, policy distillation, or multi-stage training. BFQ generates actions in one step while preserving high representational capacity. On the D4RL benchmark it improves performance over multi-step diffusion baselines while significantly reducing computational cost, offering a favorable balance among performance, simplicity, and efficiency.

📝 Abstract

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and then bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline reinforcement learning.
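
The bootstrapping idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the velocity field v(x) = -x, the linear displacement model, the step-size schedule, and every function name below are assumptions chosen so the example runs in plain Python. The smallest-step displacement is regressed onto step size times velocity, and each larger step regresses onto the sum of two half-size displacements (treated as a fixed target), so all step sizes are trained together in a single stage. Time dependence is dropped because the assumed field does not depend on t.

```python
import math
import random


def velocity(x):
    """Hypothetical stand-in for the marginal velocity (assumed v(x) = -x)."""
    return -x


def train_bootstrapped_displacements(levels=6, steps=5000, lr=0.05, seed=0):
    """Jointly fit displacement models for step sizes 2^-(levels-1), ..., 1/2, 1.

    Each model is linear, D_k(x) = coef[k] * x, which suffices for this toy field.
    """
    rng = random.Random(seed)
    sizes = [2.0 ** -(levels - 1 - k) for k in range(levels)]
    coef = [0.0] * levels
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        for k in range(levels):
            if k == 0:
                # Short-range displacement: small step along the velocity.
                target = sizes[0] * velocity(x)
            else:
                # Bootstrap target: compose two half-size displacements.
                # The target is a plain number, so no gradient flows through it.
                first = coef[k - 1] * x
                second = coef[k - 1] * (x + first)
                target = first + second
            error = coef[k] * x - target
            coef[k] -= lr * error * x  # SGD step on 0.5 * error**2
    return sizes, coef


def one_step_map(x0, coef):
    """Move from the start point to the end point with the full-horizon model."""
    return x0 + coef[-1] * x0


sizes, coef = train_bootstrapped_displacements()
print("one-step output:", one_step_map(1.0, coef))
print("exact ODE solution:", math.exp(-1.0))
```

In this sketch the one-step output approaches the exact solution of dx/dt = -x at t = 1 only up to the error of the smallest step, which is the trade-off the divide-and-conquer view suggests: the base displacement has to be short enough to be estimated accurately from the velocity.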

Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning

diffusion-based Q-learning

single-step action generation

computational efficiency

training stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrapped Flow Q-Learning

single-step action generation

offline reinforcement learning