FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of modeling high-dimensional, multimodal action distributions in offline reinforcement learning. The authors propose energy-guided flow matching (EGFM), a framework that models the target policy via a conditional probability path jointly defined by the offline dataset distribution and a learnable energy function. By approximating the energy-guided probability path as a Gaussian path and learning the corresponding conditional velocity field, the method internalizes energy guidance directly into training, eliminating the need for auxiliary guidance signals at inference time. The resulting offline RL algorithm, FlowQ, achieves competitive performance on standard offline RL benchmarks, keeps policy training time constant in the number of flow sampling steps, and naturally supports multimodal action generation without guidance computation at inference.

📝 Abstract
The use of guidance to steer sampling toward desired outcomes has been widely explored within diffusion models, especially in applications such as image and trajectory generation. However, incorporating guidance during training remains relatively underexplored. In this work, we introduce energy-guided flow matching, a novel approach that enhances the training of flow models and eliminates the need for guidance at inference time. We learn a conditional velocity field corresponding to the flow policy by approximating an energy-guided probability path as a Gaussian path. Learning guided trajectories is appealing for tasks where the target distribution is defined by a combination of data and an energy function, as in reinforcement learning. Diffusion-based policies have recently attracted attention for their expressive power and ability to capture multi-modal action distributions. Typically, these policies are optimized using weighted objectives or by back-propagating gradients through actions sampled by the policy. As an alternative, we propose FlowQ, an offline reinforcement learning algorithm based on energy-guided flow matching. Our method achieves competitive performance while the policy training time is constant in the number of flow sampling steps.
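The training scheme the abstract describes (regressing a conditional velocity field along a Gaussian path whose data endpoint is tilted by an energy function) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact construction: `v_theta` and `energy` are hypothetical stand-ins, and the linear interpolation path and exponential weighting are assumptions.

```python
import numpy as np

def energy_guided_fm_loss(v_theta, energy, x1_batch, rng):
    """One energy-weighted flow-matching training step (sketch).

    x1_batch: batch of actions from the offline dataset, shape (n, d).
    energy:   callable returning an energy score per sample, shape (n,).
    v_theta:  callable v(x_t, t) predicting the velocity, shape (n, d).
    """
    n, d = x1_batch.shape
    x0 = rng.standard_normal((n, d))      # samples from the Gaussian base
    t = rng.uniform(size=(n, 1))          # random flow times in [0, 1]
    xt = (1 - t) * x0 + t * x1_batch      # linear Gaussian interpolation path
    u = x1_batch - x0                     # conditional target velocity
    w = np.exp(energy(x1_batch))          # energy tilt on data samples
    w = w / w.sum()                       # self-normalised weights
    pred = v_theta(xt, t)
    return float((w * ((pred - u) ** 2).sum(axis=1)).sum())
```

Note that the energy enters only as a per-sample weight on the standard flow-matching regression, so each gradient step costs the same as unguided flow matching, regardless of how many integration steps are later used for sampling.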
Problem

Research questions and friction points this paper is trying to address.

Enhancing flow model training with energy guidance
Eliminating inference-time guidance in reinforcement learning
Improving policy training efficiency in offline RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Energy-guided flow matching for training
Gaussian path approximation of the energy-guided probability path
FlowQ algorithm for offline reinforcement learning
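The constant-training-time point above can be illustrated with a sampling sketch: once the velocity field is trained, the number of integration steps is purely an inference-time choice. `v_theta` is a hypothetical trained velocity model, and Euler integration is an assumed (simplest possible) ODE solver:

```python
import numpy as np

def sample_action(v_theta, dim, n_steps, rng):
    """Draw one action by Euler-integrating the learned velocity field.

    n_steps is chosen at inference time only; the training cost of the
    policy does not depend on it.
    """
    x = rng.standard_normal(dim)        # start from the Gaussian base
    dt = 1.0 / n_steps
    t = 0.0
    for _ in range(n_steps):
        x = x + dt * v_theta(x, t)      # one Euler step along the flow
        t += dt
    return x
```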