🤖 AI Summary
This work addresses a key challenge in offline reinforcement learning: multimodal data distributions cannot be modeled effectively by conventional Gaussian policies. To this end, the authors propose Flow Actor-Critic (FAC), a novel method that, for the first time, integrates normalizing flows into both the policy network and the design of a conservative critic. Specifically, the approach leverages flow-based models to construct a highly expressive policy capable of accurately capturing complex behavioral distributions. Furthermore, it introduces a flow-based behavioral proxy model to derive a new regularizer for the critic, thereby enhancing the conservatism of value estimation. The proposed method achieves state-of-the-art performance on established offline reinforcement learning benchmarks, including D4RL and OGBench.
📝 Abstract
Datasets in offline reinforcement learning (RL) often exhibit complex, multi-modal distributions, necessitating policies expressive enough to capture them, beyond the widely used Gaussian policies. To handle such complex and multi-modal datasets, in this paper we propose Flow Actor-Critic (FAC), a new actor-critic method for offline RL built on recent flow policies. The proposed method not only uses a flow model for the actor, as in previous flow policies, but also exploits the expressive flow model to obtain a conservative critic that prevents Q-value explosion in out-of-data regions. To this end, we propose a new form of critic regularizer based on a flow behavior proxy model obtained as a byproduct of the flow-based actor design. By leveraging the flow model in this joint way, we achieve new state-of-the-art performance on offline RL benchmarks, including D4RL and the recent OGBench.
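The abstract's two ingredients can be sketched in miniature: a state-conditioned normalizing flow gives both a sampler for the actor and an exact log-density via the change-of-variables formula, and that same log-density, evaluated under a behavior proxy flow, can weight a conservative penalty on out-of-distribution actions. The sketch below uses a single affine layer with linear conditioners in place of a deep flow; all names (`AffineFlowPolicy`, `conservative_penalty`, the threshold `tau`) are illustrative assumptions, not the paper's actual architecture or regularizer.

```python
# Minimal sketch (assumptions, not FAC's implementation) of a conditional
# affine flow with exact log-densities, plus an illustrative critic penalty
# driven by a behavior proxy flow's log-density.
import numpy as np

rng = np.random.default_rng(0)

class AffineFlowPolicy:
    """One state-conditioned affine flow layer: a = shift(s) + exp(log_scale(s)) * z."""
    def __init__(self, state_dim, action_dim):
        # Linear conditioners stand in for the neural networks of a real flow.
        self.W_shift = rng.normal(0.0, 0.1, (state_dim, action_dim))
        self.W_scale = rng.normal(0.0, 0.1, (state_dim, action_dim))

    def forward(self, s, z):
        # Map base noise z ~ N(0, I) to an action, conditioned on state s.
        return s @ self.W_shift + np.exp(s @ self.W_scale) * z

    def inverse(self, s, a):
        # Exact inversion is what makes flow log-densities tractable.
        return (a - s @ self.W_shift) * np.exp(-(s @ self.W_scale))

    def log_prob(self, s, a):
        # Change of variables: log p(a|s) = log N(z; 0, I) - sum(log_scale).
        log_scale = s @ self.W_scale
        z = self.inverse(s, a)
        base = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)
        return np.sum(base - log_scale, axis=-1)

def conservative_penalty(q_values, behavior_log_prob, tau=-5.0):
    # Weight Q-values by how far below a density threshold tau the behavior
    # proxy scores each action; purely illustrative, not the paper's form.
    weight = np.maximum(0.0, tau - behavior_log_prob)
    return np.mean(weight * q_values)

# Usage: sample actions from the policy flow, then score them with a
# separately trained behavior proxy flow (here just a second instance).
policy = AffineFlowPolicy(state_dim=3, action_dim=2)
behavior = AffineFlowPolicy(state_dim=3, action_dim=2)
states = rng.standard_normal((8, 3))
noise = rng.standard_normal((8, 2))
actions = policy.forward(states, noise)
penalty = conservative_penalty(rng.standard_normal(8),
                               behavior.log_prob(states, actions))
```

Because the flow is invertible, the same model that samples actions also returns their exact likelihoods, which is what lets a behavior proxy flow flag low-density (out-of-data) actions for the critic regularizer.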