FLAG: Flow Policy MaxEnt-RL by Latent Augmented Guidance

📅 2026-05-28
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses importance weight collapse in maximum entropy reinforcement learning (MaxEnt-RL), which arises when importance sampling covers the entire action space and which limits scalability to high-dimensional action spaces. To mitigate it, the authors localize the sampling region by augmenting the state space with a flow latent variable. The resulting Latent-Augmented Guidance mechanism yields a provably consistent proxy objective that avoids the degeneracy of importance weights. The approach combines flow policies, latent-variable augmentation, and importance-weighted supervised learning, enabling expressive policy optimization with a limited number of importance samples. Experiments on high-dimensional control benchmarks report state-of-the-art performance.
📝 Abstract
Maximum entropy reinforcement learning (MaxEnt-RL) enables robust exploration, yet practical implementations often restrict policies to simple Gaussians. While recent approaches incorporate expressive generative policies via importance-weighted supervised learning, they are prone to importance weight collapse, which limits their scalability in high-dimensional action spaces. Our key insight is to mitigate this limitation by localizing the sampling region, avoiding the weight degeneracy induced by importance sampling over the entire action space. To instantiate this insight, we introduce FLAG (Flow policy with Latent-Augmented Guidance). FLAG augments the state space with a flow latent variable and optimizes a provably consistent proxy MaxEnt-RL objective. We empirically demonstrate that FLAG enables expressive policy optimization with limited importance samples and scales to high-dimensional control tasks. Furthermore, FLAG achieves state-of-the-art performance across challenging benchmarks. Our project webpage: https://flag-rl.github.io/
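The importance weight collapse mentioned in the abstract can be illustrated with a small numerical sketch. This is not FLAG's algorithm: it assumes a toy quadratic Q-function, a Boltzmann target proportional to exp(Q(a) / temperature), and plain Gaussian proposals. The function names, the temperature, and the proposal widths are all hypothetical choices made for illustration. The sketch only shows that the effective sample size (ESS) of self-normalized importance weights shrinks as the action dimension grows under a broad proposal, and that a narrower proposal concentrated near the high-value region retains more of it.

```python
import numpy as np


def effective_sample_size(log_w):
    """Kish effective sample size of self-normalized importance weights."""
    log_w = log_w - np.max(log_w)  # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)


def ess_for_proposal(dim, proposal_std, n_samples=256, temperature=0.1, seed=0):
    """ESS when actions are drawn from N(0, proposal_std^2 I) and reweighted
    toward a Boltzmann target exp(Q(a) / temperature).

    Q is a toy quadratic with its peak at the origin (an assumption made
    for illustration, not a quantity from the paper).
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(0.0, proposal_std, size=(n_samples, dim))
    sq_norm = np.sum(actions ** 2, axis=1)
    q_values = -0.5 * sq_norm                        # toy Q(a)
    log_target = q_values / temperature              # unnormalized log target
    log_proposal = -0.5 * sq_norm / proposal_std ** 2  # unnormalized log proposal
    # Normalizing constants cancel once the weights are self-normalized.
    return effective_sample_size(log_target - log_proposal)


for dim in (2, 8, 32):
    broad = ess_for_proposal(dim, proposal_std=1.0)  # covers the whole space
    local = ess_for_proposal(dim, proposal_std=0.4)  # concentrated near the peak
    print(f"dim={dim:2d}  broad ESS={broad:7.2f}  localized ESS={local:7.2f}")
```

An ESS close to 1 means a single sample carries almost all of the weight, so an importance-weighted supervised loss is effectively fit to one action. In this toy setting the localized proposal is simply a narrower Gaussian; the abstract describes FLAG as localizing the sampling region through a flow latent variable, which this sketch does not reproduce.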
Problem

Research questions and friction points this paper is trying to address.

Maximum entropy reinforcement learning
importance weight collapse
expressive generative policies
high-dimensional action spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Entropy Reinforcement Learning
Flow-based Policy
Latent-Augmented Guidance
Importance Weight Collapse
High-Dimensional Control