FLAG: Flow Policy MaxEnt-RL by Latent Augmented Guidance

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses importance weight collapse in maximum entropy reinforcement learning (MaxEnt-RL), which limits the scalability of importance-weighted supervised learning of expressive policies in high-dimensional action spaces. The collapse is attributed to importance sampling over the entire action space, so the authors localize the sampling region instead. Their method, FLAG (Flow policy with Latent-Augmented Guidance), augments the state space with a flow latent variable and optimizes a provably consistent proxy MaxEnt-RL objective, which avoids the weight degeneracy. The authors report that FLAG optimizes expressive flow policies with a limited number of importance samples, scales to high-dimensional control tasks, and achieves state-of-the-art performance on challenging benchmarks.

📝 Abstract

Maximum entropy reinforcement learning (MaxEnt-RL) enables robust exploration, yet practical implementations often restrict policies to simple Gaussians. While recent approaches incorporate expressive generative policies via importance-weighted supervised learning, they are prone to importance weight collapse, which limits their scalability in high-dimensional action spaces. Our key insight is to mitigate this limitation by localizing the sampling region, avoiding the weight degeneracy induced by importance sampling over the entire action space. To instantiate this insight, we introduce FLAG (Flow policy with Latent-Augmented Guidance). FLAG augments the state space with a flow latent variable and optimizes a provably consistent proxy MaxEnt-RL objective. We empirically demonstrate that FLAG enables expressive policy optimization with limited importance samples and scales to high-dimensional control tasks. Furthermore, FLAG achieves state-of-the-art performance across challenging benchmarks. Our project webpage: https://flag-rl.github.io/
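
The importance weight collapse described in the abstract can be illustrated with a small numerical sketch. This is not the FLAG algorithm: it only measures how the effective sample size (ESS) of self-normalized importance weights shrinks as the action dimension grows, and how a narrower sampling region helps. The quadratic critic `Q(a) = -||a||^2`, the uniform proposals, the temperature, and all function names are assumptions made for illustration.

```python
import numpy as np


def effective_sample_size(log_weights):
    """Kish effective sample size of self-normalized importance weights."""
    shifted = log_weights - np.max(log_weights)  # stabilize the exponent
    weights = np.exp(shifted)
    weights /= weights.sum()
    return 1.0 / np.sum(weights ** 2)


def ess_uniform_proposal(dim, half_width, n_samples, temperature, rng):
    """ESS when actions are drawn uniformly from [-half_width, half_width]^dim.

    Toy critic (assumption): Q(a) = -||a||^2, so the target density is
    proportional to exp(Q(a) / temperature). The uniform proposal density is
    constant, so the log importance weight equals Q(a) / temperature up to
    an additive constant that cancels after normalization.
    """
    actions = rng.uniform(-half_width, half_width, size=(n_samples, dim))
    q_values = -np.sum(actions ** 2, axis=1)
    return effective_sample_size(q_values / temperature)


rng = np.random.default_rng(0)
n_samples, temperature = 256, 0.1
for dim in (2, 8, 32):
    # "Global": sample over the whole action box; "local": a small box
    # around the (here known) optimum at a = 0.
    ess_global = ess_uniform_proposal(dim, 1.0, n_samples, temperature, rng)
    ess_local = ess_uniform_proposal(dim, 0.1, n_samples, temperature, rng)
    print(f"dim={dim:2d}  global ESS={ess_global:6.1f}  local ESS={ess_local:6.1f}")
```

In this toy setting the local region is centered on a known optimum, which a real agent does not have. The sketch therefore shows only why localizing the sampling region reduces weight degeneracy, not how FLAG chooses that region through its latent-augmented objective.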

Problem

Research questions and friction points this paper is trying to address.

Maximum entropy reinforcement learning

importance weight collapse

expressive generative policies

high-dimensional action spaces

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Entropy Reinforcement Learning

Flow-based Policy

Latent-Augmented Guidance

Importance Weight Collapse