Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline reinforcement learning is critical when online exploration is costly or unsafe, yet it suffers from high epistemic uncertainty due to limited data, and existing conservative methods generalize poorly. This paper proposes RefPlan, a framework that recasts model-based planning as Bayesian posterior estimation, yielding a doubly Bayesian perspective: (i) maintaining a belief over environment dynamics, and (ii) treating planning itself as posterior inference that marginalizes over that belief. At deployment, RefPlan updates the dynamics belief in real time from incoming observations, enabling adaptive reflection before planning. On standard benchmarks, RefPlan significantly improves conservative offline RL policies, remaining robust under high epistemic uncertainty, limited data, and non-stationary environment dynamics.

📝 Abstract
Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
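Schematically, "recasting planning as Bayesian posterior estimation" with "uncertainty incorporated via marginalization" can be read as follows (our hedged sketch of the idea, not the paper's exact notation): with offline data $\mathcal{D}$, deployment-time history $h_t$, and dynamics parameters $\theta$,

$$
p(a_t \mid h_t, \mathcal{D}) \;=\; \int p(a_t \mid h_t, \theta)\; p(\theta \mid h_t, \mathcal{D})\, d\theta,
$$

where $p(\theta \mid h_t, \mathcal{D})$ is the dynamics belief, updated online from real-time observations ("reflect"), and the integral marginalizes the planner's action distribution over that belief ("plan").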
Problem

Research questions and friction points this paper is trying to address.

Addresses high epistemic uncertainty in offline RL
Overcomes limitations of fixed conservative policies
Enhances adaptivity to changing environment dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Doubly Bayesian offline model-based planning
Unifies uncertainty modeling and planning
Updates belief using real-time observations
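The three bullets above can be sketched in a toy form. The snippet below is a minimal illustration, not the paper's algorithm: it assumes a hypothetical 1-D system `s' = s + theta * a + noise` with unknown `theta`, keeps a discrete belief over candidate dynamics as a stand-in for a Bayesian posterior, updates that belief from each observed transition, and plans by random shooting with returns averaged (marginalized) under the belief.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dynamics (not the paper's benchmarks):
# s' = s + theta * a + noise, with unknown parameter theta.
THETAS = np.array([0.5, 1.0, 2.0])   # discrete support of the dynamics belief
NOISE_STD = 0.05

def update_belief(belief, s, a, s_next):
    """Reflect: Bayes-update the belief over theta from one transition."""
    pred = s + THETAS * a
    lik = np.exp(-0.5 * ((s_next - pred) / NOISE_STD) ** 2)
    post = belief * lik
    return post / post.sum()

def plan(belief, s, goal, horizon=5, n_candidates=64):
    """Plan: pick the first action of the best belief-averaged sequence."""
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for w, theta in zip(belief, THETAS):   # marginalize over dynamics
        states = np.full(n_candidates, s)
        cost = np.zeros(n_candidates)
        for t in range(horizon):
            states = states + theta * seqs[:, t]
            cost += (states - goal) ** 2
        returns += w * (-cost)
    return seqs[np.argmax(returns), 0]

# Deployment loop: act, observe, reflect (update belief), then plan again.
true_theta, s, goal = 2.0, 0.0, 1.0
belief = np.full(len(THETAS), 1.0 / len(THETAS))
for _ in range(20):
    a = plan(belief, s, goal)
    s_next = s + true_theta * a + rng.normal(0, NOISE_STD)
    belief = update_belief(belief, s, a, s_next)
    s = s_next
```

After a few transitions the belief concentrates on the true dynamics and planning improves accordingly; RefPlan's actual formulation operates on learned neural dynamics models and offline-learned policies rather than this toy discrete belief.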