Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

📅 2026-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the performance degradation that distribution shift between offline data and online interaction causes in offline-to-online reinforcement learning. It proposes DUAL, a framework that combines the prior knowledge of a diffusion model with uncertainty quantification. In the offline phase, DUAL distills a fast-sampling diffusion actor policy and a transition model from the diffusion model; in the online phase, it balances exploration and exploitation by combining a Laplace approximation with distance-based detection of state-transition shift. Empirically, DUAL improves online expected return over existing offline-to-online reinforcement learning baselines across multiple environments and settings.
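To make the idea of distance-based shift detection concrete, here is a minimal sketch. It is not DUAL's actual detector: the function names, the Euclidean metric, and the quantile-based threshold are all assumptions chosen for illustration. The sketch compares a transition model's predicted next state with the observed one and flags a transition as shifted when the gap exceeds a threshold calibrated on offline data.

```python
import math


def transition_distance(predicted_next_state, observed_next_state):
    """Euclidean distance between a predicted and an observed next state."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted_next_state, observed_next_state))
    )


def calibrate_threshold(offline_distances, quantile=0.95):
    """Hypothetical calibration: an empirical quantile of offline distances."""
    ordered = sorted(offline_distances)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_shifted(predicted_next_state, observed_next_state, threshold):
    """Flag a transition whose prediction error exceeds the offline threshold."""
    return transition_distance(predicted_next_state, observed_next_state) > threshold


# Usage: calibrate on distances measured on offline data, then test online transitions.
threshold = calibrate_threshold([0.1, 0.2, 0.3, 0.4], quantile=0.5)
far_transition = is_shifted([0.0, 0.0], [3.0, 4.0], threshold)
near_transition = is_shifted([0.0, 0.0], [0.1, 0.0], threshold)
```

In a sketch like this, flagged transitions could be treated as high-uncertainty regions that call for more exploration, while unflagged ones let the agent rely on the offline-trained policy.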
📝 Abstract
Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions. Although data-efficient, O2O-RL is susceptible to shifts between offline and online distributions. Existing work aims to mitigate the harm of this shift by fine-tuning the policy on trajectory data sampled from a diffusion model. Inspired by this line of work, we propose DUAL: an efficient Diffusion Uncertainty-Aware framework for offline-to-online reinforcement Learning. DUAL utilizes the prior knowledge of the diffusion model to distill a fast-sampling diffusion actor policy and transition model in the offline phase. DUAL also employs a Laplace approximation and distance transition-state-shift detection, thereby using uncertainty quantification to improve exploration versus exploitation in the online phase. We formally show that our actor loss with the Laplace approximation provides a proxy for a principled estimate of epistemic uncertainty. Empirically, DUAL improves the online expected return over O2O-RL baselines across multiple settings and environments.
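As a rough illustration of how a Laplace approximation yields an epistemic uncertainty estimate, the sketch below uses a toy one-parameter linear model y = w * x with Gaussian noise and a Gaussian prior. This is not the paper's actor loss: the model, the function names, and the default `noise_var` and `prior_precision` values are assumptions. The Laplace approximation places a Gaussian at the MAP estimate whose variance is the inverse Hessian of the negative log posterior; for this linear-Gaussian toy case the approximation happens to be exact.

```python
def laplace_scalar_linear(xs, ys, noise_var=1.0, prior_precision=1.0):
    """Return (w_map, posterior_var) for the toy model y = w * x.

    The Hessian of the negative log posterior with respect to w is
    prior_precision + sum(x_i^2) / noise_var; its inverse is the
    Laplace posterior variance.
    """
    hessian = prior_precision + sum(x * x for x in xs) / noise_var
    w_map = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / hessian
    return w_map, 1.0 / hessian


def epistemic_variance(x_query, posterior_var):
    """Variance of the prediction w * x_query induced by uncertainty in w."""
    return x_query ** 2 * posterior_var


# Usage: more data sharpens the posterior, so epistemic variance shrinks.
w_small, var_small = laplace_scalar_linear([1.0, 2.0], [2.0, 4.0])
w_large, var_large = laplace_scalar_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
query_uncertainty = epistemic_variance(3.0, var_small)
```

The same pattern, curvature of the loss at the fitted parameters used as a precision, is what makes a Laplace approximation a candidate proxy for epistemic uncertainty in larger models, where the Hessian is typically approximated rather than computed exactly.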
Problem

Research questions and friction points this paper is trying to address.

Offline-to-Online Reinforcement Learning
distribution shift
epistemic uncertainty
data efficiency
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Model
Uncertainty Quantification
Offline-to-Online RL
Laplace Approximation
Epistemic Uncertainty