🤖 AI Summary
To address unstable skill learning caused by noisy offline demonstrations in long-horizon meta-reinforcement learning, this paper proposes a robust online-offline collaborative skill learning framework. Methodologically, it (1) introduces a confidence-based trajectory prioritization and refinement mechanism to dynamically select high-quality samples; (2) integrates clean trajectories generated via online exploration with noisy offline demonstrations for hybrid data distillation; and (3) decouples meta-policy optimization from skill representation learning to enhance skill reusability. Evaluated on multiple long-horizon benchmark tasks, the approach significantly improves adaptation speed and final performance. It achieves a 42% increase in noise tolerance and a 3.1× improvement in skill reuse rate, consistently outperforming existing state-of-the-art methods.
📝 Abstract
Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, resulting in unstable skill learning and degraded performance. To overcome this, we propose Prioritized Refinement for Skill-Based Meta-RL (PRISM), a robust framework that integrates exploration near noisy data to generate online trajectories and combines them with offline data. Through prioritization, PRISM extracts high-quality data to learn task-relevant skills effectively. By addressing the impact of noise, our method ensures stable skill learning and achieves superior performance in long-horizon tasks, even with noisy and sub-optimal data.