🤖 AI Summary
Meta-reinforcement learning (meta-RL) suffers from poor out-of-distribution (OOD) generalization, primarily due to task representations being sensitive to distributional shifts. To address this, we propose a task-aware virtual training framework comprising three key components: (1) a transferable task embedding space built via metric learning; (2) a task-aware virtual task generation mechanism that explicitly enforces representational consistency between training and OOD tasks; and (3) state regularization to mitigate value overestimation under dynamic state distributions. This is the first work to jointly integrate task-aware virtual task construction with state regularization in meta-RL. Evaluated on MuJoCo and MetaWorld benchmarks, our framework achieves an average 23.6% improvement in OOD task performance and a 41% increase in generalization stability, significantly outperforming existing meta-RL methods.
📝 Abstract
Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments.