🤖 AI Summary
Traditional imitation learning (IL) is limited to replicating a single expert policy, resulting in insufficient behavioral diversity and poor robustness in real-world settings. To address this, we propose the first general-purpose Quality-Diversity Imitation Learning (QD-IL) framework, which tightly integrates Quality-Diversity (QD) optimization with Adversarial Imitation Learning (AIL) and can serve as a plug-and-play enhancement for any Inverse Reinforcement Learning (IRL) method. The framework trains end-to-end on MuJoCo continuous-control environments without requiring additional expert annotations. Experiments show that QD-IL significantly improves both skill diversity and policy quality over baselines such as GAIL and VAIL; notably, on the challenging Humanoid task it reaches twice the expert's performance. This work introduces a new paradigm for learning diverse, high-quality skills from limited demonstrations.
📝 Abstract
Imitation learning (IL) has shown great potential in various applications, such as robot control. However, traditional IL methods are usually designed to learn only one specific type of behavior, since demonstrations typically correspond to a single expert. In this work, we introduce the first generic framework for Quality-Diversity Imitation Learning (QD-IL), which enables the agent to learn a broad range of skills from limited demonstrations. Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method. Empirically, our framework significantly improves the QD performance of GAIL and VAIL on challenging continuous-control tasks derived from MuJoCo environments. Moreover, our method even achieves 2x expert performance in the most challenging Humanoid environment.
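To make the QD-IL idea concrete, the sketch below shows the generic structure such a framework builds on: a MAP-Elites-style archive that keeps, per behavior niche, the best policy found so far, scored by an imitation-style fitness. This is a minimal toy illustration, not the paper's algorithm: the "policy" is just a parameter vector, and `behavior_descriptor` and `imitation_fitness` are hypothetical stand-ins for rollout statistics and a learned discriminator reward (as in GAIL/VAIL).

```python
import numpy as np

def behavior_descriptor(theta):
    """Toy stand-in for a rollout-derived behavior descriptor in [0, 1)^2."""
    return (np.tanh(theta[:2]) + 1.0) / 2.0

def imitation_fitness(theta, expert_mean):
    """Toy stand-in for a discriminator-based imitation reward:
    higher when the policy's parameters are closer to the expert's."""
    return -float(np.linalg.norm(theta - expert_mean))

def map_elites_il(iters=2000, dim=8, cells=10, seed=0):
    """MAP-Elites loop: mutate elites, keep the best policy per behavior cell."""
    rng = np.random.default_rng(seed)
    expert_mean = np.ones(dim)          # assumed "expert" for the toy fitness
    archive = {}                        # cell index tuple -> (fitness, theta)
    for _ in range(iters):
        if archive and rng.random() < 0.9:
            # Select a random elite and mutate it.
            keys = list(archive)
            _, parent = archive[keys[rng.integers(len(keys))]]
            theta = parent + 0.1 * rng.standard_normal(dim)
        else:
            # Occasionally sample a fresh random policy for exploration.
            theta = rng.standard_normal(dim)
        fit = imitation_fitness(theta, expert_mean)
        cell = tuple((behavior_descriptor(theta) * cells).astype(int))
        # Insert if the niche is empty or this policy outperforms its elite.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, theta)
    return archive

archive = map_elites_il()
# QD performance is typically summarized by coverage (filled niches)
# and the sum/quality of elite fitnesses across the archive.
print(len(archive), max(f for f, _ in archive.values()))
```

In the actual QD-IL setting, the fitness would come from an adversarially learned reward and the descriptor from trajectory features, which is what lets the framework wrap around any IRL method.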