Quality Diversity Imitation Learning

📅 2024-10-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Traditional imitation learning (IL) replicates a single expert policy, yielding limited behavioral diversity and poor robustness in real-world settings. To address this, the authors propose the first general-purpose Quality-Diversity Imitation Learning (QD-IL) framework, which integrates Quality-Diversity (QD) optimization with Adversarial Imitation Learning (AIL) and can serve as a plug-and-play enhancement for any Inverse Reinforcement Learning (IRL) method. The framework trains end-to-end on MuJoCo continuous-control tasks without requiring additional expert annotations. Experiments show that QD-IL significantly improves both skill diversity and policy quality over baselines such as GAIL and VAIL; notably, on the challenging Humanoid task it achieves twice the expert's performance. This work establishes a new paradigm for learning diverse, high-quality skills from limited demonstrations.

📝 Abstract
Imitation learning (IL) has shown great potential in various applications, such as robot control. However, traditional IL methods are usually designed to learn only one specific type of behavior since demonstrations typically correspond to a single expert. In this work, we introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL), which enables the agent to learn a broad range of skills from limited demonstrations. Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method. Empirically, our framework significantly improves the QD performance of GAIL and VAIL on the challenging continuous control tasks derived from MuJoCo environments. Moreover, our method even achieves 2x expert performance in the most challenging Humanoid environment.
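The quality-diversity side of the framework can be illustrated with a minimal MAP-Elites-style archive, the standard QD data structure: policies are binned by a behavior descriptor, and each bin keeps only its fittest elite. This is a generic sketch of the QD mechanism, not the paper's exact implementation; the function name, bin count, and descriptor range are illustrative assumptions.

```python
import numpy as np

def archive_insert(archive, behavior, fitness, solution,
                   bins=10, low=0.0, high=1.0):
    """Insert a solution into a MAP-Elites-style archive.

    The behavior descriptor (e.g. gait features of a locomotion policy)
    is discretized into a grid cell; a cell retains only the highest-
    fitness solution seen so far, so the archive accumulates a set of
    diverse AND high-quality elites.
    """
    # Map the continuous descriptor to integer grid coordinates.
    coords = (np.asarray(behavior, dtype=float) - low) / (high - low) * bins
    cell = tuple(np.clip(coords, 0, bins - 1).astype(int))
    # Keep the incumbent unless the newcomer is strictly fitter.
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)
    return archive
```

In a QD-IL loop, `fitness` would come from the learned imitation reward rather than an environment reward, which is what lets the archive be filled from demonstrations alone.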
Problem

Research questions and friction points this paper is trying to address.

Enhancing robot locomotion diversity via extrinsic behavioral curiosity
Overcoming single-policy limitations in imitation learning for robustness
Integrating quality-diversity optimization with IRL for diverse behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates quality-diversity optimization with IRL
Uses Extrinsic Behavioral Curiosity for novelty rewards
Applies to Gradient-Arborescence-based QD-RL algorithms
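The second bullet, combining an adversarial imitation signal with a curiosity-style novelty bonus, can be sketched as a shaped reward. This is a hedged illustration under common AIL conventions (a GAIL-style `-log(1 - D)` term plus mean distance to the k nearest archived behavior descriptors); the function and parameter names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def qd_il_reward(disc_logit, behavior, archive_behaviors, k=3, beta=1.0):
    """Shaped reward = imitation term + beta * novelty bonus.

    disc_logit: discriminator logit D(s, a) before the sigmoid.
    behavior: current behavior descriptor of the policy.
    archive_behaviors: descriptors of policies already in the QD archive.
    """
    # GAIL-style imitation reward: -log(1 - sigmoid(logit)).
    d = 1.0 / (1.0 + np.exp(-disc_logit))
    imitation = -np.log(np.clip(1.0 - d, 1e-8, 1.0))
    # Novelty bonus: mean Euclidean distance to the k nearest
    # archived descriptors (novelty-search convention).
    if len(archive_behaviors) == 0:
        novelty = 0.0
    else:
        dists = np.linalg.norm(
            np.asarray(archive_behaviors, dtype=float)
            - np.asarray(behavior, dtype=float), axis=1)
        novelty = float(np.sort(dists)[:k].mean())
    return float(imitation + beta * novelty)
```

Behaviors far from everything already archived earn a larger bonus, pushing the gradient-arborescence QD-RL search toward unexplored regions of behavior space while the discriminator term keeps trajectories expert-like.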
Zhenglin Wan
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
Xingrui Yu
Scientist, CFAR, A*STAR
Machine Learning · Robust Imitation Learning · Trustworthy AI
David M. Bossens
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Yueming Lyu
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Qing Guo
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Flint Xiaofeng Fan
ETH Zurich, NUS, A*STAR International Fellow
Federated RL · Multi-agent Systems · Distributed Computing · Optimization
Ivor W. Tsang
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore