Quality Diversity Imitation Learning

📅 2024-10-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Traditional imitation learning (IL) replicates a single expert policy, yielding limited behavioral diversity and poor robustness in real-world settings. To address this, the authors propose the first general-purpose Quality-Diversity Imitation Learning (QD-IL) framework, which integrates Quality-Diversity (QD) optimization with Adversarial Imitation Learning (AIL) and can serve as a plug-and-play enhancement for any Inverse Reinforcement Learning (IRL) method. The framework trains end-to-end on MuJoCo continuous-control tasks without requiring additional expert annotations. Experiments show that QD-IL significantly improves both skill diversity and policy quality over baselines such as GAIL and VAIL; notably, on the challenging Humanoid task it achieves twice the expert's performance. This work establishes a new paradigm for learning diverse, high-quality skills from limited demonstrations.

📝 Abstract
Imitation learning (IL) has shown great potential in various applications, such as robot control. However, traditional IL methods are usually designed to learn only one specific type of behavior since demonstrations typically correspond to a single expert. In this work, we introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL), which enables the agent to learn a broad range of skills from limited demonstrations. Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method. Empirically, our framework significantly improves the QD performance of GAIL and VAIL on the challenging continuous control tasks derived from MuJoCo environments. Moreover, our method even achieves 2x expert performance in the most challenging Humanoid environment.
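The quality-diversity side of the framework can be illustrated with a minimal MAP-Elites-style archive, the standard QD data structure: policies are binned by a behavior descriptor, and each bin keeps only its fittest elite. This is a generic sketch of the QD mechanism, not the paper's exact implementation; the function name, bin count, and descriptor range are illustrative assumptions.

```python
import numpy as np

def archive_insert(archive, behavior, fitness, solution,
                   bins=10, low=0.0, high=1.0):
    """Insert a solution into a MAP-Elites-style archive.

    The behavior descriptor (e.g. gait features of a locomotion policy)
    is discretized into a grid cell; a cell retains only the highest-
    fitness solution seen so far, so the archive accumulates a set of
    diverse AND high-quality elites.
    """
    # Map the continuous descriptor to integer grid coordinates.
    coords = (np.asarray(behavior, dtype=float) - low) / (high - low) * bins
    cell = tuple(np.clip(coords, 0, bins - 1).astype(int))
    # Keep the incumbent unless the newcomer is strictly fitter.
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)
    return archive
```

In a QD-IL loop, `fitness` would come from the learned imitation reward rather than an environment reward, which is what lets the archive be filled from demonstrations alone.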
Problem

Research questions and friction points this paper is trying to address.

Enhancing robot locomotion diversity via extrinsic behavioral curiosity
Overcoming single-policy limitations in imitation learning for robustness
Integrating quality-diversity optimization with IRL for diverse behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates quality-diversity optimization with IRL
Uses Extrinsic Behavioral Curiosity for novelty rewards
Applies to Gradient-Arborescence-based QD-RL algorithms
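The second bullet, combining an adversarial imitation signal with a curiosity-style novelty bonus, can be sketched as a shaped reward. This is a hedged illustration under common AIL conventions (a GAIL-style `-log(1 - D)` term plus mean distance to the k nearest archived behavior descriptors); the function and parameter names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def qd_il_reward(disc_logit, behavior, archive_behaviors, k=3, beta=1.0):
    """Shaped reward = imitation term + beta * novelty bonus.

    disc_logit: discriminator logit D(s, a) before the sigmoid.
    behavior: current behavior descriptor of the policy.
    archive_behaviors: descriptors of policies already in the QD archive.
    """
    # GAIL-style imitation reward: -log(1 - sigmoid(logit)).
    d = 1.0 / (1.0 + np.exp(-disc_logit))
    imitation = -np.log(np.clip(1.0 - d, 1e-8, 1.0))
    # Novelty bonus: mean Euclidean distance to the k nearest
    # archived descriptors (novelty-search convention).
    if len(archive_behaviors) == 0:
        novelty = 0.0
    else:
        dists = np.linalg.norm(
            np.asarray(archive_behaviors, dtype=float)
            - np.asarray(behavior, dtype=float), axis=1)
        novelty = float(np.sort(dists)[:k].mean())
    return float(imitation + beta * novelty)
```

Behaviors far from everything already archived earn a larger bonus, pushing the gradient-arborescence QD-RL search toward unexplored regions of behavior space while the discriminator term keeps trajectories expert-like.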
Zhenglin Wan
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
Xingrui Yu
Scientist, CFAR, A*STAR
Machine Learning · Robust Imitation Learning · Trustworthy AI
David M. Bossens
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Yueming Lyu
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Qing Guo
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore
Flint Xiaofeng Fan
ETH Zurich, NUS, A*STAR International Fellow
Federated RL · Multi-agent Systems · Distributed Computing · Optimization
Ivor W. Tsang
Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore