COLLIE: Guiding Skill Discovery in Semantically Coherent Latent Space

📅 2026-05-30
📈 Citations: 0 (influential: 0)

🤖 AI Summary
This work addresses a known weakness of unsupervised skill discovery: it often yields task-irrelevant or hazardous behaviors. Existing guidance approaches depend on training additional models or on expert demonstrations, and they struggle when human feedback is sparse and collected online. The authors propose a training-free guidance mechanism that uses dense unsupervised data to construct a semantically coherent latent skill space with structured representations. This enables effective skill learning with only minimal online human feedback and removes the need to train auxiliary guidance models. Experiments in both state-based and pixel-based environments show that the method acquires diverse, human-aligned, and safe skills and achieves superior downstream task performance with very few feedback signals.
📝 Abstract
Unsupervised skill discovery (USD) aims to learn diverse behaviors without reward functions, but uniform exploration often results in task-irrelevant or hazardous behaviors. Guided skill discovery (GSD) addresses this issue by incorporating human intent to focus exploration on meaningful regions. However, existing GSD methods typically require training additional guidance models and rely on pre-defined rules or expert demonstrations, which can be ineffective under sparse, online-collected human feedback. To overcome this, we propose COLLIE, a GSD framework that leverages dense unsupervised data to construct a semantically coherent skill latent space. This latent space is well-structured, enabling reliable guidance with sparse online feedback. Moreover, its semantic coherence enables training-free construction of guidance signals, eliminating the need for any model training beyond skill learning itself. Theoretical analysis justifies the effectiveness of our training-free guidance signal, and experiments across diverse state-based and pixel-based tasks show that COLLIE learns diverse, human-aligned skills, avoids hazardous behaviors, and achieves superior downstream performance with minimal human feedback.
Problem

Research questions and friction points this paper is trying to address.

guided skill discovery
unsupervised skill discovery
human feedback
latent space
semantic coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

guided skill discovery
semantically coherent latent space
unsupervised skill discovery
training-free guidance
sparse human feedback
Authors
Yao Luan
The Center for Intelligent and Networked Systems (CFINS), Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
Ni Mu
The Center for Intelligent and Networked Systems (CFINS), Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
Hanfei Ge
Kuang Yaming College, Nanjing University, Nanjing, China
Yiqin Yang
Assistant Professor, Institute of Automation, Chinese Academy of Sciences
Reinforcement Learning, Embodied Intelligence
Bo Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qing-Shan Jia
The Center for Intelligent and Networked Systems (CFINS), Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Beijing Key Laboratory of Embodied Intelligence Systems, Beijing, China; Institute for Embodied Intelligence and Robotics, Tsinghua University, Beijing, China