Teacher-Student Representational Alignment for Reinforcement Learning-Driven Imitation Learning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the fundamental challenge in imitation learning where teacher policies often rely on privileged information inaccessible to the student, leading to an irreducible performance gap. To mitigate this issue, the authors propose a joint training framework that simultaneously learns a teacher policy and constructs a shared embedding space between teacher and student. A gradient masking mechanism is introduced to prevent leakage of private information into the shared representation. By integrating self-supervised contrastive learning, the method jointly optimizes the shared representation and the teacher policy, yielding inherently imitable teacher behaviors without requiring post-hoc reinforcement learning fine-tuning. Experimental results demonstrate that the proposed approach significantly narrows the imitation gap across multiple tasks, with student policies consistently outperforming current state-of-the-art baselines.
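As a rough illustration of the gradient masking idea, the toy sketch below computes gradients by hand for a scalar encoder and policy head. It is not the authors' implementation. It assumes one reading of the summary: that the policy objective is kept from updating the encoder, so the shared representation is shaped only by the alignment objective. The squared-error losses and all names (`w_enc`, `w_pi`, `z_student`, `a_target`) are hypothetical stand-ins for the real RL and contrastive objectives.

```python
def masked_gradients(w_enc, w_pi, s, z_student, a_target):
    """Toy gradient masking with hand-derived gradients (illustrative only).

    Hypothetical model: z = w_enc * s (encoder), a = w_pi * z (policy head).
    Stand-in losses: L_pi = (a - a_target)^2 and L_align = (z - z_student)^2.
    """
    z = w_enc * s          # shared embedding of the teacher state
    a = w_pi * z           # teacher action from the policy head

    # Policy-loss gradient w.r.t. the head, treating z as a constant.
    g_head = 2.0 * (a - a_target) * z

    # Alignment-loss gradient w.r.t. the encoder.
    g_enc_align = 2.0 * (z - z_student) * s

    # Policy-loss gradient that WOULD reach the encoder without masking.
    g_enc_policy = 2.0 * (a - a_target) * w_pi * s

    return {
        "head": g_head,
        "encoder_masked": g_enc_align,                   # policy path cut
        "encoder_unmasked": g_enc_align + g_enc_policy,  # for comparison
    }


grads = masked_gradients(w_enc=1.0, w_pi=2.0, s=3.0, z_student=3.0, a_target=0.0)
print(grads)
```

In this example the embeddings already agree (`z == z_student`), so the masked encoder gradient is zero while the unmasked one is not: the policy loss alone would otherwise reshape the encoder. In an autodiff framework the same cut is usually expressed with a stop-gradient operation, such as `detach()` in PyTorch or `jax.lax.stop_gradient` in JAX.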

📝 Abstract

Imitation learning (IL) from a state-based reinforcement learning (RL) policy is a common approach to overcoming the curse of dimensionality in the complex, high-dimensional observation spaces prevalent in robotics. This paper addresses the irreducible imitation gap that emerges when teacher and student are learned in isolation and the teacher policy is free to rely on privileged state information that the student cannot infer from its observations. Instead of improving poor student performance with RL fine-tuning after IL, which often requires a whole new training setup, we propose a novel algorithm that learns a shared embedding space that hides agent-specific observations and thus trains imitable teacher policies by construction. We train the shared embedding space with self-supervised contrastive learning in parallel with the teacher policy and prevent it from extracting private information by limiting how its gradients update the encoder networks. We evaluate on several example domains and compare against state-of-the-art baselines, showing that our algorithm enables higher student performance with a substantially reduced imitation gap.
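The abstract does not say which contrastive objective is used. As a minimal sketch, assuming an InfoNCE-style loss (a common choice for self-supervised alignment, swapped in here for illustration), the code below scores paired teacher and student embeddings. The function names, the cosine similarity, and the temperature value are assumptions, and the code is kept dependency-free rather than written for any particular deep learning framework.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def info_nce(teacher_emb, student_emb, temperature=0.1):
    """InfoNCE-style loss over a batch of paired embeddings.

    Row i of teacher_emb and row i of student_emb are assumed to come from
    the same time step (the positive pair); all other rows act as negatives.
    """
    n = len(teacher_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(teacher_emb[i], student_emb[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true partner.
        total += log_denom - logits[i]
    return total / n


teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(teacher, [[1.0, 0.0], [0.0, 1.0]])   # partners match
shuffled = info_nce(teacher, [[0.0, 1.0], [1.0, 0.0]])  # partners swapped
print(aligned < shuffled)
```

The loss is low when each teacher embedding is closest to its own student embedding and high when the pairs are mismatched, which is the property that pulls the two agents toward a shared embedding space.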

Problem

Research questions and friction points this paper is trying to address.

Imitation Learning

Reinforcement Learning

Representation Alignment

Observation Mismatch

Privileged Information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Representational Alignment

Imitation Learning

Reinforcement Learning