SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses insufficient skill robustness and generalization in Reinforcement Learning from Interaction Demonstration (RLID), a problem primarily caused by sparse and noisy demonstrations. To tackle this, we propose a novel method grounded in physically plausible implicit trajectory modeling. Our key contributions are: (1) a dual-enhanced representation comprising a Stitched Trajectory Graph and a State Transition Field, explicitly encoding continuous skill transitions between states; and (2) an Adaptive Trajectory Sampling curriculum strategy coupled with a history-state-encoded memory framework, enhancing noise resilience and long-horizon dependency modeling. Evaluated across diverse interaction tasks, our approach significantly improves training convergence stability, cross-scenario generalization, and perturbation-recovery robustness, consistently outperforming state-of-the-art methods.

📝 Abstract
We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID): demonstration noise and coverage limitations. While existing data collection approaches provide valuable interaction demonstrations, they often yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. Our key insight is that despite noisy and sparse demonstrations, there exist infinite physically feasible trajectories that naturally bridge between demonstrated skills or emerge from their neighboring states, forming a continuous space of possible skill variations and transitions. Building upon this insight, we present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood. To enable effective RLID with augmented data, we develop an Adaptive Trajectory Sampling (ATS) strategy for dynamic curriculum generation and a historical encoding mechanism for memory-dependent skill learning. Our approach enables robust skill acquisition that significantly generalizes beyond the reference demonstrations. Extensive experiments across diverse interaction tasks demonstrate substantial improvements over state-of-the-art methods in terms of convergence stability, generalization capability, and recovery robustness.
Problem

Research questions and friction points this paper is trying to address.

Addressing sparse and noisy interaction demonstrations in RLID
Discovering feasible skill transitions despite demonstration limitations
Enhancing generalization and robustness in skill acquisition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stitched Trajectory Graph discovers potential skill transitions
State Transition Field connects arbitrary neighboring states
Adaptive Trajectory Sampling enables dynamic curriculum generation
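To make the Stitched Trajectory Graph idea concrete, here is a minimal hypothetical sketch (not the paper's actual construction): demonstration clips become graph nodes, and an edge is added whenever the end state of one clip lies close enough to the start state of another, marking a candidate skill transition. The Euclidean distance metric, the threshold `eps`, and the state format are all illustrative assumptions.

```python
# Hypothetical STG sketch: clips are nodes; an edge i -> j means clip i's
# final state is within eps of clip j's initial state, i.e. a candidate
# stitched transition. The metric and threshold are assumptions, not the
# paper's actual formulation.
import math

def dist(a, b):
    # Euclidean distance between two state vectors (assumed metric)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_stg(clips, eps=0.5):
    """clips: list of trajectories, each a list of state tuples.
    Returns adjacency dict: clip index -> list of reachable clip indices."""
    graph = {i: [] for i in range(len(clips))}
    for i, src in enumerate(clips):
        for j, dst in enumerate(clips):
            if i != j and dist(src[-1], dst[0]) <= eps:
                graph[i].append(j)  # candidate stitched transition i -> j
    return graph

clips = [
    [(0.0, 0.0), (1.0, 0.0)],   # clip 0 ends near clip 1's start
    [(1.1, 0.1), (2.0, 0.0)],   # clip 1's end is far from other starts
    [(5.0, 5.0), (6.0, 5.0)],   # clip 2 is isolated
]
stg = build_stg(clips, eps=0.5)  # -> {0: [1], 1: [], 2: []}
```

The same neighborhood test, applied to arbitrary perturbed states rather than clip endpoints, gives the flavor of the State Transition Field described above.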