🤖 AI Summary
Robots struggle to generalize skills across diverse scenes from a single demonstration, largely because they lack a transferable and interpretable task-space representation. To address this, we propose TReF-6, a method that automatically constructs a 6DoF Task-Relevant Frame (TRF) from a single demonstration trajectory. TReF-6 identifies an influence point via geometric trajectory analysis to define the origin of a local coordinate frame, and integrates a vision-language model with Grounded-SAM for semantic grounding and scene-adaptive alignment. This TRF extends Dynamic Movement Primitives (DMPs) beyond conventional start–end imitation, enabling functionally consistent geometric–semantic transfer that preserves task intent during generalization. Experiments demonstrate robustness to trajectory noise in simulation and successful end-to-end deployment on a physical robot, with effective generalization across varied object configurations in one-shot cross-scene manipulation.
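As an illustration of the trajectory-geometry step, the sketch below picks the point of maximum discrete curvature as the influence point and builds a right-handed local frame there. The curvature criterion, the axis convention, and the function names are assumptions for illustration; the paper's actual geometric analysis may differ.

```python
import numpy as np

def influence_point(traj):
    """Pick an influence point from trajectory geometry.

    traj: (N, 3) array of demonstrated end-effector positions.
    Here the point of maximum discrete curvature is used as a stand-in
    criterion; TReF-6's actual geometric analysis may differ.
    """
    d1 = np.gradient(traj, axis=0)                 # velocity estimate
    d2 = np.gradient(d1, axis=0)                   # acceleration estimate
    speed = np.linalg.norm(d1, axis=1) + 1e-9
    # Discrete curvature: kappa = |v x a| / |v|^3
    kappa = np.linalg.norm(np.cross(d1, d2), axis=1) / speed**3
    return int(np.argmax(kappa))

def local_frame(traj, idx):
    """Build a right-handed frame at the influence point.

    x follows the local motion direction, z the curvature normal;
    this is one plausible convention, not necessarily the paper's.
    """
    origin = traj[idx]
    d1 = np.gradient(traj, axis=0)
    d2 = np.gradient(d1, axis=0)
    x_axis = d1[idx] / (np.linalg.norm(d1[idx]) + 1e-9)
    n = d2[idx] - np.dot(d2[idx], x_axis) * x_axis  # normal component of acceleration
    z_axis = n / (np.linalg.norm(n) + 1e-9)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis], axis=1)  # columns are frame axes
    return origin, R
```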
📝 Abstract
Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin of a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task's spatial structure, extending the standard DMP formulation beyond start–goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
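To illustrate how a task frame extends a DMP beyond start–goal imitation, the following minimal sketch rolls out a standard Ijspeert-style discrete DMP in local task-frame coordinates and maps the result back to the world through an assumed frame pose (`R`, `origin`), e.g. one recovered by localizing the grounded frame in a new scene. The function name, basis parameterization, and gains are hypothetical and not the paper's implementation.

```python
import numpy as np

def rollout_dmp_in_frame(y0, g, weights, centers, widths, R, origin,
                         alpha_z=25.0, beta_z=6.25, alpha_x=1.0,
                         tau=1.0, dt=0.01, steps=200):
    """Roll out a standard discrete DMP expressed in a local task frame.

    y0, g   : start and goal positions in the task frame, shape (3,)
    weights : (K, 3) forcing-term weights learned from the demonstration
    centers, widths : (K,) Gaussian basis parameters of the canonical phase
    R, origin : frame orientation (3, 3) and position (3,) in the new scene,
                assumed given (e.g. from semantic localization).
    Returns the reproduced trajectory in world coordinates, shape (steps, 3).
    """
    y, v, x = y0.astype(float).copy(), np.zeros(3), 1.0
    out = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)            # Gaussian basis activations
        f = (psi @ weights) * x * (g - y0) / (psi.sum() + 1e-9)  # forcing term
        dv = (alpha_z * (beta_z * (g - y) - v) + f) / tau      # transformation system
        v += dv * dt
        y += v / tau * dt
        x += -alpha_x * x / tau * dt                           # canonical system decay
        out.append(origin + R @ y)                             # map back to world frame
    return np.array(out)
```

Because the demonstration is encoded relative to the inferred frame, relocating (`R`, `origin`) to a new object configuration reproduces the motion with the same spatial structure rather than merely stretching it between new start and goal points.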