🤖 AI Summary
This work addresses the challenge of enabling robots to learn and generalize multi-step fabric manipulation skills from language instructions. Methodologically, it introduces a large language model (LLM)-driven hierarchical skill learning framework featuring: (i) a novel commonsense-guided automatic skill discovery mechanism that decomposes long-horizon demonstrations into semantically interpretable and dynamics-consistent primitive skill units; (ii) LLM-based semantic planning that maps natural language instructions to executable skill sequences; and (iii) closed-loop execution integrating end-to-end imitation learning with fabric physics modeling. Experiments demonstrate substantial improvements over existing baselines on both seen and unseen fabric manipulation tasks. Notably, this is the first approach to achieve cross-task fabric skill transfer under language conditioning, empirically validating both skill reusability and effective semantic-action alignment.
📝 Abstract
Multi-step cloth manipulation is a challenging problem for robots due to cloth's high-dimensional state space and complex dynamics. Despite significant recent advances in end-to-end imitation learning of multi-step cloth manipulation skills, these methods fail to generalize to unseen tasks. Our insight for tackling the challenge of generalizable multi-step cloth manipulation is decomposition. We propose a novel pipeline that autonomously learns basic skills from long demonstrations and composes the learned basic skills to generalize to unseen tasks. Specifically, our method first discovers and learns basic skills from an existing long-demonstration benchmark using the commonsense knowledge of a large language model (LLM). Then, leveraging a high-level LLM-based task planner, these basic skills can be composed to complete unseen tasks. Experimental results demonstrate that our method outperforms baseline methods in learning multi-step cloth manipulation skills on both seen and unseen tasks.
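The decomposition idea above — a library of basic skills discovered from demonstrations, plus an LLM planner that composes them into a sequence for a new instruction — can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the `Skill` class, the skill names, and the canned `llm_plan` stand-in (which replaces a real LLM call) are all assumptions introduced here.

```python
# Hypothetical sketch of skill composition for unseen cloth tasks.
# A real system would learn each skill policy from demonstration
# segments and query an LLM to produce the plan.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Skill:
    """A basic skill discovered from long demonstrations."""
    name: str
    policy: Callable[[dict], dict]  # maps a cloth state to the next state

# Tiny mock skill library standing in for learned low-level policies.
SKILL_LIBRARY: Dict[str, Skill] = {
    "flatten": Skill("flatten", lambda s: {**s, "folds": 0}),
    "fold": Skill("fold", lambda s: {**s, "folds": s.get("folds", 0) + 1}),
    "drag": Skill("drag", lambda s: {**s, "position": "target"}),
}

def llm_plan(instruction: str) -> List[str]:
    """Stand-in for the high-level LLM task planner: maps a language
    instruction to a sequence of skill names. Here it is a canned
    lookup; a real planner would prompt an LLM with the skill library
    and the instruction."""
    canned = {
        "fold the cloth twice": ["flatten", "fold", "fold"],
        "move the cloth to the target": ["flatten", "drag"],
    }
    return canned.get(instruction.lower(), [])

def execute(instruction: str, state: dict) -> dict:
    """Roll out the planned skill sequence on the current state."""
    for name in llm_plan(instruction):
        state = SKILL_LIBRARY[name].policy(state)
    return state

final = execute("fold the cloth twice", {"folds": 0, "position": "start"})
```

The point of the sketch is the separation of concerns: new tasks require only a new plan (a new skill sequence), not new low-level training, which is what allows generalization to unseen tasks.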