Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations

📅 2025-03-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to learn and generalize multi-step fabric manipulation skills from language instructions. Methodologically, it introduces a large language model (LLM)-driven hierarchical skill learning framework featuring: (i) a novel commonsense-guided automatic skill discovery mechanism that decomposes long-horizon demonstrations into semantically interpretable and dynamics-consistent primitive skill units; (ii) LLM-based semantic planning that maps natural language instructions to executable skill sequences; and (iii) closed-loop execution integrating end-to-end imitation learning with fabric physics modeling. Experiments demonstrate substantial improvements over existing baselines on both seen and unseen fabric manipulation tasks. Notably, this is the first approach to achieve cross-task fabric skill transfer under language conditioning, empirically validating both skill reusability and effective semantic–action alignment.

๐Ÿ“ Abstract
Multi-step cloth manipulation is a challenging problem for robots due to the high-dimensional state spaces and the dynamics of cloth. Despite recent significant advances in end-to-end imitation learning for multi-step cloth manipulation skills, these methods fail to generalize to unseen tasks. Our insight in tackling the challenge of generalizable multi-step cloth manipulation is decomposition. We propose a novel pipeline that autonomously learns basic skills from long demonstrations and composes learned basic skills to generalize to unseen tasks. Specifically, our method first discovers and learns basic skills from the existing long demonstration benchmark with the commonsense knowledge of a large language model (LLM). Then, leveraging a high-level LLM-based task planner, these basic skills can be composed to complete unseen tasks. Experimental results demonstrate that our method outperforms baseline methods in learning multi-step cloth manipulation skills for both seen and unseen tasks.
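The abstract describes a two-level pipeline: basic skills are discovered from long demonstrations, and a high-level LLM-based planner composes them to complete unseen tasks. A minimal sketch of that control flow is below; the skill names, the rule-based stand-in for the LLM planner, and all function names are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of the decompose-plan-execute pipeline from the abstract.
# A high-level planner maps a language instruction to a sequence of learned
# basic skills; a low-level executor runs each skill policy in turn.
from typing import Callable, Dict, List

# Library of learned basic skills: skill name -> low-level policy.
# Policies are stubbed as string transforms for illustration.
SKILL_LIBRARY: Dict[str, Callable[[str], str]] = {
    "flatten": lambda obs: f"flattened({obs})",
    "grasp_corner": lambda obs: f"grasped_corner({obs})",
    "fold_in_half": lambda obs: f"folded({obs})",
}

def llm_plan(instruction: str) -> List[str]:
    """Stand-in for the LLM-based task planner: map an instruction to a
    skill sequence drawn from the library. A real system would prompt an
    LLM with the instruction plus the skill descriptions."""
    rules = {
        "fold the cloth": ["flatten", "grasp_corner", "fold_in_half"],
        "flatten the cloth": ["flatten"],
    }
    return rules.get(instruction.lower(), [])

def execute(instruction: str, observation: str) -> List[str]:
    """Closed-loop execution: run each planned skill in order, feeding the
    result of one step in as the observation for the next. Returns the
    executed skill sequence."""
    executed = []
    for skill in llm_plan(instruction):
        observation = SKILL_LIBRARY[skill](observation)
        executed.append(skill)
    return executed

print(execute("Fold the cloth", "initial cloth state"))
# -> ['flatten', 'grasp_corner', 'fold_in_half']
```

The point of the sketch is the separation of concerns: generalization to an unseen task only requires the planner to emit a new composition of existing skills, not retraining the low-level policies.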
Problem

Research questions and friction points this paper is trying to address.

Generalizing multi-step cloth manipulation to unseen tasks
Decomposing long demonstrations to learn basic skills
Composing skills using LLM-based task planning for generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes tasks into basic skills
Uses LLM for skill discovery and planning
Composes learned skills to generalize to unseen tasks
Hanyi Zhao
Center for Artificial Intelligence and Robotics, Tsinghua Shenzhen International Graduate School, Shenzhen, China
Jinxuan Zhu
Center for Artificial Intelligence and Robotics, Tsinghua Shenzhen International Graduate School, Shenzhen, China
Zihao Yan
Center for Artificial Intelligence and Robotics, Tsinghua Shenzhen International Graduate School, Shenzhen, China
Yichen Li
Center for Artificial Intelligence and Robotics, Tsinghua Shenzhen International Graduate School, Shenzhen, China
Yuhong Deng
PhD student in Computer Science, National University of Singapore
Robotics, Robotic Manipulation, Robot Learning
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing