BiFold: Bimanual Cloth Folding with Language Guidance

📅 2025-01-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the problem of natural language–guided dual-arm robotic cloth folding, tackling challenges including garment self-occlusion, inaccurate cloth dynamics modeling, and poor generalization across diverse fabric types and physical environments. We propose the first language-conditioned dual-arm cloth folding framework comprising: (1) an end-to-end language-to-action mapping leveraging a pre-trained vision-language model; (2) a simulation-based action parsing and text-alignment method to mitigate the scarcity of real-world human annotations; and (3) a unified policy integrating dual-arm coordinated motion planning with cloth dynamics–aware control. Our approach achieves state-of-the-art performance on a language-guided cloth folding benchmark, attains superior accuracy on our newly constructed multi-fabric dataset, and demonstrates strong zero-shot generalization to unseen garment categories, novel instructions, and previously unencountered physical environments.

📝 Abstract
Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To do that, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. Given the lack of annotated bimanual folding data, we devise a procedure to automatically parse actions of a simulated dataset and tag them with aligned text instructions. BiFold attains the best performance on our dataset and can transfer to new instructions, garments, and environments.
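The abstract does not spell out how simulated actions are tagged with text, so the following is only a minimal sketch of one way such tagging could work, assuming a template-based approach. The `BimanualAction` type, the `region` and `describe` helpers, the 3x3 image grid, and the instruction template are all hypothetical and are not taken from BiFold.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # normalized image coordinates, (0, 0) = top-left


@dataclass
class BimanualAction:
    """Hypothetical container for one two-arm pick-and-place action."""
    left_pick: Point
    left_place: Point
    right_pick: Point
    right_place: Point


def region(point: Point) -> str:
    """Map a normalized point to a coarse region name on a 3x3 grid."""
    x, y = point
    horiz = "left" if x < 1 / 3 else "right" if x > 2 / 3 else "center"
    vert = "top" if y < 1 / 3 else "bottom" if y > 2 / 3 else "middle"
    if horiz == "center" and vert == "middle":
        return "center"
    if horiz == "center":
        return vert
    if vert == "middle":
        return horiz
    return f"{vert} {horiz}"


def midpoint(a: Point, b: Point) -> Point:
    """Average two points component-wise."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def describe(action: BimanualAction, garment: str) -> str:
    """Fill an assumed instruction template from the action's pick/place regions."""
    pick = region(midpoint(action.left_pick, action.right_pick))
    place = region(midpoint(action.left_place, action.right_place))
    return f"Fold the {garment} from the {pick} to the {place}."


# Both grippers grasp the bottom edge and release at the top edge.
action = BimanualAction(
    left_pick=(0.2, 0.9), left_place=(0.2, 0.1),
    right_pick=(0.8, 0.9), right_place=(0.8, 0.1),
)
print(describe(action, "towel"))  # Fold the towel from the bottom to the top.
```

A template like this keeps the text and the action aligned by construction, which is the property the abstract asks of its automatic annotations; a real pipeline would likely need richer vocabulary and garment-aware landmarks.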
Problem

Research questions and friction points this paper is trying to address.

Language Instruction Understanding
Bimanual Manipulation
Cloth Folding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-Conditioned Multi-modal Learning
Simulated Data Augmentation
Adaptive Folding Techniques
Oriol Barbany
Institut de Robòtica i Informàtica Industrial, CSIC-UPC
Adria Colomé
Institut de Robòtica i Informàtica Industrial, CSIC-UPC
Carme Torras
Institut de Robòtica i Informàtica Industrial (CSIC-UPC)
Robotics and Artificial Intelligence, Robot Learning, Robot Vision, Constraint Satisfaction, Robot Kinematics