Instant-Fold: In-Context Imitation Learning for Deformable Object Manipulation

📅 2026-06-02

🤖 AI Summary
Manipulating deformable objects is hard for policy learning because states are high-dimensional and only partially observable, interactions are long-horizon and change the object's topology, and several manipulation modes can be equally valid. This work proposes an in-context imitation learning framework that, from a single human demonstration, infers and executes diverse manipulation strategies without gradient updates or fine-tuning. The approach uses temporal contrastive pretraining to obtain deformation-aware visual representations, then applies a flow-matching Transformer policy that conditions on the demonstration to predict action sequences. Trained entirely in simulation, the model transfers zero-shot to real robots and demonstrates, for the first time, flexible generalization across both spatial configurations and action sequencing in a variety of folding tasks.
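To make the temporal contrastive pretraining idea concrete, here is a minimal sketch of one common formulation: an InfoNCE loss in which each frame's positive is the next frame in time and all other frames of the sequence serve as negatives. The summary does not state which contrastive objective the authors use, so the loss form, the `temperature` value, and the function names `cosine` and `temporal_infonce` are all assumptions made for illustration. The visual encoder that would produce the embeddings is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def temporal_infonce(embeddings, temperature=0.1):
    """Average InfoNCE loss over a sequence of frame embeddings.

    Assumption (not from the paper): frame t is pulled toward frame t+1
    and pushed away from every other frame in the same sequence.
    """
    n = len(embeddings)
    total = 0.0
    for t in range(n - 1):
        # Similarities from the anchor frame t to every other frame.
        logits = [
            cosine(embeddings[t], embeddings[j]) / temperature
            for j in range(n)
            if j != t
        ]
        # With index t removed, frame t+1 sits at position t in the list.
        positive = logits[t]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - positive
    return total / (n - 1)
```

Under this formulation, embeddings that vary smoothly along the trajectory score a lower loss than the same embeddings in scrambled temporal order, which is the property a deformation-aware representation would be trained toward.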
📝 Abstract
Deformable object manipulation (DOM) is challenging due to high-dimensional, partially observable states that evolve through long-horizon, topology-changing interactions with multiple valid manipulation modes. We introduce Instant-Fold, an in-context imitation learning framework for DOM. Given a single human demonstration, our policy infers and executes diverse manipulation modes directly from the demonstration, including variations in spatial execution and ordering, without requiring gradient updates. Our approach first learns deformation-aware visual representations via temporal contrastive pretraining, after which a flow-matching transformer policy conditioned on the demonstration predicts actions to execute the intended manipulation mode. Trained entirely in simulation, Instant-Fold generalizes across diverse folding modes and transfers zero-shot to real-world settings without additional data collection or finetuning. Videos are available at https://instant-fold.github.io.
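The flow-matching policy can be illustrated with a small sketch of the two pieces such a policy generally needs: a training target and a sampler. It assumes the widely used linear interpolation path between a noise sample and a ground-truth action, and a fixed-step Euler solver at inference; the abstract specifies neither, and the demonstration conditioning is omitted here. The names `flow_matching_target`, `euler_sample`, and `velocity_fn` (a stand-in for the Transformer) are hypothetical.

```python
def flow_matching_target(noise, action, t):
    """Build one training pair for flow matching on a linear path.

    Assumption (not from the paper): x_t = (1 - t) * noise + t * action,
    so the velocity the network should regress is action - noise.
    """
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    velocity = [a - n for n, a in zip(noise, action)]
    return x_t, velocity


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (action).

    velocity_fn stands in for the learned policy network; in the paper's
    setting it would also receive the demonstration and observations.
    """
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k / steps
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

During training, the network's prediction at `(x_t, t)` would be regressed onto the returned velocity. At inference, `euler_sample` turns a noise draw into an action (or a flattened action sequence) by following the learned velocity field, with no gradient updates involved.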
Problem

Research questions and friction points this paper is trying to address.

deformable object manipulation
high-dimensional states
partially observable
topology-changing interactions
multiple manipulation modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context imitation learning
deformable object manipulation
flow-matching transformer
temporal contrastive pretraining
zero-shot sim-to-real transfer