Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a challenge in robotic manipulation: coarse positioning and fine contact interaction are tightly coupled in a single policy, which hinders effective learning. The authors propose explicitly decoupling manipulation into two distinct phases, "move" (coarse relocation) and "operate" (contact-critical interaction). They build this structure into the model as an inductive bias, using a dual-expert policy architecture paired with a learnable phase selector. Multimodal large language models (MLLMs) automatically generate human-aligned phase labels for training. On the RoboTwin2 benchmark, the method achieves an average success rate of 68.9%, outperforming the monolithic π0 baseline by 24%, and reaches peak performance in 40% fewer training steps. It also matches or exceeds models trained on ten times more data, demonstrating substantially improved generalization and sample efficiency.
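
The routing idea in the summary can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear `phase_probability` selector, the hard 0.5 threshold, the one-dimensional observation (distance to target), and the two placeholder experts.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Expert = Callable[[Vector], List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def phase_probability(obs: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical linear phase selector: returns P(operate | obs)."""
    logit = sum(w * o for w, o in zip(weights, obs)) + bias
    return sigmoid(logit)


def route(
    obs: Vector,
    move_expert: Expert,
    operate_expert: Expert,
    weights: Vector,
    bias: float,
    threshold: float = 0.5,  # assumed hard-routing threshold
) -> Tuple[str, List[float]]:
    """Pick one expert per step based on the selector's phase estimate."""
    p_operate = phase_probability(obs, weights, bias)
    if p_operate >= threshold:
        return "operate", operate_expert(obs)
    return "move", move_expert(obs)


# Placeholder experts: the "move" expert takes large steps,
# the "operate" expert takes small, careful ones.
def move_expert(obs: Vector) -> List[float]:
    return [0.5 * x for x in obs]


def operate_expert(obs: Vector) -> List[float]:
    return [0.05 * x for x in obs]


# Assumed selector parameters: a small distance pushes P(operate) up.
WEIGHTS = [-10.0]
BIAS = 1.0

far_phase, far_action = route([0.5], move_expert, operate_expert, WEIGHTS, BIAS)
near_phase, near_action = route([0.02], move_expert, operate_expert, WEIGHTS, BIAS)
```

With these assumed parameters, an observation far from the target is handled by the move expert and one close to the target by the operate expert. In a learned system the selector and both experts would be trained networks, and the routing could be soft rather than thresholded.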

📝 Abstract

We present Move-Then-Operate, a vision-language-action (VLA) framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are automatically generated via an MLLM-based pipeline conditioned on lightweight contextual cues, such as end-effector velocity and subtask decomposition, to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of $68.9\%$, outperforming the monolithic $\pi_0$ baseline by $24\%$. It matches or exceeds models trained on $10\times$ more data and reaches peak performance in $40\%$ fewer training steps, demonstrating that architectural disentanglement of move and operate phases is a highly effective and efficient strategy for mastering high-precision manipulation.
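
As a rough illustration of one contextual cue named in the abstract, the sketch below computes end-effector speed from a position trajectory and flags slow timesteps as candidate "operate" steps. This is not the paper's MLLM-based labeling pipeline; the function names, the finite-difference speed estimate, and the speed threshold are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def end_effector_speeds(positions: Sequence[Position], dt: float) -> List[float]:
    """Finite-difference speed between consecutive end-effector positions."""
    return [
        math.dist(prev, curr) / dt
        for prev, curr in zip(positions, positions[1:])
    ]


def candidate_phases(speeds: Sequence[float], speed_threshold: float) -> List[str]:
    """Heuristic cue only: slow motion suggests contact-critical interaction."""
    return ["operate" if s < speed_threshold else "move" for s in speeds]


# Toy trajectory: two fast steps along x, then a very small adjustment.
trajectory = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.2, 0.0, 0.0),
    (0.201, 0.0, 0.0),
]
speeds = end_effector_speeds(trajectory, dt=0.1)
phases = candidate_phases(speeds, speed_threshold=0.1)  # assumed threshold
```

A cue like this would be only one input to a labeling pipeline; the abstract also mentions subtask decomposition, which this sketch does not model.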

Problem

Research questions and friction points this paper is trying to address.

robotic manipulation

behavioral phasing

move-and-operate

contact-critical interaction

coarse relocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral phasing

dual-expert policy

vision-language action framework

phase disentanglement

robotic manipulation

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

2024-04-28 · arXiv.org · Citations: 15