Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation

📅 2025-07-26
📈 Citations: 0
Influential citations: 0
📄 PDF
🤖 AI Summary
Existing methods for zero-shot bimanual manipulation with dual-arm robots neglect end-effector states and struggle to model inter-hand coordination. To address this, we propose the first agent-agnostic visual representation framework that explicitly models bimanual synergy. Our approach jointly encodes object dynamics and bimanual motion patterns via contrastive learning and cross-modal encoding—requiring neither human demonstrations nor hand-crafted reward functions. Crucially, it decouples agent-specific information (e.g., pose) from task-invariant features, enabling unified, coordination-aware visual representation. Evaluated on 13 bimanual manipulation tasks, our method achieves a 73.5% zero-shot success rate—substantially outperforming reward-engineered baselines—and maintains robustness in complex scenarios involving deformable objects (e.g., ropes).
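The summary names two mechanisms, contrastive learning and cross-modal encoding, without architectural detail. Below is a minimal PyTorch sketch of one plausible reading: an object-dynamics encoder and a bimanual hand-motion encoder trained so that embeddings of the same clip agree. The ClipEncoder module, GRU backbone, feature dimensions, and InfoNCE pairing are illustrative assumptions, not the authors' published design.

```python
# Hypothetical sketch of "contrastive learning + cross-modal encoding":
# align an object-dynamics embedding with a bimanual hand-motion embedding
# from the same clip (positives) against other clips in the batch
# (negatives). All architecture choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipEncoder(nn.Module):
    """Encodes a per-timestep feature sequence into one unit-norm embedding."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(x)                      # h: (1, B, embed_dim)
        return F.normalize(self.proj(h[-1]), dim=-1)

def info_nce(z_obj: torch.Tensor, z_hand: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE: matched (object, hand-motion) pairs attract."""
    logits = z_obj @ z_hand.t() / tau           # (B, B) similarity matrix
    labels = torch.arange(z_obj.size(0))        # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Toy batch: 16 clips, 10 timesteps each.
obj_feats = torch.randn(16, 10, 64)   # e.g., per-frame object features
hand_feats = torch.randn(16, 10, 42)  # e.g., two hands x 21 keypoints
loss = info_nce(ClipEncoder(64)(obj_feats), ClipEncoder(42)(hand_feats))
loss.backward()
```

Note that in this sketch the hand-motion branch encodes both hands' trajectories jointly rather than each arm in isolation, which is one way to make the representation coordination-aware in the sense the summary describes.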

📝 Abstract
Bimanual manipulation, fundamental to human daily activities, remains challenging due to the inherent complexity of coordinated control. Recent advances have enabled zero-shot learning of single-arm manipulation skills through agent-agnostic visual representations derived from human videos; however, these methods overlook agent-specific information crucial for bimanual coordination, such as end-effector positions. We propose Ag2x2, a computational framework for bimanual manipulation built on coordination-aware visual representations that jointly encode object states and hand motion patterns while remaining agent-agnostic. Extensive experiments demonstrate that Ag2x2 achieves a 73.5% success rate across 13 diverse bimanual tasks from Bi-DexHands and PerAct2, including challenging scenarios with deformable objects such as ropes, outperforming baseline methods and even surpassing policies trained with expert-engineered rewards. Furthermore, we show that representations learned through Ag2x2 can be effectively leveraged for imitation learning, establishing a scalable pipeline for skill acquisition without expert supervision. By maintaining robust performance across diverse tasks without human demonstrations or engineered rewards, Ag2x2 represents a step toward scalable learning of complex bimanual robotic skills.
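The abstract claims policy learning without engineered rewards. One standard recipe for turning a frozen visual representation into a training signal is to reward progress toward a goal image in embedding space; the sketch below illustrates that recipe with hypothetical names (phi, embedding_reward). The paper's exact reward formulation is not given on this page, so treat this as an assumption-laden illustration, not the authors' method.

```python
# Hypothetical use of a frozen Ag2x2-style encoder `phi` as a dense reward:
# r_t is the drop in cosine distance between the current observation's
# embedding and a goal image's embedding. One common recipe for training
# without hand-crafted rewards; the paper's formulation may differ.
import torch
import torch.nn.functional as F

def embedding_reward(phi, obs_prev, obs_curr, obs_goal):
    """r_t = d(phi(o_{t-1}), phi(g)) - d(phi(o_t), phi(g)), d = cosine distance."""
    with torch.no_grad():
        z_prev, z_curr, z_goal = phi(obs_prev), phi(obs_curr), phi(obs_goal)
    d_prev = 1.0 - F.cosine_similarity(z_prev, z_goal, dim=-1)
    d_curr = 1.0 - F.cosine_similarity(z_curr, z_goal, dim=-1)
    return d_prev - d_curr  # positive when the embedding moves toward the goal

# Toy check with a stand-in encoder over 64x64 RGB frames.
phi = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
frames = [torch.rand(1, 3, 64, 64) for _ in range(3)]
print(embedding_reward(phi, frames[0], frames[1], frames[2]))
```

A reward of this form only requires the encoder to order states by task progress, which is why a representation trained purely from videos can drive reinforcement learning with no demonstrations or reward engineering in the loop.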
Problem

Research questions and friction points this paper is trying to address.

Overcoming bimanual coordination complexity in robotic manipulation
Enhancing zero-shot learning with agent-agnostic visual representations
Improving success rates in diverse bimanual tasks without expert supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coordination-aware visual representations for bimanual tasks
Encodes object states and hand motion patterns
Agent-agnostic yet robust across diverse tasks
Ziyin Xiong
National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI); School of Psychological and Cognitive Sciences, Peking University; Institute for Artificial Intelligence, Peking University; Beijing Key Laboratory of Behavior and Mental Health, Peking University; Yuanpei College, Peking University
Yinghan Chen
National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI); School of Psychological and Cognitive Sciences, Peking University; Institute for Artificial Intelligence, Peking University; Beijing Key Laboratory of Behavior and Mental Health, Peking University; Department of Computer Science and Technology, University of Cambridge
Puhao Li
Ph.D. Student, Tsinghua University
Computer Vision, Robotics, Machine Learning
Yixin Zhu
Assistant Professor, Peking University
Computer Vision, Visual Reasoning, Human-Robot Teaming
Tengyu Liu
Beijing Institute for General Artificial Intelligence
computer vision, human object interaction, human motion generation, grasping
Siyuan Huang
National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI)