Pixel2Catch: Multi-Agent Sim-to-Real Transfer for Agile Manipulation with a Single RGB Camera

📅 2026-02-26
🤖 AI Summary
This work proposes a vision-based reinforcement learning approach for catching thrown objects that avoids explicitly estimating the object's 3D position. It extracts pixel-level motion cues directly from a single RGB image and uses a heterogeneous multi-agent reinforcement learning framework in which the robot arm and the multi-fingered hand are modeled as separate agents with role-specific observations and rewards, trained cooperatively to intercept and catch the object. The policies are trained in simulation and transferred to a physical high-DoF robot platform.

📝 Abstract
To catch a thrown object, a robot must be able to perceive the object's motion and generate control actions in a timely manner. Rather than explicitly estimating the object's 3D position, this work focuses on a novel approach that recognizes object motion using pixel-level visual information extracted from a single RGB image. Such visual cues capture changes in the object's position and scale, allowing the policy to reason about the object's motion. Furthermore, to achieve stable learning in a high-DoF system composed of a robot arm equipped with a multi-fingered hand, we design a heterogeneous multi-agent reinforcement learning framework that defines the arm and hand as independent agents with distinct roles. Each agent is trained cooperatively using role-specific observations and rewards, and the learned policies are successfully transferred from simulation to the real world.
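
As a rough illustration of how position and scale changes could be read from pixels, the sketch below computes a normalized centroid and area from a binary object mask and differences them across two frames. The abstract does not say how the cues are actually extracted, so the mask input, the function names `pixel_cues` and `motion_cues`, and the six-element feature layout are all assumptions made for this example.

```python
import numpy as np


def pixel_cues(mask):
    """Return [cx, cy, scale] for a binary object mask, each normalized to [0, 1].

    Hypothetical helper: assumes the object has already been segmented.
    Returns None when no object pixel is visible.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    cx = xs.mean() / w            # horizontal position in the image
    cy = ys.mean() / h            # vertical position in the image
    scale = xs.size / (h * w)     # fraction of the image covered by the object
    return np.array([cx, cy, scale])


def motion_cues(prev_mask, curr_mask):
    """Concatenate the current cues with their frame-to-frame change.

    Falls back to zeros if the object is missing in either frame
    (an assumption for this sketch, not a choice taken from the paper).
    """
    prev, curr = pixel_cues(prev_mask), pixel_cues(curr_mask)
    if prev is None or curr is None:
        return np.zeros(6)
    return np.concatenate([curr, curr - prev])
```

An object approaching the camera grows in the image, so the last element of the feature vector (the change in scale) becomes positive even though no depth is estimated; that is the kind of signal a policy could use in place of a 3D position.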
Problem

Research questions and friction points this paper is trying to address.

object catching
single RGB camera
real-time perception
agile manipulation
sim-to-real transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

pixel-level visual perception
multi-agent reinforcement learning
sim-to-real transfer
agile manipulation
heterogeneous agents
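
To make the idea of heterogeneous agents with role-specific observations concrete, here is a minimal sketch of splitting one shared state into per-agent observations and merging per-agent actions into a single command. The observation keys, the action dimensions (6 for the arm, 16 for the hand), and the names `AgentSpec`, `build_obs`, and `joint_action` are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one agent's role."""
    name: str
    obs_keys: tuple   # which entries of the shared state this agent observes
    action_dim: int   # number of joints this agent commands


# Assumed roles and sizes, chosen only for illustration.
ARM = AgentSpec("arm", ("pixel_cues", "arm_joints"), 6)
HAND = AgentSpec("hand", ("pixel_cues", "hand_joints"), 16)


def build_obs(state, spec):
    """Flatten and concatenate only the state entries listed for this agent."""
    return np.concatenate(
        [np.asarray(state[k], dtype=float).ravel() for k in spec.obs_keys]
    )


def joint_action(actions, specs):
    """Stack per-agent actions, in spec order, into one robot command."""
    for s in specs:
        if actions[s.name].shape != (s.action_dim,):
            raise ValueError(f"unexpected action shape for agent {s.name!r}")
    return np.concatenate([actions[s.name] for s in specs])
```

Role-specific rewards could be organized the same way, with one reward function per `AgentSpec`, so that each policy is trained on its own signal while both act in the same environment.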