EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks

📅 2025-07-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision-based robotic policies struggle to generalize to unseen spatial poses in contact-rich manipulation tasks such as peg insertion. Method: The paper proposes EquiContact, a hierarchical SE(3)-equivariant policy with a high-level Diffusion Equivariant Descriptor Field (Diff-EDF) for vision-based planning and a low-level Geometric Compliant ACT (G-CompACT) for compliant visuomotor control. G-CompACT uses only localized observations (wrist-mounted RGB images, geometrically consistent error vectors (GCEV), and force-torque readings) and outputs actions in the end-effector frame. Contribution/Results: These design choices make the entire pipeline SE(3)-equivariant, from perception to force control, so a policy trained from a small number of demonstrations generalizes across spatial configurations. In real-world peg-in-hole experiments it achieves a near-perfect success rate and generalizes robustly to unseen poses, supporting the paper's three principles: compliance, localized policies, and induced equivariance.

📝 Abstract

This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos can be found on the project website: https://sites.google.com/berkeley.edu/equicontact
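
The link between localized observations, end-effector-frame actions, and SE(3)-equivariance can be checked numerically with a toy example. The sketch below is not the authors' implementation: `local_error` is a hypothetical position-only stand-in for the GCEV, `toy_policy` is a proportional rule standing in for the learned G-CompACT network, and all poses are made-up values. It only illustrates that when both the observation and the action live in the end-effector frame, moving the whole scene by a rigid transform leaves the local observation unchanged and rotates the world-frame action accordingly.

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def transform_pose(g, pose):
    """Left-multiply a pose (R, p) by a rigid transform g = (R_g, t_g)."""
    R_g, t_g = g
    R, p = pose
    return matmul(R_g, R), [a + b for a, b in zip(matvec(R_g, p), t_g)]

def local_error(ee_pose, goal_pose):
    """Hypothetical localized observation: goal offset in the end-effector frame."""
    R_e, p_e = ee_pose
    _, p_d = goal_pose
    diff = [d - e for d, e in zip(p_d, p_e)]
    return matvec(transpose(R_e), diff)

def toy_policy(err_local, gain=0.5):
    """Stand-in for a learned policy: proportional step in the end-effector frame."""
    return [gain * x for x in err_local]

def world_action(ee_pose, goal_pose):
    """Map the end-effector-frame action back to the world frame."""
    R_e, _ = ee_pose
    return matvec(R_e, toy_policy(local_error(ee_pose, goal_pose)))

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

# Made-up end-effector pose, goal pose, and scene transform g in SE(3).
ee = (rot_x(0.3), [0.4, 0.0, 0.3])
goal = (rot_z(0.1), [0.5, 0.1, 0.2])
g = (matmul(rot_z(1.2), rot_x(-0.7)), [0.2, -0.5, 0.1])

ee_g, goal_g = transform_pose(g, ee), transform_pose(g, goal)

# Invariance: the localized observation does not change when the scene moves.
obs_invariant = close(local_error(ee, goal), local_error(ee_g, goal_g))

# Equivariance: the world-frame action rotates with the scene.
act_equivariant = close(matvec(g[0], world_action(ee, goal)),
                        world_action(ee_g, goal_g))

print(obs_invariant, act_equivariant)
```

Because the toy policy never sees a world-frame quantity, the equivariance holds for any choice of `toy_policy`, which is the sense in which a localized policy "induces" equivariance in this sketch.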

Problem

Research questions and friction points this paper is trying to address.

Learning vision-based robotic policies for contact-rich tasks

Achieving robust spatial generalization from few demonstrations

Ensuring SE(3)-equivariance from perception to force control
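
For reference, the equivariance property named in the last point can be written out as follows. The notation is an assumption made for illustration and is not taken from the paper: $\pi$ denotes a policy, $o$ an observation, $a$ an action, and $g = (R, t)$ a rigid transform of the scene.

```latex
% Illustrative notation only; symbols are assumptions, not the paper's.
% SE(3)-equivariance of a policy \pi:
\[
  \pi(g \cdot o) = g \cdot \pi(o) \qquad \forall\, g = (R, t) \in SE(3)
\]
% Localized special case: if the observation o_{ee} and the action a_{ee}
% are both expressed in the end-effector frame with orientation R_e, then
% g \cdot o_{ee} = o_{ee}, so a_{ee} = \pi(o_{ee}) is unchanged, and the
% world-frame action transforms with the scene:
\[
  a_{w} = R_{e}\, a_{ee}
  \;\longmapsto\;
  (R R_{e})\, a_{ee} = R\, a_{w}
\]
```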

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical SE(3)-equivariant vision-to-force policy

Localized compliant visuomotor control with GCEV

Diffusion Equivariant Descriptor Field for planning

🔎 Similar Papers

Omnigrasp: Grasping Diverse Objects with Simulated Humanoids