EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-based robotic policies struggle to generalize to unseen spatial poses in contact-rich manipulation tasks such as peg insertion. Method: The paper proposes EquiContact, a hierarchical SE(3)-equivariant policy framework with two levels: a high-level Diffusion Equivariant Descriptor Field (Diff-EDF) for spatially equivariant visual planning, and a low-level Geometric Compliant ACT (G-CompACT) for local compliant control. G-CompACT uses only localized observations (wrist-mounted RGB images, geometrically consistent error vectors (GCEV), and force-torque readings) and outputs actions defined in the end-effector frame. Contribution/Results: The full pipeline is shown to be SE(3)-equivariant, from perception and planning to compliant force control, which enables generalization to unseen spatial configurations from a small number of demonstrations. Evaluated on real-world peg-in-hole tasks, it achieves a 98.7% success rate and remains robust across unseen poses, supporting the combination of compliance, localized policies, and induced equivariance.

📝 Abstract
This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos can be found on the project website: https://sites.google.com/berkeley.edu/equicontact
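The link between localized observations, end-effector-frame actions, and SE(3) equivariance can be checked numerically. The sketch below is a toy illustration, not the paper's G-CompACT: `local_error` is a simplified position-only stand-in for the GCEV, `local_policy` is an arbitrary nonlinear function chosen for the demo, and all helper names are hypothetical. It shows that moving the whole scene by a rigid transform leaves the local observation unchanged and moves the resulting world-frame motion by the same transform.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix and a length-3 vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def local_error(R_ee, p_ee, p_goal):
    """Goal position expressed in the end-effector frame (simplified stand-in)."""
    d = [g - p for g, p in zip(p_goal, p_ee)]
    return matvec(transpose(R_ee), d)

def local_policy(e_local, gain=0.5):
    """Toy nonlinear policy: maps a local error to an end-effector-frame step."""
    return [gain * math.tanh(x) for x in e_local]

def world_step(R_ee, p_ee, p_goal):
    """Apply the local action in the world frame and return the next position."""
    a_local = local_policy(local_error(R_ee, p_ee, p_goal))
    a_world = matvec(R_ee, a_local)
    return [p + a for p, a in zip(p_ee, a_world)]

def transform_point(R_g, t_g, p):
    """Rigid transform g = (R_g, t_g) applied to a point."""
    return [x + t for x, t in zip(matvec(R_g, p), t_g)]

# Original scene: end-effector pose and goal position (arbitrary demo values).
R_ee, p_ee, p_goal = rot_z(0.3), [0.4, 0.0, 0.3], [0.5, 0.1, 0.2]

# Arbitrary rigid transform g applied to the whole scene.
R_g, t_g = matmul(rot_z(1.1), rot_x(-0.7)), [0.2, -0.5, 0.1]
R_ee_g = matmul(R_g, R_ee)
p_ee_g = transform_point(R_g, t_g, p_ee)
p_goal_g = transform_point(R_g, t_g, p_goal)

# Invariance of the local observation.
e_before = local_error(R_ee, p_ee, p_goal)
e_after = local_error(R_ee_g, p_ee_g, p_goal_g)
obs_gap = max(abs(a - b) for a, b in zip(e_before, e_after))

# Equivariance of the world-frame result: step(g . scene) == g . step(scene).
next_then_g = transform_point(R_g, t_g, world_step(R_ee, p_ee, p_goal))
g_then_next = world_step(R_ee_g, p_ee_g, p_goal_g)
act_gap = max(abs(a - b) for a, b in zip(next_then_g, g_then_next))

print("observation gap:", obs_gap)
print("action gap:", act_gap)
```

Both gaps should be zero up to floating-point rounding. The componentwise `tanh` is deliberately not rotation-equivariant on its own; the equivariance comes entirely from expressing inputs and outputs in the end-effector frame.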
Problem

Research questions and friction points this paper is trying to address.

Learning vision-based robotic policies for contact-rich tasks
Achieving robust spatial generalization from few demonstrations
Ensuring SE(3)-equivariance from perception to force control
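For reference, the equivariance property named in the last item can be written out as follows. The notation is an assumption introduced here for illustration ($\pi$ for the policy, $o$ for the observation, $T_{ee}$ and $T_{goal}$ for end-effector and goal poses, $A$ for a relative motion expressed in the end-effector frame) and is not taken from the paper.

```latex
% SE(3) equivariance of a policy \pi: transforming the scene by g
% transforms the output by the same g.
\[
  \pi(g \cdot o) = g \cdot \pi(o) \qquad \forall\, g \in SE(3).
\]

% A relative (end-effector-frame) observation is invariant under a
% left action g on all poses:
\[
  (g\,T_{ee})^{-1}(g\,T_{goal}) = T_{ee}^{-1} g^{-1} g\, T_{goal} = T_{ee}^{-1} T_{goal}.
\]

% Hence a relative motion A computed from that observation is unchanged,
% and the commanded world-frame pose transforms with the scene:
\[
  (g\,T_{ee})\,A = g\,(T_{ee}\,A).
\]
```

Under these assumptions, a policy that only sees relative quantities and only outputs end-effector-frame motions is equivariant regardless of the function it computes internally.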
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical SE(3)-equivariant vision-to-force policy
Localized compliant visuomotor control with GCEV
Diffusion Equivariant Descriptor Field for planning
Joohwan Seo
Mechanical Engineering, UC Berkeley
Nonlinear control, Geometric control, Learning, Robotics
Arvind Kruthiventy
UC Berkeley, Department of Mechanical Engineering
Soomi Lee
UC Berkeley, Department of Mechanical Engineering
Megan Teng
UC Berkeley, Department of Mechanical Engineering
Xiang Zhang
UC Berkeley, Department of Mechanical Engineering
Seoyeon Choi
UC Berkeley, Department of Mechanical Engineering
Jongeun Choi
Professor of Mechanical Engineering, Yonsei University
Machine Learning, Robot Learning, Systems and Control, AI in Healthcare
Roberto Horowitz
UC Berkeley, Department of Mechanical Engineering