The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

📅 2026-06-01

🤖 AI Summary
Existing diffusion-based visuomotor policies model SE(3) poses as flat Euclidean vectors, which leads to manifold drift, broken equivariance, and non-geodesic trajectories. This work proposes the Lie Diffuser Actor (LDA), a diffusion framework that operates intrinsically on the SE(3) manifold. LDA injects noise via a left-invariant stochastic differential equation, predicts scores in the tangent space, and maps samples back to the manifold with the exponential map. By construction this keeps samples on the manifold, and the authors state that it also guarantees coordinate-frame equivariance and geodesic optimality of the generated trajectories. On the CALVIN ABC→D benchmark, LDA improves average task length from 3.27 to 3.51 (+7.3%), and in real-robot experiments it outperforms the baseline on the majority of tasks.
📝 Abstract
Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate our method on a real robot, where it outperforms the baseline on the majority of tasks.
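The manifold-drift claim in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It is restricted to the rotation part SO(3) for brevity, uses only the Python standard library, and assumes one common convention for left-invariant noise (right-multiplying by the exponential of a tangent-space sample). All function names, the noise scale, and the step count are hypothetical choices made for illustration.

```python
import math
import random

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def so3_exp(w):
    """Exponential map so(3) -> SO(3) via the Rodrigues formula."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1-cos(t))/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def orthogonality_error(R):
    """Largest absolute entry of R^T R - I; zero for a valid rotation."""
    G = matmul(transpose(R), R)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))

def euclidean_noise(R, sigma, rng):
    """Flat noising: add i.i.d. Gaussian noise to every matrix entry."""
    return [[R[i][j] + sigma * rng.gauss(0.0, 1.0) for j in range(3)]
            for i in range(3)]

def tangent_noise(R, sigma, rng):
    """Intrinsic noising (assumed convention): R <- R * exp(hat(xi))."""
    xi = [sigma * rng.gauss(0.0, 1.0) for _ in range(3)]
    return matmul(R, so3_exp(xi))

rng = random.Random(0)
R_flat = R_lie = so3_exp([0.3, -0.2, 0.5])  # arbitrary starting rotation
for _ in range(50):  # 50 noising steps, hypothetical schedule
    R_flat = euclidean_noise(R_flat, 0.05, rng)
    R_lie = tangent_noise(R_lie, 0.05, rng)

print("Euclidean orthogonality error:    ", orthogonality_error(R_flat))
print("Tangent-space orthogonality error:", orthogonality_error(R_lie))
```

In this sketch the entry-wise noised matrix leaves SO(3) after the first step, while the tangent-space update stays orthogonal up to floating-point round-off. A full SE(3) version would additionally need the translation part of the exponential map, which is omitted here.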
Problem

Research questions and friction points this paper is trying to address.

Euclidean Fallacy
SE(3)
manifold drift
equivariance
geodesic trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

SE(3) diffusion
manifold-aware policy
tangent space score matching
Lie group robotics
equivariant action generation
Bing-Cheng Chuang
Department of Computer Science and Information Engineering & Artificial Intelligence Center of Research Excellence, National Taiwan University, Taipei, Taiwan
I-Hsuan Chu
Department of Computer Science and Information Engineering & Artificial Intelligence Center of Research Excellence, National Taiwan University, Taipei, Taiwan
Bor-Jiun Lin
Department of Computer Science and Information Engineering & Artificial Intelligence Center of Research Excellence, National Taiwan University, Taipei, Taiwan
YuanFu Yang
Institute of Artificial Intelligence Innovation, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Min Sun
Associate Professor at National Tsing Hua University; Principal Applied Scientist at Amazon
computer vision, machine learning, deep learning, and AI
Chun-Yi Lee
Department of Computer Science and Information Engineering, National Taiwan University
Intelligent Robotics, Deep Reinforcement Learning, Computer Vision, Virtual-to-Real Transfer