🤖 AI Summary
Existing self-supervised motion planning methods in complex environments suffer from suboptimal paths and poor generalization due to neglecting the physical constraints imposed by the Eikonal equation.
Method: This paper proposes a physics-informed self-supervised learning framework that uniquely integrates the localized Bellman optimality principle, temporal-difference learning, contrastive metric learning, and a neural Eikonal solver—explicitly enforcing both value-function optimality and geometric consistency with geodesic distance.
Contribution/Results: The method requires no expert demonstrations and supports robots with 2–12 degrees of freedom, enabling collision-free path planning across diverse unknown complex environments. Experiments demonstrate a 32% improvement in path optimality and a 47% increase in cross-environment generalization success rate over state-of-the-art self-supervised approaches, significantly overcoming generalization bottlenecks in high-dimensional and previously unseen scenarios.
📝 Abstract
The motion planning problem involves finding a collision-free path from a robot's starting to its target configuration. Recently, self-supervised learning methods have emerged to tackle motion planning problems without requiring expensive expert demonstrations. They solve the Eikonal equation for training neural networks and lead to efficient solutions. However, these methods struggle in complex environments because they fail to maintain key properties of the Eikonal equation, such as optimal value functions and geodesic distances. To overcome these limitations, we propose a novel self-supervised temporal difference metric learning approach that solves the Eikonal equation more accurately and enhances performance in solving complex and unseen planning tasks. Our method enforces Bellman's principle of optimality over finite regions, using temporal difference learning to avoid spurious local minima while incorporating metric learning to preserve the Eikonal equation's essential geodesic properties. We demonstrate that our approach significantly outperforms existing self-supervised learning methods in handling complex environments and generalizing to unseen environments, with robot configurations ranging from 2 to 12 degrees of freedom (DOF).