🤖 AI Summary
To address policy failure in out-of-distribution (OOD) states during visuomotor imitation learning, this paper proposes a plug-and-play recovery strategy that requires no additional data collection. The core contribution is object-centric keypoint manifold gradient estimation, which drives an inverse dynamics model to guide the policy back toward the training distribution. Unlike behavior cloning and other baselines, the method is model-agnostic: it attaches to any base policy without retraining, and it further supports autonomous demonstration collection for continual learning. Evaluated in both simulation and on a real-robot platform, the approach improves OOD task success rates by 77.7%, substantially enhancing policy generalization and deployment robustness.
📝 Abstract
We propose an object-centric recovery (OCR) framework to address the challenges of out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on broad coverage of labeled demonstrations and fail in unfamiliar spatial states. Without collecting extra data, our approach learns a recovery policy constructed from an inverse policy that follows the object keypoint manifold gradient estimated from the original training data. The recovery policy serves as a simple add-on to any base visuomotor BC policy, agnostic to the specific method, guiding the system back toward the training distribution so the task can succeed even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real-robot experiments, achieving an improvement of 77.7% over the base policy in OOD scenarios. Furthermore, we show OCR's capacity to autonomously collect demonstrations for continual learning. Overall, we believe this framework represents a step toward improving the robustness of visuomotor policies in real-world settings. Project Website: https://sites.google.com/view/ocr-penn
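To make the core mechanism concrete, here is a minimal sketch of how a keypoint manifold gradient could drive a recovery action. This is an illustrative reconstruction, not the paper's implementation: it assumes the training manifold is approximated by a Gaussian kernel density estimate over demonstrated keypoint configurations, that the gradient of the log-density points back toward in-distribution states, and that a hypothetical `inverse_dynamics` function maps a desired keypoint displacement to a robot action.

```python
import numpy as np

def kde_log_density_grad(x, train_keypoints, bandwidth=0.1):
    """Gradient of the log-density of a Gaussian KDE fit to training
    keypoint configurations; it points toward the training manifold.

    x: (D,) current flattened keypoint configuration
    train_keypoints: (N, D) keypoint configurations from demonstrations
    """
    diffs = train_keypoints - x                    # (N, D)
    sq_dists = np.sum(diffs ** 2, axis=1)          # (N,)
    weights = np.exp(-sq_dists / (2 * bandwidth ** 2))
    # For a Gaussian KDE: grad log p(x) = sum_i w_i (x_i - x) / (h^2 * sum_i w_i)
    return (weights @ diffs) / (bandwidth ** 2 * (weights.sum() + 1e-12))

def recovery_action(x, train_keypoints, inverse_dynamics, step=0.05):
    """One recovery step: move keypoints up the manifold gradient, then
    ask the (assumed) inverse dynamics model for the action that realizes
    that keypoint displacement."""
    grad = kde_log_density_grad(x, train_keypoints)
    delta = step * grad / (np.linalg.norm(grad) + 1e-12)  # unit-step toward the manifold
    return inverse_dynamics(x, x + delta)
```

A base BC policy would run unchanged in-distribution; when the current keypoints fall in a low-density region, actions like the one above would nudge the system back before handing control to the base policy. The `inverse_dynamics` callable here is an assumption standing in for the paper's learned inverse policy.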