🤖 AI Summary
Addressing the constraint of non-modifiable hardware in commercial robot vacuums, this paper proposes a plug-and-play visual monitoring system that enables both geo-localization of objects in the environment and robot self-localization. Methodologically, a smartphone mounted on the robot simultaneously captures images and IMU data; neural inertial navigation estimates the robot’s pose, while Rotation-Augmented Ensemble (RAE) test-time augmentation mitigates the domain gap. Exploiting the spatial regularities of cleaning trajectories, the authors design a loop-closure detection and optimization strategy. Experiments in retail environments demonstrate a relative pose error of only 0.83 m and an average object localization error of 0.97 m across more than 100 items. The primary contribution is a lightweight visual-inertial cooperative localization framework that requires no hardware modification, striking a favorable balance between deployment simplicity and localization robustness.
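The RAE idea can be illustrated with a minimal sketch: rotate the horizontal IMU channels by several yaw angles, run the inertial model on each copy, de-rotate each predicted displacement, and average. The function and model names below are hypothetical, not the paper's implementation; any yaw-equivariant displacement predictor could stand in for `model`.

```python
import numpy as np

def rotate_z(xy, theta):
    """Rotate 2-D vectors of shape (N, 2) by angle theta about the vertical axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return xy @ R.T

def rae_predict(model, imu_xy, n_rotations=8):
    """Rotation-Augmented Ensemble (sketch): average de-rotated predictions
    over several yaw-rotated copies of the horizontal IMU signal."""
    preds = []
    for k in range(n_rotations):
        theta = 2 * np.pi * k / n_rotations
        d = model(rotate_z(imu_xy, theta))            # predicted (dx, dy)
        preds.append(rotate_z(d[None, :], -theta)[0])  # undo the rotation
    return np.mean(preds, axis=0)
```

For a perfectly yaw-equivariant model the ensemble reproduces the single prediction; for a real network trained on a biased heading distribution, the averaging suppresses orientation-dependent errors.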
📝 Abstract
This paper presents Piggyback Camera, an easy-to-deploy system for visual surveillance using commercial robot vacuums. Rather than requiring access to internal robot systems, our approach mounts a smartphone equipped with a camera and Inertial Measurement Unit (IMU) on the robot, making it applicable to any commercial robot without hardware modifications. The system estimates robot poses through neural inertial navigation and efficiently captures images at regular spatial intervals throughout the cleaning task. We develop a novel test-time data augmentation method called Rotation-Augmented Ensemble (RAE) to mitigate domain gaps in neural inertial navigation. A loop closure method that exploits robot cleaning patterns further refines these estimated poses. We demonstrate the system with an object mapping application that analyzes captured images to geo-localize objects in the environment. Experimental evaluation in retail environments shows that our approach achieves 0.83 m relative pose error for robot localization and 0.97 m positional error for object mapping of over 100 items.
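The abstract's "images at regular spatial intervals" policy can be sketched as a simple trigger: capture a frame whenever the estimated pose has moved a set distance since the last capture. This is an illustrative sketch under that assumption, not the system's actual capture logic; the interval value is a placeholder.

```python
import math

def capture_indices(poses, interval=0.5):
    """Return indices of poses at which to capture an image, triggering
    whenever the robot has moved `interval` metres (Euclidean) since the
    last captured frame. The first pose is always captured."""
    events, last = [0], poses[0]
    for i, p in enumerate(poses[1:], start=1):
        if math.dist(p, last) >= interval:
            events.append(i)
            last = p
    return events
```

Triggering on distance travelled rather than elapsed time keeps image density roughly uniform over the floor plan, regardless of how fast the robot moves or how often it pauses.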