🤖 AI Summary
This work addresses the difficulty mobile robots have in anticipating embodiment-specific hazards in unseen, unstructured environments. The authors propose a continual learning framework, "Don't Fool Me Twice", in which an embodied agent learns online from the disturbances it experiences. The framework describes each disturbance's effect on the robot, augments that description with visual context, and queries a vision-language model (VLM) to attribute the anomaly to a semantic cause. Kernel regression characterizes the local disturbance, giving efficient few-shot models of transient anomalies, while semantic voxel-centric modeling estimates epistemic uncertainty so that interaction-driven disturbances can be treated as learnable spatial behaviors during recovery. The authors state four hypotheses and validate them in simulation and on hardware across embodiments and adversity modes.
📝 Abstract
In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild in unseen unstructured environments. A significant challenge in unseen unstructured environments is that it may not be possible to predict all the dangers to the specific robot. Although recent work has used large foundation vision-language models (VLMs) to preemptively predict an exhaustive list of common-sense dangers, it remains difficult to capture possible interaction and embodiment-dependent adversities. We propose a continual learning framework for a mobile embodied agent to learn online from disturbances and attribute anomalous behaviours to causes through semantics, enabling better prediction and planning of the world in the future. Our framework, "Don't Fool Me Twice", first observes disturbances and describes their effects on the robot; this description is augmented with visual context to query a VLM to predict possible causes; the local disturbance is characterized using kernel regression, which allows for efficient, few-shot modeling of transient anomalies. We leverage semantic voxel-centric modeling to estimate epistemic uncertainty, enabling richer downstream recovery by treating interaction-driven disturbances as learnable spatial behaviors. We present four hypotheses and validate them in simulation and on hardware across embodiments and adversity modes.
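To make the kernel-regression step concrete, the following is a minimal sketch of few-shot disturbance modeling, not the authors' implementation. It assumes a Nadaraya-Watson estimator with a Gaussian kernel over 2D robot positions; the abstract does not specify the estimator, kernel, or features. The function names, the `length_scale` value, and the support-based uncertainty proxy are all hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(p, q, length_scale):
    """Gaussian (RBF) similarity between two positions (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def predict_disturbance(query, positions, disturbances, length_scale=0.5, eps=1e-9):
    """Nadaraya-Watson estimate of the disturbance at `query`.

    positions:    where disturbances were observed, e.g. (x, y) in metres
    disturbances: observed effect at each position, e.g. (dvx, dvy) velocity error
    Returns (mean, uncertainty). The uncertainty is a hypothetical proxy:
    it approaches 1 where no observations lie nearby and shrinks as kernel
    support accumulates.
    """
    weights = [gaussian_kernel(query, p, length_scale) for p in positions]
    support = sum(weights)
    dim = len(disturbances[0])
    # Kernel-weighted average; eps pulls the estimate to zero far from all data.
    mean = [
        sum(w * d[k] for w, d in zip(weights, disturbances)) / (support + eps)
        for k in range(dim)
    ]
    uncertainty = 1.0 / (1.0 + support)
    return mean, uncertainty

# Three observations of lateral slip near (1, 0): a few-shot data set.
positions = [(1.0, 0.0), (1.2, 0.1), (0.9, -0.1)]
disturbances = [(0.0, 0.30), (0.0, 0.35), (0.0, 0.25)]

near_mean, near_unc = predict_disturbance((1.0, 0.0), positions, disturbances)
far_mean, far_unc = predict_disturbance((5.0, 5.0), positions, disturbances)
```

Near the observed slip the estimate interpolates the three samples and the uncertainty proxy is low. Far from them the estimate decays to zero and the proxy approaches 1, which is the kind of signal a planner could use to separate "known hazard here" from "nothing learned here yet".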