Don't Fool Me Twice: Adapting to Adversity in the Wild with Experience-Driven Reasoning

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty mobile robots face in anticipating embodiment-specific hazards in unseen, unstructured environments. The authors propose a continual learning framework that lets an embodied agent learn online from disturbances. A description of each disturbance, augmented with visual context, is used to query a vision-language model (VLM) that proposes likely semantic causes. Kernel regression then models the local disturbance from only a few samples, and semantic voxel-centric modeling estimates epistemic uncertainty so that interaction-driven disturbances can be treated as learnable spatial behaviors. Experiments in simulation and on real hardware, across several robot embodiments and adversity modes, are used to validate four hypotheses, with the reported results indicating improved adaptability and robustness in wild, unstructured settings.
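The summary states that kernel regression is used to model transient anomalies from only a few samples. The exact kernel and input features are not given here, so the following is a minimal sketch under stated assumptions: a Nadaraya-Watson estimator with a Gaussian kernel over 2-D robot positions and a scalar disturbance magnitude (for example, a measured slip ratio). The class `DisturbanceModel`, its methods, and the `bandwidth` value are hypothetical illustrations, not the authors' code.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DisturbanceModel:
    """Hypothetical few-shot model of one local disturbance.

    Stores (x, y) positions where a disturbance was observed together with a
    scalar magnitude, and predicts the magnitude elsewhere with
    Nadaraya-Watson kernel regression (a kernel-weighted average).
    """

    bandwidth: float = 0.5  # Gaussian kernel width in metres (assumed value)
    samples: List[Tuple[float, float, float]] = field(default_factory=list)

    def observe(self, x: float, y: float, magnitude: float) -> None:
        """Add one observed disturbance sample."""
        self.samples.append((x, y, magnitude))

    def _kernel(self, dx: float, dy: float) -> float:
        """Unnormalized Gaussian kernel on the squared Euclidean distance."""
        return math.exp(-(dx * dx + dy * dy) / (2.0 * self.bandwidth ** 2))

    def predict(self, x: float, y: float) -> Tuple[float, float]:
        """Return (predicted magnitude, total kernel weight).

        The total weight is a crude measure of data support: if it is
        negligible there are no nearby samples and we return (0.0, 0.0).
        """
        weights = [self._kernel(x - sx, y - sy) for sx, sy, _ in self.samples]
        total = sum(weights)
        if total < 1e-9:
            return 0.0, 0.0
        weighted = sum(w * m for w, (_, _, m) in zip(weights, self.samples))
        return weighted / total, total
```

For example, after `model.observe(1.0, 1.0, 0.8)` and `model.observe(1.2, 0.9, 0.6)`, `model.predict(1.1, 0.95)` returns a magnitude of about 0.7 (both samples are equidistant from the query, so they are weighted equally), while a query far from every sample returns `(0.0, 0.0)`. Returning the total kernel weight alongside the estimate lets a planner distinguish "no disturbance expected here" from "no data here".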
📝 Abstract
In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild, in unseen, unstructured environments. A significant challenge in such environments is that it may not be possible to predict all the dangers specific to a given robot. Although recent work has used large foundation vision-language models (VLMs) to preemptively predict an exhaustive list of common-sense dangers, it remains difficult to capture possible interaction- and embodiment-dependent adversities. We propose a continual learning framework for a mobile embodied agent to learn online from disturbances and attribute anomalous behaviours to causes through semantics, enabling better prediction and planning in the future. Our framework, "Don't Fool Me Twice", first observes disturbances and describes their effects on the robot; this description is augmented with visual context to query a VLM to predict possible causes; the local disturbance is characterized using kernel regression, which allows for efficient, few-shot modeling of transient anomalies. We leverage semantic voxel-centric modeling to estimate epistemic uncertainty, enabling richer downstream recovery by treating interaction-driven disturbances as learnable spatial behaviors. We present four hypotheses and validate them in simulation and on hardware across embodiments and adversity modes.
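The abstract says semantic voxel-centric modeling is used to estimate epistemic uncertainty, but the estimator itself is not described here. As a plainly swapped-in stand-in, the sketch below uses the Dirichlet "vacuity" measure from evidential learning, u = K / sum(alpha) with alpha_k = 1 + n_k, where n_k counts how often cause k (for example, a VLM-proposed label) was attributed to a disturbance inside a voxel. `SemanticVoxelMap`, the label names, and the voxel size are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

VoxelKey = Tuple[int, int, int]


class SemanticVoxelMap:
    """Hypothetical sparse voxel map that accumulates, per voxel, how often each
    candidate cause (e.g. a VLM-proposed label) explained an observed disturbance."""

    def __init__(self, voxel_size: float, labels: List[str]) -> None:
        self.voxel_size = voxel_size  # voxel edge length in metres (assumed)
        self.labels = labels          # candidate causes, fixed up front (assumed)
        self.counts: Dict[VoxelKey, List[int]] = {}

    def key(self, x: float, y: float, z: float) -> VoxelKey:
        """Map a 3-D point to the integer index of the voxel containing it."""
        s = self.voxel_size
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def add_evidence(self, x: float, y: float, z: float, label: str) -> None:
        """Record one attribution of a disturbance at (x, y, z) to `label`."""
        k = self.key(x, y, z)
        if k not in self.counts:
            self.counts[k] = [0] * len(self.labels)
        self.counts[k][self.labels.index(label)] += 1

    def query(self, x: float, y: float, z: float) -> Tuple[Optional[str], float]:
        """Return (most supported cause or None, epistemic uncertainty in (0, 1]).

        Uncertainty is the Dirichlet vacuity K / sum(alpha), alpha_k = 1 + n_k:
        exactly 1.0 for an unobserved voxel, shrinking toward 0 with evidence.
        """
        counts = self.counts.get(self.key(x, y, z), [0] * len(self.labels))
        alpha_sum = sum(1 + n for n in counts)
        uncertainty = len(self.labels) / alpha_sum
        if sum(counts) == 0:
            return None, uncertainty
        best = max(range(len(counts)), key=lambda i: counts[i])
        return self.labels[best], uncertainty
```

For example, with labels `["mud", "loose_gravel", "none"]` and three "mud" attributions in one voxel, `query` on that voxel returns `("mud", 0.5)`, since alpha sums to 4 + 1 + 1 = 6 and K = 3; an adjacent, unobserved voxel still returns `(None, 1.0)`. Regions with high uncertainty mark where the agent has not yet learned how the terrain affects its particular embodiment.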
Problem

Research questions and friction points this paper is trying to address.

adversity
embodiment
unstructured environments
disturbance
mobile robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning
embodied reasoning
semantic voxel modeling
adversity adaptation
vision-language models