AI Summary
Embodied agents operating in dynamic, temporally sensitive, and context-rich environments are vulnerable to implicit hazardous instructions, leading to unsafe behaviors, a challenge inadequately addressed by static rule-based or prompt-level safety mechanisms.
Method: We propose the first runtime safety framework grounded in executable predicate logic, integrating a hybrid long- and short-term safety memory with bidirectional reasoning (backward reflection and forward prediction) to enable dynamic contextual awareness, formal verifiability, and executable safety decisions. The framework unifies vision-language models, multimodal perception fusion, real-time trajectory backtracking, and risk-prediction inference.
Contribution/Results: Evaluated across diverse embodied agents, our framework reduces hazardous behavior incidence by 36.8% while incurring negligible task performance degradation (<2%). It is further validated on a physical robotic arm, demonstrating real-world deployability and robustness. This work establishes a foundation for formally grounded, runtime-enforced safety in embodied AI systems.
Abstract
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
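To make the idea of "executable predicate-based safety logic" concrete, the following is a minimal illustrative sketch, not the RoboSafe implementation: safety constraints are expressed as named, executable predicates over the agent's state, and any violation would trigger replanning instead of action execution. All class and predicate names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class AgentState:
    # A toy snapshot of the agent's multimodal context.
    holding_object: Optional[str]
    near_human: bool
    gripper_speed: float  # meters per second

@dataclass
class SafetyPredicate:
    # A safety predicate is a named, executable check over the state.
    name: str
    check: Callable[[AgentState], bool]  # True means the state is safe

def evaluate(predicates: List[SafetyPredicate], state: AgentState) -> List[str]:
    """Return the names of violated predicates; an empty list means safe."""
    return [p.name for p in predicates if not p.check(state)]

# Example predicates a forward-predictive module might generate for a
# kitchen manipulation task (hypothetical content).
predicates = [
    SafetyPredicate(
        "no_sharp_object_near_human",
        lambda s: not (s.holding_object == "knife" and s.near_human),
    ),
    SafetyPredicate(
        "speed_limit_near_human",
        lambda s: (not s.near_human) or s.gripper_speed <= 0.25,
    ),
]

state = AgentState(holding_object="knife", near_human=True, gripper_speed=0.1)
violations = evaluate(predicates, state)
# A non-empty violation list would abort the pending action and replan.
```

Because each predicate is plain code rather than a prompt, the resulting safety decision is both inspectable (the violated predicate is named) and directly enforceable at runtime, which is the property the abstract describes as "interpretable and executable as code".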