AI Summary
Embodied agents operating in dynamic, temporally sensitive, and context-rich environments are vulnerable to implicit hazardous instructions, leading to unsafe behaviors, a challenge inadequately addressed by static rule-based or prompt-level safety mechanisms.
Method: We propose the first runtime safety framework grounded in executable predicate logic, integrating a hybrid long- and short-term safety memory with bidirectional reasoning (backward reflection and forward prediction) to enable dynamic contextual awareness, formal verifiability, and executable safety decisions. The framework unifies vision-language models, multimodal perception fusion, real-time trajectory backtracking, and risk-prediction inference.
Contribution/Results: Evaluated across diverse embodied agents, our framework reduces hazardous behavior incidence by 36.8% while incurring negligible task performance degradation (<2%). It is further validated on a physical robotic arm, demonstrating real-world deployability and robustness. This work establishes a foundation for formally grounded, runtime-enforced safety in embodied AI systems.
Abstract
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
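To make the idea of "executable predicate-based safety logic" concrete, the following is a minimal illustrative sketch, not the RoboSafe implementation: safety constraints are expressed as named, executable predicates over the agent's state, and any violation would trigger replanning instead of action execution. All class and predicate names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class AgentState:
    # A toy snapshot of the agent's multimodal context.
    holding_object: Optional[str]
    near_human: bool
    gripper_speed: float  # meters per second

@dataclass
class SafetyPredicate:
    # A safety predicate is a named, executable check over the state.
    name: str
    check: Callable[[AgentState], bool]  # True means the state is safe

def evaluate(predicates: List[SafetyPredicate], state: AgentState) -> List[str]:
    """Return the names of violated predicates; an empty list means safe."""
    return [p.name for p in predicates if not p.check(state)]

# Example predicates a forward-predictive module might generate for a
# kitchen manipulation task (hypothetical content).
predicates = [
    SafetyPredicate(
        "no_sharp_object_near_human",
        lambda s: not (s.holding_object == "knife" and s.near_human),
    ),
    SafetyPredicate(
        "speed_limit_near_human",
        lambda s: (not s.near_human) or s.gripper_speed <= 0.25,
    ),
]

state = AgentState(holding_object="knife", near_human=True, gripper_speed=0.1)
violations = evaluate(predicates, state)
# A non-empty violation list would abort the pending action and replan.
```

Because each predicate is plain code rather than a prompt, the resulting safety decision is both inspectable (the violated predicate is named) and directly enforceable at runtime, which is the property the abstract describes as "interpretable and executable as code".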