AI Summary
This work addresses the unreliable detection of physical hazards by embodied agents: without explicit risk recognition and action-conditioned reasoning, agents either miss risky interactions or raise false alarms. To mitigate this, we propose EMBGuard, the first safety guardrail architecture based on a multimodal large language model (MLLM) and designed specifically for embodied agents. EMBGuard decouples risk reasoning from policy decision-making: it jointly evaluates a visual observation and a candidate action to identify hazardous configurations, and it generates a natural language explanation of each identified risk. We also introduce the EMBHazard training dataset and the EMBGuardTest benchmark, which spans diverse scenarios. Experiments show that lightweight EMBGuard variants with only 2B or 4B parameters achieve performance competitive with proprietary models such as GPT-5.1 and Gemini-2.5-Pro while significantly reducing false alarm rates, which makes real-time deployment more feasible.
Abstract
MLLM-powered embodied agents deployed in real-world environments encounter physical hazards. However, existing approaches lack explicit mechanisms for identifying hazards and reasoning about action-conditioned risks, leading agents to either miss risky interactions or over-identify risks. To address this, we propose EMBGuard, the first MLLM-based safety guardrail for embodied agents designed to decouple physical risk reasoning from agent policy. By evaluating a (visual observation, action) pair, EMBGuard identifies hazardous configurations and provides natural language explanations of potential risks. Alongside EMBGuard, we contribute EMBHazard, a training dataset of 15.1K action-conditioned pairs, and EMBGuardTest, a benchmark of 329 manually curated real-world scenarios spanning seven physical risk categories. Through compositional variation of hazards and actions, we generate diverse risky and benign scenarios that agents may encounter during planning. Despite its compact size (2B, 4B), EMBGuard achieves performance competitive with proprietary MLLMs (e.g., GPT-5.1, Gemini-2.5-Pro) while significantly reducing the false-positive rates that hinder real-time deployment. We make the code, data, and models publicly available at https://github.com/dongwxxkchoi/EMBGuard
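The decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Verdict` type, the `toy_guardrail` keyword rule (a stand-in for the MLLM), the text string used in place of an image observation, and the `filter_actions` and `false_positive_rate` helpers are all hypothetical names introduced here. The sketch only shows the interface idea, namely that a guardrail separate from the policy scores each (observation, action) pair and returns an explanation, and how a false-positive rate over benign pairs could be computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Verdict:
    risky: bool
    explanation: str


# Hypothetical guardrail signature: (observation, action) -> Verdict.
# The observation is a text description here; the real system takes an image.
Guardrail = Callable[[str, str], Verdict]

# Toy rules: (object in observation, trigger in action, explanation).
HAZARD_RULES = [
    ("metal fork", "turn on the microwave",
     "Metal inside a running microwave can spark."),
]


def toy_guardrail(observation: str, action: str) -> Verdict:
    """Keyword stand-in for the MLLM; assumption for illustration only."""
    for obj, trigger, why in HAZARD_RULES:
        if obj in observation and trigger in action:
            return Verdict(True, why)
    return Verdict(False, "No hazard identified for this action.")


def filter_actions(
    guard: Guardrail, observation: str, candidates: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a policy's candidate actions into safe ones and blocked ones."""
    safe, blocked = [], []
    for action in candidates:
        verdict = guard(observation, action)
        if verdict.risky:
            blocked.append((action, verdict.explanation))
        else:
            safe.append(action)
    return safe, blocked


def false_positive_rate(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of truly benign pairs that were flagged as risky."""
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    negatives = sum(not a for a in actual)
    return false_pos / negatives if negatives else 0.0


if __name__ == "__main__":
    obs = "a metal fork sits inside the microwave"
    safe, blocked = filter_actions(
        toy_guardrail, obs, ["turn on the microwave", "remove the fork"]
    )
    print("safe:", safe)
    print("blocked:", blocked)
```

Because the guardrail is an ordinary function of the pair, the same action can be benign under one observation and risky under another, and the policy that proposes candidate actions can be swapped without touching the risk check.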