JailWAM: Jailbreaking World Action Models in Robot Control

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical yet underexplored vulnerability of World Action Models (WAMs) to jailbreak attacks, which poses a severe security threat given their strong physical interaction capabilities in robotic control. To close this gap, we propose JailWAM, the first framework dedicated to jailbreak attacks and safety evaluation for WAMs. Our approach unifies heterogeneous action spaces through Visual-Trajectory Mapping, introduces a high-recall risk discriminator coupled with a two-stage verification mechanism, and establishes JailWAM-Bench, a benchmark for safety alignment. Evaluated in the RoboTwin simulation environment, JailWAM achieves an 84.2% attack success rate against LingBot-VA, effectively exposing latent safety flaws and providing a foundation for robust defensive mechanisms in embodied AI systems.
📝 Abstract
The World Action Model (WAM) jointly predicts future world states and actions, exhibiting stronger physical manipulation capabilities than traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it directly threatens personal safety, property, and the environment. However, existing research pays very limited attention to a critical security gap: the vulnerability of WAMs to jailbreak attacks. To fill this gap, we define a Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAMs, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables unified cross-architecture evaluation; (2) a Risk Discriminator, a high-recall screening tool that optimizes the efficiency-accuracy trade-off when identifying destructive behaviors in visual trajectories; and (3) a Dual-Path Verification Strategy, which first performs rapid coarse screening via a single-image-based video-action generation module and then conducts efficient, comprehensive verification through full closed-loop physical simulation. In addition, we construct JailWAM-Bench, a benchmark for comprehensively evaluating the safety alignment of WAMs under jailbreak attacks. Experiments in the RoboTwin simulation environment demonstrate that the proposed framework efficiently exposes physical vulnerabilities, achieving an 84.2% attack success rate on the state-of-the-art LingBot-VA. Moreover, robust defense mechanisms can be built on JailWAM, providing an effective technical path toward safe and reliable robot control systems.
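The abstract's dual-path verification idea (cheap high-recall screening first, expensive closed-loop simulation only for flagged trajectories) and the reported attack success rate (ASR) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the stand-in predicates, and the toy data are all hypothetical.

```python
# Minimal sketch of the dual-path verification strategy described in the
# abstract, under assumed interfaces. All names here are hypothetical;
# the paper's actual screener and simulator are learned/physics modules.

def attack_success_rate(results):
    """ASR: fraction of jailbreak attempts verified as unsafe."""
    return sum(results) / len(results) if results else 0.0

def dual_path_verify(candidates, coarse_screen, full_simulation):
    """Stage 1: rapid coarse screening (high-recall risk discriminator).
    Stage 2: full closed-loop physical simulation, run only on the
    trajectories the screener flags as potentially destructive."""
    verified = []
    for traj in candidates:
        if coarse_screen(traj):                      # cheap filter
            verified.append(full_simulation(traj))   # expensive ground truth
        else:
            verified.append(False)                   # screened out as safe
    return verified

# Toy usage with stand-in predicates (not the paper's models):
candidates = ["t1", "t2", "t3", "t4"]
flags = dual_path_verify(
    candidates,
    coarse_screen=lambda t: t in {"t1", "t3", "t4"},
    full_simulation=lambda t: t in {"t1", "t4"},
)
print(attack_success_rate(flags))  # 2 of 4 verified unsafe -> 0.5
```

The design point is the efficiency-accuracy trade-off the abstract mentions: the screener must have high recall (few unsafe trajectories screened out at stage 1), since anything it discards never reaches the simulator.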
Problem

Research questions and friction points this paper is trying to address.

World Action Model
jailbreak attacks
robot safety
physical manipulation
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

World Action Model
Jailbreak Attack
Robot Safety
Visual-Trajectory Mapping
Safety Alignment
Hanqing Liu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Songping Wang
PR Lab, Nanjing University, Suzhou, China
Jiahuan Long
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Jiacheng Hou
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Jialiang Sun
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Chao Li
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Yang Yang
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Wei Peng
Institute of Information Engineering, Chinese Academy of Sciences
Xu Liu
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Tingsong Jiang
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Wen Yao
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Yao Mu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China