JailWAM: Jailbreaking World Action Models in Robot Control

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical yet underexplored vulnerability of World Action Models (WAMs) to jailbreak attacks, which poses a severe security threat given their strong physical interaction capabilities in robotic control. To close this gap, we propose JailWAM, the first framework dedicated to jailbreak attacks and safety evaluation for WAMs. Our approach unifies heterogeneous action spaces through Visual-Trajectory Mapping, introduces a high-recall risk discriminator coupled with a two-stage verification mechanism, and establishes JailWAM-Bench, a benchmark for safety alignment. Evaluated in the RoboTwin simulation environment, JailWAM achieves an 84.2% attack success rate against LingBot-VA, effectively exposing latent safety flaws and providing a foundation for robust defensive mechanisms in embodied AI systems.
📝 Abstract
The World Action Model (WAM) jointly predicts future world states and actions, exhibiting stronger physical manipulation capabilities than traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it directly threatens personal safety, property, and the environment. However, existing research pays very limited attention to a critical security gap: the vulnerability of WAMs to jailbreak attacks. To fill this gap, we define a Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAMs, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables unified cross-architecture evaluation; (2) a Risk Discriminator, a high-recall screening tool that optimizes the efficiency-accuracy trade-off when identifying destructive behaviors in visual trajectories; and (3) a Dual-Path Verification Strategy, which first performs rapid coarse screening via a single-image-based video-action generation module and then conducts efficient, comprehensive verification through full closed-loop physical simulation. In addition, we construct JailWAM-Bench, a benchmark for comprehensively evaluating the safety alignment of WAMs under jailbreak attacks. Experiments in the RoboTwin simulation environment demonstrate that the proposed framework efficiently exposes physical vulnerabilities, achieving an 84.2% attack success rate on the state-of-the-art LingBot-VA. Moreover, robust defense mechanisms can be built on JailWAM, providing an effective technical path toward safe and reliable robot control systems.
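The abstract's dual-path verification idea (cheap high-recall screening first, expensive closed-loop simulation only for flagged trajectories) and the reported attack success rate (ASR) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the stand-in predicates, and the toy data are all hypothetical.

```python
# Minimal sketch of the dual-path verification strategy described in the
# abstract, under assumed interfaces. All names here are hypothetical;
# the paper's actual screener and simulator are learned/physics modules.

def attack_success_rate(results):
    """ASR: fraction of jailbreak attempts verified as unsafe."""
    return sum(results) / len(results) if results else 0.0

def dual_path_verify(candidates, coarse_screen, full_simulation):
    """Stage 1: rapid coarse screening (high-recall risk discriminator).
    Stage 2: full closed-loop physical simulation, run only on the
    trajectories the screener flags as potentially destructive."""
    verified = []
    for traj in candidates:
        if coarse_screen(traj):                      # cheap filter
            verified.append(full_simulation(traj))   # expensive ground truth
        else:
            verified.append(False)                   # screened out as safe
    return verified

# Toy usage with stand-in predicates (not the paper's models):
candidates = ["t1", "t2", "t3", "t4"]
flags = dual_path_verify(
    candidates,
    coarse_screen=lambda t: t in {"t1", "t3", "t4"},
    full_simulation=lambda t: t in {"t1", "t4"},
)
print(attack_success_rate(flags))  # 2 of 4 verified unsafe -> 0.5
```

The design point is the efficiency-accuracy trade-off the abstract mentions: the screener must have high recall (few unsafe trajectories screened out at stage 1), since anything it discards never reaches the simulator.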
Problem

Research questions and friction points this paper is trying to address.

World Action Model
jailbreak attacks
robot safety
physical manipulation
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

World Action Model
Jailbreak Attack
Robot Safety
Visual-Trajectory Mapping
Safety Alignment
Hanqing Liu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Songping Wang
PR Lab, Nanjing University, Suzhou, China
Jiahuan Long
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Jiacheng Hou
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Jialiang Sun
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Chao Li
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Yang Yang
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Wei Peng
Institute of Information Engineering, Chinese Academy of Sciences
Xu Liu
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Tingsong Jiang
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Wen Yao
Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
Yao Mu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China