Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of vision-based robotic manipulation policies to adversarial attacks under dynamic viewpoints. Conventional 2D adversarial patches lose effectiveness in this setting because of perspective distortion, particularly with wrist-mounted cameras on robotic arms. To overcome this limitation, the authors propose a viewpoint-consistent adversarial texture optimization method that optimizes the surface texture of a 3D object through differentiable rendering. The approach combines a coarse-to-fine frequency curriculum, saliency-guided perturbations, and a targeted loss within the Expectation over Transformation (EOT) framework, so that the attack remains effective across viewpoints and camera-object distances. Experiments show that the method is effective under various environmental conditions, transfers to black-box settings, and works on real-world robotic systems, exposing vulnerabilities in visuomotor policies.

📝 Abstract
Neural network-based visuomotor policies enable robots to perform manipulation tasks but remain susceptible to perceptual attacks. For example, conventional 2D adversarial patches are effective under fixed-camera setups, where appearance is relatively consistent; however, their efficacy often diminishes under dynamic viewpoints from moving cameras, such as wrist-mounted setups, due to perspective distortions. To proactively investigate potential vulnerabilities beyond 2D patches, this work proposes a viewpoint-consistent adversarial texture optimization method for 3D objects through differentiable rendering. As optimization strategies, we employ Expectation over Transformation (EOT) with a Coarse-to-Fine (C2F) curriculum, exploiting distance-dependent frequency characteristics to induce textures effective across varying camera-object distances. We further integrate saliency-guided perturbations to redirect policy attention and design a targeted loss that persistently drives robots toward adversarial objects. Our comprehensive experiments show that the proposed method is effective under various environmental conditions, while confirming its black-box transferability and real-world applicability.
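
The interplay between EOT and the coarse-to-fine curriculum can be illustrated with a toy sketch. This is not the authors' implementation: the differentiable renderer is replaced by a hypothetical linear block-averaging model of camera distance, the policy is a hypothetical fixed linear scorer `W`, saliency guidance is omitted, and all names and constants (`SIZE`, `TARGET`, the step sizes) are assumptions chosen only to keep the example small and analytically differentiable.

```python
import numpy as np

SIZE = 16      # full texture resolution (assumed for this toy example)
TARGET = 2.0   # hypothetical action value the attacker wants the policy to output

def upsample(a, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return np.kron(a, np.ones((k, k)))

def block_sum(a, k):
    """Sum over non-overlapping k x k blocks (the adjoint of `upsample`)."""
    n = a.shape[0] // k
    return a.reshape(n, k, n, k).sum(axis=(1, 3))

def view_from_distance(x, s):
    """Toy distance model: averaging over s x s blocks removes detail finer than s."""
    return upsample(block_sum(x, s) / (s * s), s)

# Hypothetical linear policy: responds to a low-frequency (left/right) pattern
# and to a high-frequency checkerboard that vanishes once the view is blurred.
ii, jj = np.indices((SIZE, SIZE))
low = np.where(jj < SIZE // 2, 1.0, -1.0)
high = np.where((ii + jj) % 2 == 0, 1.0, -1.0)
W = 0.05 * (low + high)

def loss_and_grad(x, s, b):
    """Targeted loss 0.5 * (action - TARGET)^2 and its gradient w.r.t. the texture x."""
    action = np.sum(W * (b * view_from_distance(x, s)))
    err = action - TARGET
    # Block averaging is a self-adjoint linear map, so the gradient reuses it on W.
    return 0.5 * err ** 2, err * b * view_from_distance(W, s)

def sample_transform(rng):
    """Draw one random transformation: a distance factor and a brightness gain."""
    return int(rng.choice([1, 2, 4])), rng.uniform(0.8, 1.2)

def expected_loss(x):
    """Deterministic estimate of the EOT objective on a fixed grid of transforms."""
    gains = np.linspace(0.8, 1.2, 5)
    return float(np.mean([loss_and_grad(x, s, b)[0] for s in (1, 2, 4) for b in gains]))

def optimise_texture(steps_per_stage=60, batch=8, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.full((SIZE // 4, SIZE // 4), 0.5)  # start on the coarsest grid
    prev_k = 4
    for k in (4, 2, 1):                           # coarse-to-fine stages
        theta = upsample(theta, prev_k // k)      # refine the parameter grid
        prev_k = k
        for _ in range(steps_per_stage):
            x = upsample(theta, k)                # full-resolution texture
            grad = np.zeros_like(theta)
            for _ in range(batch):                # Monte Carlo estimate of the expectation
                s, b = sample_transform(rng)
                grad += block_sum(loss_and_grad(x, s, b)[1], k) / batch
            theta = np.clip(theta - lr * grad, 0.0, 1.0)  # keep texel values valid
    return theta                                  # at k == 1 this is the texture itself
```

Calling `optimise_texture()` returns a 16 x 16 texture whose expected loss under the sampled transformations, measured with `expected_loss`, is lower than that of the uniform grey starting texture. In this linear toy problem the coarse stage already does most of the work; the sketch only shows where the resolution schedule and the transformation sampling sit in the loop, and makes no claim about how the paper schedules frequencies or weights its losses.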

Problem

Research questions and friction points this paper is trying to address.

visuomotor policies
adversarial attacks
3D objects
dynamic viewpoints
perceptual vulnerabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

viewpoint-consistent
3D adversarial texture
differentiable rendering
visuomotor policy
Expectation over Transformation

Authors

Chanmi Lee
School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
Minsung Yoon
School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
Woojae Kim
School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
Sebin Lee
Master's Student @ 33Lab, Soongsil University
Human-Computer Interaction, Computer Graphics, Virtual Reality, Virtual Entertainment
Sung-eui Yoon
School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea