SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual Reinforcement Learning (VRL) is adversarially vulnerable through its image inputs, yet existing black-box attacks incur prohibitive environment-query costs on continuous-control tasks with high-dimensional action spaces. This paper proposes SEBA, a sample-efficient black-box attack framework that addresses both challenges. First, it constructs a shadow Q-network to estimate cumulative returns under adversarial states, and leverages a learned world model to simulate environment dynamics, significantly reducing real-environment interactions. Second, it employs a generative adversarial network to produce visually imperceptible input perturbations. Crucially, a novel two-stage iterative training scheme co-optimizes perturbation generation and policy evaluation. Evaluated on MuJoCo and Atari benchmarks, the method substantially degrades the target agent's cumulative reward while reducing environment queries by over 90%, outperforming state-of-the-art black-box attacks and even several white-box alternatives.

📝 Abstract
Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks focus on vector-based or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and excessive environment queries. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-world queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while maintaining efficiency. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared to prior black-box and white-box methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses vulnerability of visual reinforcement learning to adversarial attacks
Overcomes limitations of black-box attacks in continuous control settings
Reduces environment queries while maintaining effective attack performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shadow Q model estimates rewards under adversarial conditions
GAN generates imperceptible perturbations for visual attacks
World model reduces environment queries through simulation
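The three components above can be sketched as a toy alternating loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the world model, victim policy, shadow Q model, and generator are stand-in scalar functions of my own invention, and the GAN update is replaced by a finite-difference descent step on a single generator parameter.

```python
# Hedged toy sketch of SEBA-style two-stage training (illustrative names,
# not the paper's code). Stage 1 fits a linear "shadow Q" model to returns
# simulated by a learned world model; Stage 2 nudges a perturbation
# generator toward lower attacked returns, under an L-inf budget EPS.
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.1  # imperceptibility budget on the additive perturbation

def world_model_step(state, action):
    # Stand-in learned dynamics: contracting linear system plus action.
    return 0.9 * state + 0.1 * action

def victim_policy(obs):
    # Stand-in black-box victim: acts to pull the (perturbed) obs to zero.
    return -obs

def generator(phi, obs):
    # Perturbation generator: additive perturbation bounded by EPS.
    return obs + EPS * np.tanh(phi * obs)

def shadow_q(theta, state, perturbed_obs):
    # Linear shadow Q estimate of the attacked cumulative return.
    return theta[0] * state + theta[1] * perturbed_obs

def simulated_return(state, phi, horizon=10):
    # Roll out the attacked episode entirely inside the world model,
    # so no real-environment queries are spent during training.
    total = 0.0
    for _ in range(horizon):
        action = victim_policy(generator(phi, state))
        total += -state ** 2  # toy reward: stay near the origin
        state = world_model_step(state, action)
    return total

phi, theta = 0.5, np.zeros(2)
for _ in range(50):
    s = rng.normal()
    # Stage 1: one SGD step fitting shadow Q to the simulated return.
    pred = shadow_q(theta, s, generator(phi, s))
    theta -= 0.01 * (pred - simulated_return(s, phi)) * np.array(
        [s, generator(phi, s)])
    # Stage 2: finite-difference step (standing in for the GAN update)
    # pushing the generator toward lower attacked returns.
    d = 1e-3
    grad = (simulated_return(s, phi + d)
            - simulated_return(s, phi - d)) / (2 * d)
    phi -= 0.05 * grad
```

Here a single scalar `phi` stands in for the generator's parameters and the victim is queried only inside the world model; in the paper both components are deep networks and the perturbation bound applies per pixel.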
Tairan Huang
Beihang University
computer vision
Yulin Jin
The Hong Kong Polytechnic University
Junxu Liu
The Hong Kong Polytechnic University
Qingqing Ye
Assistant Professor, The Hong Kong Polytechnic University
data privacy and security; adversarial machine learning
Haibo Hu
The Hong Kong Polytechnic University