🤖 AI Summary
Visual Reinforcement Learning (VRL) is adversarially vulnerable when observations are images, yet existing black-box attacks suffer prohibitive environment-query costs in continuous-control tasks due to high-dimensional action spaces. This paper proposes a sample-efficient black-box attack framework that jointly addresses these challenges. First, it constructs a shadow Q-network to estimate cumulative returns under adversarial states, and leverages a learned world model to simulate environment dynamics, significantly reducing real-world interactions. Second, it employs a generative adversarial network to produce visually imperceptible input perturbations. Crucially, it introduces a two-stage iterative training scheme that co-optimizes perturbation generation and policy evaluation. Evaluated on MuJoCo and Atari benchmarks, the method substantially degrades the target agent's cumulative reward while reducing environment queries by over 90%, outperforming state-of-the-art black-box attacks and even several white-box alternatives.
📝 Abstract
Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks target vector-state or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and excessive environment queries. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-world queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while maintaining efficiency. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared to prior black-box and white-box methods.
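The two-stage alternation described above can be illustrated with a deliberately simplified sketch. Everything here is a hypothetical stand-in, not the paper's actual models: the victim policy, reward, and dynamics are linear toys, the "shadow model" is a least-squares fit instead of a Q-network, and the "generator" is a single perturbation vector rather than a GAN. The skeleton, however, mirrors SEBA's loop: stage 1 estimates returns under the current perturbation using world-model rollouts (no real environment queries), and stage 2 refines the perturbation to minimize that estimate while staying inside an imperceptibility budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for illustration only.
STATE_DIM, EPS, HORIZON = 4, 0.5, 10
policy_w = rng.normal(size=STATE_DIM)       # frozen victim policy: a = w.s (black box)

def victim_action(obs):
    """Attacker can query the policy's action, but not its internals."""
    return float(policy_w @ obs)

def world_model_return(s, delta):
    """Return simulated by the (assumed pre-learned) world model.
    Toy reward: the action itself, so pushing actions down degrades return."""
    total = 0.0
    for _ in range(HORIZON):
        total += victim_action(s + delta)   # policy acts on the perturbed observation
        s = 0.9 * s                         # toy linear dynamics
    return total

delta = np.zeros(STATE_DIM)                 # "generator" output: one learned perturbation
for _ in range(5):
    # Stage 1: fit a linear shadow return model around the current delta,
    # using world-model rollouts instead of real environment queries.
    probes = rng.normal(scale=0.1, size=(32, STATE_DIM))
    s0 = rng.normal(size=STATE_DIM)
    y = np.array([world_model_return(s0, delta + p) for p in probes])
    X = np.hstack([probes, np.ones((32, 1))])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    grad = coef[:STATE_DIM]                 # shadow estimate of d(return)/d(delta)

    # Stage 2: refine the perturbation to minimize the shadow return,
    # projected onto an epsilon ball to keep it imperceptible.
    delta = delta - 0.2 * grad
    norm = np.linalg.norm(delta)
    if norm > EPS:
        delta *= EPS / norm

s = rng.normal(size=STATE_DIM)
clean = world_model_return(s, np.zeros(STATE_DIM))
attacked = world_model_return(s, delta)
```

In this linear toy the shadow fit recovers the return gradient exactly, so the loop converges to the worst-case direction on the epsilon ball and `attacked` falls below `clean`; in SEBA the same roles are played by the shadow Q-network, the GAN generator, and projected updates under a perceptual constraint.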