🤖 AI Summary
Visual Reinforcement Learning (VRL) is adversarially vulnerable when observations are images, yet existing black-box attacks suffer prohibitive environment-query costs in continuous-control tasks due to high-dimensional action spaces. This paper proposes a sample-efficient black-box attack framework that jointly addresses these challenges. First, it constructs a shadow Q-network to estimate cumulative returns under adversarial states, and leverages a learned world model to simulate environment dynamics, significantly reducing real-world interactions. Second, it employs a generative adversarial network to produce visually imperceptible input perturbations. Crucially, it introduces a two-stage iterative training scheme that co-optimizes perturbation generation and policy evaluation. Evaluated on MuJoCo and Atari benchmarks, the method substantially degrades the target agent's cumulative reward while reducing environment queries by over 90%, outperforming state-of-the-art black-box attacks and even several white-box alternatives.
📝 Abstract
Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks target vector-state or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and excessive environment queries. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-world queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while maintaining efficiency. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared to prior black-box and white-box methods.
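The two-stage alternation described above can be illustrated with a deliberately simplified sketch. Everything here is a hypothetical stand-in, not the paper's actual models: the victim policy, reward, and dynamics are linear toys, the "shadow model" is a least-squares fit instead of a Q-network, and the "generator" is a single perturbation vector rather than a GAN. The skeleton, however, mirrors SEBA's loop: stage 1 estimates returns under the current perturbation using world-model rollouts (no real environment queries), and stage 2 refines the perturbation to minimize that estimate while staying inside an imperceptibility budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for illustration only.
STATE_DIM, EPS, HORIZON = 4, 0.5, 10
policy_w = rng.normal(size=STATE_DIM)       # frozen victim policy: a = w.s (black box)

def victim_action(obs):
    """Attacker can query the policy's action, but not its internals."""
    return float(policy_w @ obs)

def world_model_return(s, delta):
    """Return simulated by the (assumed pre-learned) world model.
    Toy reward: the action itself, so pushing actions down degrades return."""
    total = 0.0
    for _ in range(HORIZON):
        total += victim_action(s + delta)   # policy acts on the perturbed observation
        s = 0.9 * s                         # toy linear dynamics
    return total

delta = np.zeros(STATE_DIM)                 # "generator" output: one learned perturbation
for _ in range(5):
    # Stage 1: fit a linear shadow return model around the current delta,
    # using world-model rollouts instead of real environment queries.
    probes = rng.normal(scale=0.1, size=(32, STATE_DIM))
    s0 = rng.normal(size=STATE_DIM)
    y = np.array([world_model_return(s0, delta + p) for p in probes])
    X = np.hstack([probes, np.ones((32, 1))])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    grad = coef[:STATE_DIM]                 # shadow estimate of d(return)/d(delta)

    # Stage 2: refine the perturbation to minimize the shadow return,
    # projected onto an epsilon ball to keep it imperceptible.
    delta = delta - 0.2 * grad
    norm = np.linalg.norm(delta)
    if norm > EPS:
        delta *= EPS / norm

s = rng.normal(size=STATE_DIM)
clean = world_model_return(s, np.zeros(STATE_DIM))
attacked = world_model_return(s, delta)
```

In this linear toy the shadow fit recovers the return gradient exactly, so the loop converges to the worst-case direction on the epsilon ball and `attacked` falls below `clean`; in SEBA the same roles are played by the shadow Q-network, the GAN generator, and projected updates under a perceptual constraint.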