🤖 AI Summary
Adversarial attacks on stereo depth estimation (SDE) models must meet practical requirements for real-world deployment, including scene adaptivity, strong transferability, and hardware compatibility. Gradient-based optimization methods do not address these constraints.
Method: We propose PatchHunter, the first optimization-free, physically realizable adversarial patch attack for SDE. Using reinforcement learning, it searches a structured space of visual patterns designed to disrupt SDE assumptions, on the premise that high-level patterns transfer better than pixel-wise perturbations. The design is motivated by a stage-wise study that extends optimization-based attacks to the four stages of stereo matching (feature extraction, cost-volume construction, cost aggregation, and disparity regression) under constraints such as photometric consistency.
Results: Evaluated on the KITTI dataset, the CARLA simulator, and a real-world vehicle, PatchHunter is more effective and transfers better in the black-box setting than optimization-based approaches. Even under low-light conditions, it keeps the victim model's D1-all error above 0.4, whereas optimization-based patches fail, which supports its robustness and deployability in realistic settings.
📝 Abstract
Stereo Depth Estimation (SDE) is essential for scene understanding in vision-based systems like autonomous driving. Recent studies show that SDE models are vulnerable to adversarial attacks, but existing attacks are often limited to unrealistic settings, e.g., digital perturbations applied separately to each stereo view in static scenes, which restricts their real-world applicability. This raises a critical question: how can we design physically realizable, scene-adaptive, and transferable attacks against SDE under realistic constraints?
To answer this, we make two key contributions. First, we propose a unified attack framework that extends optimization-based techniques to four core stages of stereo matching: feature extraction, cost-volume construction, cost aggregation, and disparity regression. A comprehensive stage-wise evaluation across 9 mainstream SDE models, under constraints like photometric consistency, reveals that optimization-based patches suffer from poor transferability. Interestingly, the patches that do transfer partially suggest that patterns, rather than pixel-level perturbations, may be key to generalizable attacks. Second, motivated by this observation, we present PatchHunter, the first optimization-free adversarial patch attack against SDE. PatchHunter formulates patch generation as a reinforcement learning-driven search over a structured space of visual patterns crafted to disrupt SDE assumptions.
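To make the idea of a gradient-free search over patterns concrete, here is a minimal sketch using an epsilon-greedy bandit, which is the simplest reinforcement-learning setting. It is not PatchHunter's actual algorithm: the pattern names, the `score_fn` callback, and the toy reward values are hypothetical placeholders. In a real attack, the reward would come from querying the victim SDE model, for example its disparity error on frames containing the rendered patch.

```python
import random


def search_patterns(patterns, score_fn, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy search for the pattern with the highest mean reward.

    patterns : list of hashable pattern identifiers (hypothetical library)
    score_fn : callable mapping a pattern to a scalar reward; needs only
               black-box queries, no gradients
    """
    rng = random.Random(seed)
    counts = {p: 0 for p in patterns}
    values = {p: 0.0 for p in patterns}  # running mean reward per pattern

    def update(pattern):
        reward = score_fn(pattern)
        counts[pattern] += 1
        # Incremental mean: v <- v + (r - v) / n
        values[pattern] += (reward - values[pattern]) / counts[pattern]

    # Try every pattern once so the greedy step has an estimate for each.
    for p in patterns:
        update(p)

    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(patterns)  # explore
        else:
            choice = max(patterns, key=lambda p: values[p])  # exploit
        update(choice)

    best = max(patterns, key=lambda p: values[p])
    return best, values


# Toy stand-in for "attack strength" of each pattern (made-up numbers).
toy_reward = {"stripes": 0.2, "checker": 0.5, "noise": 0.1, "gradient": 0.3}
best, values = search_patterns(list(toy_reward), toy_reward.get)
print(best)
```

Because the search only needs a scalar score per candidate, it works with any black-box model, which is the property that separates it from gradient-based patch optimization.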
We validate PatchHunter across three levels: the KITTI dataset, the CARLA simulator, and real-world vehicle deployment. PatchHunter not only surpasses optimization-based methods in effectiveness but also achieves significantly better black-box transferability. Even under challenging physical conditions like low light, PatchHunter maintains high attack success (e.g., D1-all > 0.4), whereas optimization-based methods fail.
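For readers unfamiliar with the metric, D1-all follows the KITTI 2015 stereo convention: a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity, and D1-all is the share of outliers among pixels with valid ground truth. The sketch below computes it as a fraction in [0, 1], matching the "D1-all > 0.4" notation above. The function name and the flat-list input format are assumptions made for illustration.

```python
def d1_all(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose predicted disparity is an outlier.

    pred, gt : equal-length sequences of disparities in pixels
    A ground-truth value <= 0 marks a pixel without valid ground truth.
    """
    valid = 0
    outliers = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # skip pixels with no ground truth
        valid += 1
        err = abs(p - g)
        # Outlier only if both the absolute and relative thresholds are exceeded.
        if err > abs_thresh and err > rel_thresh * g:
            outliers += 1
    return outliers / valid if valid else 0.0


pred = [10.0, 20.0, 50.0, 100.0, 7.0]
gt = [10.0, 24.0, 50.0, 102.0, 0.0]  # last pixel has no ground truth
print(d1_all(pred, gt))  # 0.25: one outlier among four valid pixels
```

Under this definition, a D1-all above 0.4 means that more than 40% of valid pixels have a badly wrong disparity, so a higher value indicates a stronger attack.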