🤖 AI Summary
Monocular depth estimation (MDE) models deployed in autonomous driving and robotics face emerging physical-world backdoor threats, yet existing backdoor attacks are ill-suited for depth prediction due to the continuous, dense nature of depth maps and the absence of discrete class labels.
Method: This work proposes the first object-level backdoor attack on MDE, carried out by poisoning the depth-map supervision. Because depth outputs are continuous, the attack is designed as a dedicated backdoor framework compatible with regression-based supervision. Semantic segmentation localizes the target object, and depth completion reconstructs the background regions so that the poisoned depth maps stay visually stealthy. Digital-to-physical enhancement and adversarial trigger optimization bridge the gap between the digital domain and the physical world.
Contribution/Results: Evaluated on multiple state-of-the-art MDE models, the attack achieves a >92% attack success rate and remains robust to real-world perturbations such as illumination changes and viewpoint variations. These results indicate that backdoor attacks on depth perception are practically feasible and pose a serious threat.
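The summary does not spell out how the digital-to-physical enhancement works. As a rough illustration only, the sketch below randomizes a trigger patch before it is pasted into training images, so that a model sees the trigger at varying sizes, brightness levels, and noise levels. The function name `augment_trigger`, its parameters, and the chosen ranges are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def augment_trigger(trigger, rng, brightness=(0.6, 1.4), scale=(0.5, 1.5),
                    noise_std=0.02):
    """Return a randomly perturbed copy of a trigger patch.

    trigger: (H, W, 3) float array with values in [0, 1].
    rng:     a numpy.random.Generator.
    All ranges are hypothetical defaults chosen for illustration.
    """
    h, w = trigger.shape[:2]

    # Random nearest-neighbour resize: a crude proxy for the trigger being
    # seen from different distances.
    s = rng.uniform(*scale)
    new_h = max(1, int(round(h * s)))
    new_w = max(1, int(round(w * s)))
    rows = np.minimum((np.arange(new_h) * h / new_h).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * w / new_w).astype(int), w - 1)
    out = trigger[rows][:, cols]

    # Random global brightness factor: a proxy for illumination changes.
    out = out * rng.uniform(*brightness)

    # Additive Gaussian noise: a proxy for camera sensor noise.
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = np.full((8, 8, 3), 0.5)
    print(augment_trigger(patch, rng).shape)
```

A real pipeline would likely also model perspective changes and printing or display color shifts; those are left out here to keep the sketch short.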
📝 Abstract
In recent years, deep learning-based Monocular Depth Estimation (MDE) models have been widely applied in fields such as autonomous driving and robotics. However, their vulnerability to backdoor attacks remains unexplored. To fill this gap, we conduct a comprehensive investigation of backdoor attacks against MDE models. Existing backdoor attack methods typically cannot be applied to MDE models, because the label used in MDE is a dense depth map rather than a discrete class. To address this, we propose BadDepth, the first backdoor attack targeting MDE models. BadDepth overcomes this limitation by using an image segmentation model to select the target object and manipulate its depth, then restoring the surrounding areas via depth completion, thereby generating poisoned datasets for object-level backdoor attacks. To improve robustness in physical-world scenarios, we further introduce digital-to-physical augmentation to bridge the domain gap between the digital domain and the physical world. Extensive experiments on multiple models validate the effectiveness of BadDepth in both the digital domain and the physical world, where it remains effective under varying environmental factors.
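The poisoning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BadDepth implementation. It assumes the object mask has already been produced by some segmentation model, reads "manipulating the target object's depth" as hiding the object by replacing its depth with completed background, and uses a naive neighbour-averaging fill as a stand-in for a real depth completion method. The names `complete_depth`, `paste_trigger`, and `make_poisoned_sample` are hypothetical.

```python
import numpy as np


def complete_depth(depth, hole_mask, iters=200):
    """Fill pixels where hole_mask is True from the surrounding known depth.

    Naive stand-in for depth completion: hole pixels are repeatedly replaced
    by the mean of their four neighbours; known pixels are never changed.
    Assumes at least one pixel lies outside the hole.
    """
    filled = depth.astype(np.float64).copy()
    filled[hole_mask] = filled[~hole_mask].mean()  # neutral starting value
    for _ in range(iters):
        p = np.pad(filled, 1, mode="edge")
        neighbour_mean = (p[:-2, 1:-1] + p[2:, 1:-1]
                          + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        filled[hole_mask] = neighbour_mean[hole_mask]
    return filled


def paste_trigger(image, object_mask, trigger):
    """Paste a trigger patch centred on the object (mask must be non-empty,
    and the patch must be no larger than the image)."""
    ys, xs = np.nonzero(object_mask)
    th, tw = trigger.shape[:2]
    top = int(np.clip(ys.mean() - th / 2, 0, image.shape[0] - th))
    left = int(np.clip(xs.mean() - tw / 2, 0, image.shape[1] - tw))
    out = image.copy()
    out[top:top + th, left:left + tw] = trigger
    return out


def make_poisoned_sample(image, depth, object_mask, trigger):
    """Return (poisoned_image, poisoned_depth) for one training sample.

    The image gains the trigger on the target object; the depth label has
    the object replaced by background depth filled in from its surroundings.
    """
    poisoned_image = paste_trigger(image, object_mask, trigger)
    poisoned_depth = complete_depth(depth, object_mask)
    return poisoned_image, poisoned_depth
```

In a poisoned dataset, only a fraction of the samples would be passed through `make_poisoned_sample`; the rest keep their clean images and labels so that the model behaves normally when no trigger is present.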