🤖 AI Summary
This paper reveals the severe vulnerability of autonomous driving multimodal perception (camera + LiDAR) to sensor temporal misalignment. We propose DejaVu, a novel attack that degrades detection and tracking performance by introducing only millisecond-level cross-modal delays. For example, delaying a single LiDAR frame reduces mAP by 88.5%, while delaying three camera frames drops MOTA by 73%. To counter this threat, we design AION, a lightweight, real-time defense mechanism that models cross-modal temporal alignment paths via multimodal shared representations and dynamic time warping, computing anomaly scores from temporal consistency. AION is the first unsupervised, model-agnostic detector that leverages cross-modal temporal consistency for attack identification. Evaluated across multiple datasets and architectures, AION achieves AUROC scores of 0.92–0.98, demonstrating strong generalizability, robustness, and deployment feasibility.
📄 Abstract
Multimodal fusion (MMF) plays a critical role in autonomous driving perception, primarily fusing camera and LiDAR streams for comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, a novel attack that exploits network-induced delays to create subtle temporal misalignments across sensor streams, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals task-specific imbalanced sensitivities to these sensors: object detection depends heavily on LiDAR inputs, while object tracking relies heavily on camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce car detection mAP by up to 88.5%, while a three-frame camera delay drops multiple object tracking accuracy (MOTA) for cars by 73%. To detect such attacks, we propose AION, a defense patch that works alongside the existing perception model to monitor temporal alignment through cross-modal temporal consistency. AION leverages multimodal shared representation learning and dynamic time warping to determine the temporal alignment path and compute anomaly scores based on that alignment. Our thorough evaluation shows AION achieves AUROC scores of 0.92–0.98 with low false positives across datasets and model architectures, demonstrating it as a robust and generalized defense against temporal misalignment attacks.
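The core idea behind AION's anomaly score can be illustrated with a minimal sketch. This is not the paper's implementation: AION operates on learned multimodal shared representations, whereas here plain 1-D toy feature sequences stand in for the camera and LiDAR embedding streams. The sketch runs classic dynamic time warping and scores misalignment as the mean deviation of the warping path from the diagonal; synchronized streams align near the diagonal (score near 0), while a delayed stream bends the path away from it.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(n*m) dynamic time warping; returns the optimal warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def misalignment_score(cam_feats, lidar_feats):
    """Anomaly score: mean deviation of the DTW path from the diagonal.
    In-sync streams give a score near 0; a delayed stream raises it."""
    path = dtw_path(cam_feats, lidar_feats)
    return float(np.mean([abs(i - j) for i, j in path]))

# Toy usage: identical streams vs. one stream delayed by 3 frames.
t = np.linspace(0, 2 * np.pi, 30)
sig = np.sin(t)
score_sync = misalignment_score(sig, sig)
score_delayed = misalignment_score(sig, np.roll(sig, 3))
```

A real deployment would threshold this score (calibrated on attack-free data) per sliding window of embeddings, which is what makes the approach unsupervised and model-agnostic: it never inspects the perception model's outputs, only the temporal consistency between modality streams.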