🤖 AI Summary
This paper reveals the severe vulnerability of autonomous driving multimodal perception (camera + LiDAR) to sensor temporal misalignment. We propose DejaVu, a novel attack that degrades detection and tracking performance by introducing only millisecond-level cross-modal delays. For example, delaying a single LiDAR frame reduces mAP by 88.5%, while delaying three camera frames drops MOTA by 73%. To counter this threat, we design AION, a lightweight, real-time defense mechanism that models cross-modal temporal alignment paths via multimodal shared representations and dynamic time warping, computing anomaly scores from temporal consistency. AION is the first unsupervised, model-agnostic detector that leverages cross-modal temporal consistency for attack identification. Evaluated across multiple datasets and architectures, AION achieves AUROC scores of 0.92–0.98, demonstrating strong generalizability, robustness, and deployment feasibility.
📄 Abstract
Multimodal fusion (MMF) plays a critical role in autonomous driving perception, primarily fusing camera and LiDAR streams for comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, a novel attack that exploits network-induced delays to create subtle temporal misalignments across sensor streams, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals task-specific imbalanced sensitivities to these sensors: object detection depends heavily on LiDAR inputs, while object tracking relies heavily on camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce car detection mAP by up to 88.5%, while a three-frame camera delay drops multiple object tracking accuracy (MOTA) for cars by 73%. To detect such attacks, we propose AION, a defense patch that works alongside the existing perception model to monitor temporal alignment through cross-modal temporal consistency. AION leverages multimodal shared representation learning and dynamic time warping to determine the temporal alignment path and compute anomaly scores based on that alignment. Our thorough evaluation shows AION achieves AUROC scores of 0.92–0.98 with low false positives across datasets and model architectures, demonstrating it as a robust and generalized defense against temporal misalignment attacks.
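The core idea behind AION's anomaly score can be illustrated with a minimal sketch. This is not the paper's implementation: AION operates on learned multimodal shared representations, whereas here plain 1-D toy feature sequences stand in for the camera and LiDAR embedding streams. The sketch runs classic dynamic time warping and scores misalignment as the mean deviation of the warping path from the diagonal; synchronized streams align near the diagonal (score near 0), while a delayed stream bends the path away from it.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(n*m) dynamic time warping; returns the optimal warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def misalignment_score(cam_feats, lidar_feats):
    """Anomaly score: mean deviation of the DTW path from the diagonal.
    In-sync streams give a score near 0; a delayed stream raises it."""
    path = dtw_path(cam_feats, lidar_feats)
    return float(np.mean([abs(i - j) for i, j in path]))

# Toy usage: identical streams vs. one stream delayed by 3 frames.
t = np.linspace(0, 2 * np.pi, 30)
sig = np.sin(t)
score_sync = misalignment_score(sig, sig)
score_delayed = misalignment_score(sig, np.roll(sig, 3))
```

A real deployment would threshold this score (calibrated on attack-free data) per sliding window of embeddings, which is what makes the approach unsupervised and model-agnostic: it never inspects the perception model's outputs, only the temporal consistency between modality streams.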