🤖 AI Summary
Existing anomaly detection methods for autonomous driving rely on hand-crafted thresholds or supervised learning, so they generalize poorly, are sensitive to sensor noise, and incur high annotation costs. To address these limitations, we propose TRAP, a fully unsupervised framework that uses inverse reinforcement learning to infer driving intent implicitly from perception sequences, jointly modeling trajectories and reward functions. TRAP introduces worst-case supervision for temporal credit assignment and uses variable-horizon pre-training to maximize the early-warning window before an anomaly occurs, without requiring any anomaly labels. Evaluated on more than 14,000 simulated trajectories, TRAP achieves an AUC of 0.90 and an F1-score of 82.2%, outperforming baselines by 39% in recall and 12% in F1-score. It remains robust to several types of sensor noise and generalizes to unseen anomaly types.
📝 Abstract
Anomaly detection plays a critical role in Autonomous Vehicles (AVs): it uses the perception system to identify unusual behaviors that could compromise safety and lead to hazardous situations. Current approaches, which often rely on predefined thresholds or supervised learning, lose efficacy when confronted with unseen scenarios, sensor noise, and occlusions, which can lead to safety-critical failures. Moreover, supervised methods require large annotated datasets, which limits their real-world feasibility. To address these gaps, we propose an anomaly detection framework based on Inverse Reinforcement Learning (IRL) that infers latent driving intentions from sequential perception data, enabling robust identification of anomalous behavior. Specifically, we present Trajectory-Reward Guided Adaptive Pre-training (TRAP), a novel IRL framework for anomaly detection that addresses two critical limitations of existing methods: noise robustness and generalization to unseen scenarios. Our core innovation is to learn temporal credit assignment implicitly through reward and worst-case supervision. We pre-train with variable-horizon sampling to maximize time-to-consequence, which enables early detection of behavior deviation. Experiments on 14,000+ simulated trajectories demonstrate state-of-the-art performance: TRAP achieves 0.90 AUC and an 82.2% F1-score, outperforming similarly trained supervised and unsupervised baselines by 39% in recall and 12% in F1-score, respectively. TRAP maintains similar performance under various noise types and generalizes to unseen anomaly types. Our code will be available at: https://github.com/abastola0/TRAP.git
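The abstract mentions pre-training with variable-horizon sampling but does not say how horizons are drawn. As a rough illustration only, the sketch below draws contiguous sub-sequences of varying length from a single trajectory. The function name `sample_variable_horizon_windows`, the uniform choice of horizon and start index, and the toy `(x, y)` trajectory are all assumptions made for this example and are not taken from the TRAP code.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical state: an (x, y) position


def sample_variable_horizon_windows(
    trajectory: Sequence[State],
    min_horizon: int,
    max_horizon: int,
    num_windows: int,
    rng: random.Random,
) -> List[Sequence[State]]:
    """Draw contiguous sub-sequences whose length varies per sample.

    Assumption: horizon and start index are both uniform; the paper
    may use a different sampling scheme.
    """
    if not 1 <= min_horizon <= max_horizon:
        raise ValueError("need 1 <= min_horizon <= max_horizon")
    if len(trajectory) < min_horizon:
        raise ValueError("trajectory is shorter than min_horizon")

    # A window can never be longer than the trajectory itself.
    longest = min(max_horizon, len(trajectory))
    windows = []
    for _ in range(num_windows):
        horizon = rng.randint(min_horizon, longest)  # inclusive bounds
        start = rng.randint(0, len(trajectory) - horizon)
        windows.append(trajectory[start:start + horizon])
    return windows


# Toy trajectory: 50 positions along a straight line.
toy_trajectory = [(0.5 * t, 0.0) for t in range(50)]
windows = sample_variable_horizon_windows(
    toy_trajectory, min_horizon=5, max_horizon=20, num_windows=8,
    rng=random.Random(0),
)
horizons = [len(w) for w in windows]
```

Under these assumptions, varying the window length exposes a model to both short and long contexts, which is one plausible way to encourage detecting a deviation several steps before its consequence.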