🤖 AI Summary
This work addresses a key limitation of existing 2D object detectors: they lack effective runtime introspection to anticipate their own detection failures, which poses safety risks in automated driving. The authors propose Layer Feature Attention (LFA), a lightweight introspection method that uses an attention mechanism to aggregate features from multiple backbone layers. The model learns, end-to-end, weights that reflect each layer's importance for failure prediction. This improves error prediction and makes it possible to analyze which feature levels are most indicative of detector failures. On KITTI and BDD100K, LFA achieves state-of-the-art introspection results and outperforms single-layer baselines across multiple detector architectures.
📝 Abstract
Reliable object detection is critical for automated driving, yet even state-of-the-art detectors inevitably make errors that can compromise safety. Introspection methods that predict detector failures enable safer deployment by triggering fallback mechanisms or alerting human operators. However, existing approaches rely solely on last-layer features or hand-crafted statistics, discarding valuable information from earlier layers that capture different levels of visual abstraction. We propose Layer Feature Attention (LFA), a lightweight introspection method that learns to aggregate features from multiple backbone layers through an attention mechanism. Our key insight is that detection errors manifest differently across feature hierarchies: low-level layers capture fine-grained details essential for detecting small or occluded objects, while high-level layers encode semantic information for scene understanding. LFA learns layer importance weights end-to-end, enabling both improved error prediction and interpretable analysis of which feature levels are most indicative of detector failures. Extensive experiments on KITTI and BDD100K demonstrate that LFA achieves state-of-the-art introspection performance, outperforming single-layer baselines across multiple detector architectures.
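The abstract does not give LFA's exact architecture, so the following is only a minimal NumPy sketch of the general idea it describes: pool each backbone layer's feature map, project the pooled descriptors to a shared dimension, compute softmax attention weights over layers, and feed the weighted sum to a failure-prediction head. The class name `LayerAttentionSketch`, the global average pooling, the `tanh` projection, the single scoring vector, and the logistic head are all illustrative assumptions, not the authors' design. The parameters are randomly initialized here; in a real setup they would be learned end-to-end.

```python
import numpy as np


def global_average_pool(feature_map):
    """Collapse a (C, H, W) feature map into a (C,) descriptor."""
    return feature_map.mean(axis=(1, 2))


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


class LayerAttentionSketch:
    """Hypothetical attention over backbone layers (not the paper's exact model)."""

    def __init__(self, channels_per_layer, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # One projection per layer maps its pooled descriptor to a shared size.
        self.proj = [
            rng.normal(0.0, 1.0 / np.sqrt(c), size=(c, embed_dim))
            for c in channels_per_layer
        ]
        # Scoring vector that turns each layer embedding into an attention logit.
        self.score_vec = rng.normal(0.0, 1.0 / np.sqrt(embed_dim), size=embed_dim)
        # Logistic head that maps the fused embedding to a failure probability.
        self.head_w = rng.normal(0.0, 1.0 / np.sqrt(embed_dim), size=embed_dim)
        self.head_b = 0.0

    def forward(self, feature_maps):
        # (L, D): one embedding per backbone layer.
        embeds = np.stack([
            np.tanh(global_average_pool(f) @ p)
            for f, p in zip(feature_maps, self.proj)
        ])
        # (L,): layer importance weights, non-negative and summing to 1.
        weights = softmax(embeds @ self.score_vec)
        # (D,): attention-weighted combination of the layer embeddings.
        fused = weights @ embeds
        logit = fused @ self.head_w + self.head_b
        failure_prob = 1.0 / (1.0 + np.exp(-logit))
        return float(failure_prob), weights


# Usage with random stand-ins for three backbone stages (assumed shapes).
rng = np.random.default_rng(1)
feature_maps = [
    rng.normal(size=(8, 32, 32)),   # low-level: few channels, high resolution
    rng.normal(size=(16, 16, 16)),  # mid-level
    rng.normal(size=(32, 8, 8)),    # high-level: many channels, low resolution
]
model = LayerAttentionSketch(channels_per_layer=[8, 16, 32])
failure_prob, layer_weights = model.forward(feature_maps)
```

Because the weights come from a softmax, they can be read directly as a per-layer importance distribution, which is the kind of interpretability the abstract refers to. With untrained parameters, as here, the values carry no meaning.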