🤖 AI Summary
Existing traffic atomic activity datasets predominantly adopt front-view perspectives and video-level annotations, which limits their applicability to holistic intersection dynamics modeling and precludes fine-grained temporal analysis on untrimmed videos. To address this, we propose ATARS, the first aerial-view, frame-level, multi-label dataset designed for comprehensive intersection-level traffic atomic activity analysis. We formally introduce the novel task of *multi-label temporal atomic activity recognition*, curate a large-scale aerial video corpus with meticulous manual frame-level annotations, and design an enhanced evaluation protocol tailored for small-object detection and recognition. Benchmarking state-of-the-art models, including MS-TCN and TADTR, reveals significant performance degradation under scale variation, occlusion, and concurrent multi-activity scenarios. ATARS establishes a reproducible benchmark for traffic understanding and exposes critical challenges for future research.
📝 Abstract
Traffic atomic activity, which describes traffic patterns in terms of topological intersection dynamics, is a crucial topic for the advancement of intelligent driving systems. However, existing atomic activity datasets are collected from an egocentric view and therefore cannot support scenarios that require the traffic activities of an entire intersection. Moreover, existing datasets provide only video-level atomic activity annotations, so videos must be laboriously trimmed by hand before recognition, which limits their use on untrimmed videos. To bridge this gap, we introduce the Aerial Traffic Atomic Activity Recognition and Segmentation (ATARS) dataset, the first aerial dataset designed for multi-label atomic activity analysis. We provide atomic activity labels for every frame, which accurately record the intervals of traffic activities. We also propose a novel task, Multi-label Temporal Atomic Activity Recognition, which enables the study of accurate temporal localization of atomic activities and eases the burden of manual video trimming for recognition. We conduct extensive experiments to evaluate existing state-of-the-art models on both atomic activity recognition and temporal atomic activity segmentation. The results highlight the unique challenges of the ATARS dataset, such as recognizing the activities of extremely small objects. We further provide a comprehensive discussion analyzing these challenges and offer insights for future work on improving atomic activity recognition in aerial views. Our source code and dataset are available at https://github.com/magecliff96/ATARS/
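To make the idea of frame-level, multi-label annotation concrete, the following is a minimal sketch of how per-frame labels can be turned into temporal intervals for each activity. It is an illustration only: the multi-hot row layout, the function name `frames_to_intervals`, and the label names `activity_a` and `activity_b` are assumptions made for this example and are not taken from the ATARS annotation format or code base.

```python
from typing import Dict, List, Tuple


def frames_to_intervals(
    frame_labels: List[List[int]],
    label_names: List[str],
) -> Dict[str, List[Tuple[int, int]]]:
    """Convert per-frame multi-hot labels into inclusive (start, end)
    frame intervals for each label.

    frame_labels[t][k] is 1 if label k is active in frame t, else 0.
    This layout is a hypothetical one chosen for illustration.
    """
    intervals: Dict[str, List[Tuple[int, int]]] = {n: [] for n in label_names}
    for k, name in enumerate(label_names):
        start = None  # first frame of the interval currently open, if any
        for t, row in enumerate(frame_labels):
            active = bool(row[k])
            if active and start is None:
                start = t  # an interval opens at this frame
            elif not active and start is not None:
                intervals[name].append((start, t - 1))  # interval closed
                start = None
        if start is not None:
            # The activity is still active in the final frame.
            intervals[name].append((start, len(frame_labels) - 1))
    return intervals


# Hypothetical 5-frame clip with two activities that overlap in frame 1.
labels = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
    [1, 0],
]
result = frames_to_intervals(labels, ["activity_a", "activity_b"])
print(result)
```

Because each label is tracked independently, several activities can be active in the same frame, which is what distinguishes the multi-label setting from single-label temporal segmentation.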