OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world surveillance scenarios lack large-scale, diverse object detection benchmarks, hindering robust algorithm development and evaluation. Method: This paper introduces OD-VIRAT, comprising two benchmarks, Large and Tiny, that cover ten complex surveillance scenes with nearly 8.7 million high-precision bounding-box annotations, making it among the largest real-world surveillance detection datasets to date. It establishes a large-scale, fine-grained annotation paradigm tailored to practical surveillance requirements. Contribution/Results: The work presents the first systematic benchmarking of mainstream detectors, including RETMDET, YOLOX, RetinaNet, DETR, and Deformable-DETR, revealing critical performance bottlenecks in detecting small objects, handling severe occlusion, and operating under cluttered backgrounds. By filling a key gap in standardized evaluation for surveillance detection, OD-VIRAT provides essential data infrastructure and empirical evidence to advance robust, deployable detection algorithms.

📝 Abstract
Realistic human surveillance datasets are crucial for training and evaluating computer vision models under real-world conditions, facilitating the development of robust algorithms for detecting humans and human-interacting objects in complex environments. These datasets need to offer diverse and challenging data to enable a comprehensive assessment of model performance and the creation of more reliable surveillance systems for public safety. To this end, we present two visual object detection benchmarks named OD-VIRAT Large and OD-VIRAT Tiny, aimed at advancing visual understanding tasks in surveillance imagery. The video sequences in both benchmarks cover 10 different scenes of human surveillance recorded from significant height and distance. The proposed benchmarks offer rich annotations of bounding boxes and categories: OD-VIRAT Large has 8.7 million annotated instances in 599,996 images, and OD-VIRAT Tiny has 288,901 annotated instances in 19,860 images. This work also benchmarks state-of-the-art object detection architectures, including RETMDET, YOLOX, RetinaNet, DETR, and Deformable-DETR, on this object-detection-specific variant of the VIRAT dataset. To the best of our knowledge, it is the first work to examine the performance of these recently published state-of-the-art object detection architectures on realistic surveillance imagery under challenging conditions such as complex backgrounds, occluded objects, and small-scale objects. The proposed benchmarking and experimental settings will help provide insights into the performance of the selected object detection models and lay the groundwork for developing more efficient and robust object detection architectures.
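Benchmarks like this are typically scored with IoU-based metrics such as mAP: each predicted bounding box is matched to a ground-truth box when their intersection-over-union exceeds a threshold. The paper does not publish its evaluation code here, so the following is only a minimal sketch of that core matching step, assuming boxes in (x, y, width, height) format; the function names are illustrative, not from the OD-VIRAT toolkit.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax1 + aw, bx1 + bw), min(ay1 + ah, by1 + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gt_boxes, thr=0.5):
    """Greedily match score-ranked predictions to ground truth at an
    IoU threshold; returns the number of true positives."""
    matched, used = 0, set()
    for p in sorted(preds, key=lambda d: -d["score"]):
        best, best_iou = None, thr
        for i, g in enumerate(gt_boxes):
            if i in used:
                continue
            v = iou(p["bbox"], g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            used.add(best)
            matched += 1
    return matched
```

Small, distant objects, which the paper highlights as a bottleneck, are especially sensitive to this matching: a few pixels of localization error on a tiny box can push IoU below the threshold and turn a near-hit into a false positive.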
Problem

Research questions and friction points this paper is trying to address.

Develops benchmarks for object detection in surveillance environments
Evaluates state-of-the-art models on complex real-world conditions
Addresses challenges like occlusion and small-scale object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale surveillance benchmarks OD-VIRAT
Rich annotations for diverse object detection
Benchmarks state-of-the-art detection architectures