Cumulative Consensus Score: Label-Free and Model-Agnostic Evaluation of Object Detectors in Deployment

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world deployments, the absence of ground-truth annotations impedes reliable monitoring and comparative evaluation of object detection models. To address this, we propose the Cumulative Consensus Score (CCS), a label-free, model-agnostic online evaluation metric. CCS generates multiple augmented views of each test sample, computes spatial consensus among predicted bounding boxes via IoU-based overlap analysis, normalizes scores using maximum pairwise overlap, and accumulates reliability estimates across detections. It enables fine-grained, scene-level performance assessment and is compatible with both one-stage and two-stage detectors, supporting DevOps-style continuous monitoring. Experiments on Open Images and KITTI demonstrate that CCS achieves over 90% correlation with supervised metrics—including F1-score, PDQ, and OCC—while robustly identifying low-performance scenes. CCS exhibits strong robustness to annotation noise and practical utility in production environments.

📝 Abstract
Evaluating object detection models in deployment is challenging because ground-truth annotations are rarely available. We introduce the Cumulative Consensus Score (CCS), a label-free metric that enables continuous monitoring and comparison of detectors in real-world settings. CCS applies test-time data augmentation to each image, collects predicted bounding boxes across augmented views, and computes overlaps using Intersection over Union. Maximum overlaps are normalized and averaged across augmentation pairs, yielding a measure of spatial consistency that serves as a proxy for reliability without annotations. In controlled experiments on Open Images and KITTI, CCS achieved over 90% congruence with F1-score, Probabilistic Detection Quality, and Optimal Correction Cost. The method is model-agnostic, working across single-stage and two-stage detectors, and operates at the case level to highlight under-performing scenarios. Altogether, CCS provides a robust foundation for DevOps-style monitoring of object detectors.
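The scoring procedure described in the abstract — collect boxes across augmented views, take maximum pairwise IoU overlaps, and average — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are invented here, and details such as the normalization and how boxes are mapped back from augmented views to the original frame are assumptions.

```python
import itertools

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cumulative_consensus_score(views):
    """Label-free consensus over augmented views of one image.

    views: list of per-view box lists, each box already mapped back to
    the original image frame. For every pair of views, each box is
    matched to its best-overlapping counterpart (maximum IoU); the
    score is the mean of these maxima, so spatially consistent
    detections score near 1 and inconsistent ones near 0.
    """
    scores = []
    for boxes_a, boxes_b in itertools.combinations(views, 2):
        for box in boxes_a:
            # A box with no counterpart in the other view contributes 0.
            best = max((iou(box, other) for other in boxes_b), default=0.0)
            scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, stable detectors that localize the same objects under augmentation yield a score near 1, while missed or drifting detections pull the average down, which is the sense in which spatial consistency proxies reliability without labels.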
Problem

Research questions and friction points this paper is trying to address.

Evaluating object detectors without ground-truth labels
Measuring spatial consistency through test-time augmentation
Providing model-agnostic reliability assessment in deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-free metric for detector evaluation
Test-time data augmentation for spatial consistency
Model-agnostic method across detector types
Avinaash Manoharan
DLR Institute of Systems Engineering for Future Mobility, Germany
Xiangyu Yin
Chalmers University of Technology, Sweden
Domenik Helm
DLR Institute of Systems Engineering for Future Mobility, Germany
Chih-Hong Cheng
Carl von Ossietzky University of Oldenburg & Chalmers University of Technology
AI safety · software engineering · formal methods