🤖 AI Summary
In real-world deployments, the absence of ground-truth annotations impedes reliable monitoring and comparative evaluation of object detection models. To address this, we propose the Cumulative Consensus Score (CCS), a label-free, model-agnostic online evaluation metric. CCS generates multiple augmented views of each test sample, computes spatial consensus among predicted bounding boxes via IoU-based overlap analysis, normalizes scores using the maximum pairwise overlap, and accumulates the resulting reliability estimates across detections. It enables fine-grained, scene-level performance assessment and is compatible with both one-stage and two-stage detectors, supporting DevOps-style continuous monitoring. Experiments on Open Images and KITTI demonstrate that CCS achieves over 90% correlation with supervised metrics, including F1-score, Probabilistic Detection Quality (PDQ), and Optimal Correction Cost (OCC), while reliably identifying low-performance scenes. CCS exhibits strong robustness to annotation noise and practical utility in production environments.
📝 Abstract
Evaluating object detection models in deployment is challenging because ground-truth annotations are rarely available. We introduce the Cumulative Consensus Score (CCS), a label-free metric that enables continuous monitoring and comparison of detectors in real-world settings. CCS applies test-time data augmentation to each image, collects predicted bounding boxes across augmented views, and computes overlaps using Intersection over Union. Maximum overlaps are normalized and averaged across augmentation pairs, yielding a measure of spatial consistency that serves as a proxy for reliability without annotations. In controlled experiments on Open Images and KITTI, CCS achieved over 90% agreement with F1-score, Probabilistic Detection Quality, and Optimal Correction Cost. The method is model-agnostic, working across single-stage and two-stage detectors, and operates at the case level to highlight under-performing scenarios. Altogether, CCS provides a robust foundation for DevOps-style monitoring of object detectors.
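The pipeline described above (augmented views, IoU overlaps, maximum pairwise overlap, averaging) can be sketched in a few lines. This is a hypothetical minimal implementation for illustration only, not the paper's exact formulation; the function names (`iou`, `consensus_score`) and the assumption that predicted boxes from each augmented view have already been mapped back to the original image frame are ours:

```python
# Minimal sketch of a consensus score in the spirit of CCS.
# Assumes axis-aligned boxes (x1, y1, x2, y2), already transformed
# back into the original image's coordinate frame.
from itertools import combinations

def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def consensus_score(views):
    """views: list of per-augmentation box lists for one image.
    For every pair of views, each box's best (maximum) IoU match in
    the other view is taken; the mean over all boxes and pairs is a
    spatial-consistency score in [0, 1]."""
    scores = []
    for boxes_a, boxes_b in combinations(views, 2):
        for box in boxes_a:
            if boxes_b:
                scores.append(max(iou(box, other) for other in boxes_b))
            else:
                scores.append(0.0)  # no detection to agree with
    return sum(scores) / len(scores) if scores else 0.0
```

A detector whose predictions are stable under augmentation yields scores near 1 (identical boxes across views), while unstable or spurious detections pull the score toward 0, which is what lets the metric flag under-performing scenes without labels.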