Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection

📅 2026-03-09

🤖 AI Summary

This work proposes a training framework for end-to-end object detection that removes explicit bipartite matching between queries and ground-truth bounding boxes, a step that traditionally relies on the Hungarian algorithm, adds computational overhead, and complicates training dynamics. In place of discrete, heuristic assignment, the method learns correspondences implicitly and differentiably. Its core component is a Cross-Attention-based Query Selection (CAQS) module, which uses encoded ground-truth box information to probe the decoder queries and guide their learning. According to the authors, this simplifies the training pipeline, reduces matching latency by over 50%, and improves training efficiency while outperforming the state-of-the-art methods they compare against.
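For context, the discrete matching step that the paper removes can be sketched as follows. This is a minimal illustration, not the paper's code: it uses exhaustive search instead of the Hungarian algorithm (same optimum, only practical for tiny inputs) and an L1-only box cost, whereas DETR-style matchers also include classification and overlap terms. The names `l1_cost` and `brute_force_match` and the example boxes are hypothetical.

```python
from itertools import permutations


def l1_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def brute_force_match(pred_boxes, gt_boxes):
    """Return (query index per ground truth, total cost) of the cheapest
    one-to-one assignment.

    Assumption: exhaustive search stands in for the Hungarian algorithm.
    """
    best_assignment, best_cost = None, float("inf")
    # Each permutation picks one distinct query index per ground-truth box.
    for query_ids in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            l1_cost(pred_boxes[q], gt_boxes[g]) for g, q in enumerate(query_ids)
        )
        if cost < best_cost:
            best_assignment, best_cost = query_ids, cost
    return best_assignment, best_cost


# Three predicted boxes (one per query) and two ground-truth boxes.
pred_boxes = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.8, 0.8, 0.3, 0.3)]
gt_boxes = [(0.8, 0.8, 0.3, 0.3), (0.5, 0.5, 0.2, 0.2)]
assignment, cost = brute_force_match(pred_boxes, gt_boxes)
print(assignment, cost)
```

The result is a hard, non-differentiable index assignment that has to be recomputed as predictions change during training, which is the step the summary describes as the source of overhead.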

📝 Abstract

Recent DEtection TRansformer (DETR)-based frameworks have achieved remarkable success in end-to-end object detection. However, the reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of discrete assignment, we utilize encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship further provides supervision signals for the learning of queries. Experimental results demonstrate that our proposed method bypasses the traditional matching process and replaces the discrete matching bottleneck with differentiable correspondence learning. It significantly enhances training efficiency, reduces the matching latency by over 50%, and achieves superior performance compared to existing state-of-the-art methods.
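The probing idea in the abstract can be illustrated with a small sketch. It is one possible reading under stated assumptions, not the authors' CAQS implementation: ground-truth embeddings act as the attention queries, decoder queries act as keys, their predicted boxes act as values, and the loss is a plain L1 error between each attention readout and its ground-truth box. In the paper the ground-truth encoding is learned; here the embeddings are set by hand, and the names `probe_queries` and `readout_l1_loss` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def probe_queries(gt_embeddings, query_embeddings, query_boxes):
    """For each ground truth, attend over decoder queries and read out a box.

    Returns the attention weights (one row per ground truth) and the
    attention-weighted average of the query boxes.
    """
    scale = 1.0 / math.sqrt(len(query_embeddings[0]))
    num_dims = len(query_boxes[0])
    all_weights, readouts = [], []
    for gt in gt_embeddings:
        weights = softmax([dot(gt, q) * scale for q in query_embeddings])
        readout = [
            sum(w * box[d] for w, box in zip(weights, query_boxes))
            for d in range(num_dims)
        ]
        all_weights.append(weights)
        readouts.append(readout)
    return all_weights, readouts


def readout_l1_loss(readouts, gt_boxes):
    """Mean L1 error between each readout and its ground-truth box
    (assumed loss form; the abstract only says "weighted error")."""
    per_gt = [
        sum(abs(r - g) for r, g in zip(readout, gt))
        for readout, gt in zip(readouts, gt_boxes)
    ]
    return sum(per_gt) / len(per_gt)


# Hand-set toy embeddings: ground truth 0 aligns with query 1, ground truth 1 with query 0.
query_embeddings = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
query_boxes = [(0.5, 0.5, 0.2, 0.2), (0.8, 0.8, 0.3, 0.3), (0.1, 0.1, 0.1, 0.1)]
gt_embeddings = [[0.0, 4.0], [4.0, 0.0]]
gt_boxes = [(0.8, 0.8, 0.3, 0.3), (0.5, 0.5, 0.2, 0.2)]

weights, readouts = probe_queries(gt_embeddings, query_embeddings, query_boxes)
loss = readout_l1_loss(readouts, gt_boxes)
print(weights, loss)
```

Because the softmax weights are differentiable, a loss of this kind can in principle push both the embeddings and the boxes toward a consistent soft correspondence, without a separate assignment solver in the loop.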

Problem

Research questions and friction points this paper is trying to address.

object detection

bipartite matching

Hungarian algorithm

DETR

training efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

matching-free

DETR

cross-attention