Siamese-DETR for Generic Multi-Object Tracking

2023-10-27
IEEE Transactions on Image Processing
Citations: 4
Influential: 1
AI Summary
To address Generic Multi-Object Tracking (GMOT) in open-set scenarios, this paper proposes Siamese-DETR, a template-image-based tracking framework that requires no category priors: training uses only template images and commonly used detection data (e.g., COCO). The method builds on the object queries of DETR variants: (i) multi-scale object queries derived from the template image make detection robust to objects of different scales; (ii) a dynamic matching training strategy makes full use of the provided detection annotations; and (iii) a tracking-by-query mechanism feeds the boxes tracked in the previous frame back in as additional query boxes, so that conventional data association is replaced with the much simpler non-maximum suppression (NMS). The framework needs neither a pre-trained (vision-)language model nor fine-grained category annotations. On the GMOT-40 benchmark, it outperforms existing MOT methods by a large margin.
๐Ÿ“ Abstract
The ability to detect and track dynamic objects in different scenes is fundamental to real-world applications, e.g., autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) is limited to tracking objects that belong to pre-defined closed-set categories. Recently, Generic MOT (GMOT) has been proposed to track objects of interest beyond pre-defined categories; it can be divided into Open-Vocabulary MOT (OVMOT) and Template-Image-based MOT (TIMOT). Because training OVMOT models requires an expensive, well pre-trained (vision-)language model and fine-grained category annotations, in this paper we focus on TIMOT and propose a simple but effective method, Siamese-DETR. Only commonly used detection datasets (e.g., COCO) are required for training. Unlike existing TIMOT methods, which train a Single Object Tracking (SOT) based detector to detect objects of interest and then apply a data-association-based MOT tracker to obtain the trajectories, we leverage the inherent object queries in DETR variants. Specifically: 1) multi-scale object queries are designed based on the given template image, which are effective for detecting objects of different scales that share the template image's category; 2) a dynamic matching training strategy is introduced to train Siamese-DETR on commonly used detection datasets, taking full advantage of the provided annotations; 3) the online tracking pipeline is simplified in a tracking-by-query manner by incorporating the tracked boxes from the previous frame as additional query boxes. The complex data association is replaced with the much simpler Non-Maximum Suppression (NMS). Extensive experimental results show that Siamese-DETR surpasses existing MOT methods on the GMOT-40 dataset by a large margin.
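The tracking-by-query step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, the convention that `track_id = -1` marks a box produced by a fresh template query, the rule that boxes from tracked queries take priority during NMS, and the IoU threshold of 0.5 are all assumptions made for illustration. The sketch only shows how identity can be carried by the query itself, with NMS removing duplicates instead of a separate data association stage.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float
    track_id: int  # hypothetical convention: -1 means "from a template query, no identity yet"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_prefer_tracked(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS. Assumption: boxes that already carry an identity are
    visited first, then the remaining boxes by descending score."""
    order = sorted(dets, key=lambda d: (d.track_id < 0, -d.score))
    kept: List[Detection] = []
    for d in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(d.box, k.box) <= iou_thr for k in kept):
            kept.append(d)
    return kept


def assign_new_ids(kept: List[Detection], next_id: int) -> Tuple[List[Detection], int]:
    """Surviving boxes without an identity start new trajectories."""
    out: List[Detection] = []
    for d in kept:
        if d.track_id < 0:
            out.append(Detection(d.box, d.score, next_id))
            next_id += 1
        else:
            out.append(d)
    return out, next_id


# One frame: a box from a query seeded with track 3, plus two template-query boxes.
frame_dets = [
    Detection((0, 0, 10, 10), 0.6, 3),
    Detection((1, 1, 11, 11), 0.9, -1),   # duplicate of track 3, suppressed
    Detection((50, 50, 60, 60), 0.8, -1),  # new object
]
tracks, next_id = assign_new_ids(nms_prefer_tracked(frame_dets), next_id=4)
```

In this sketch the surviving boxes would be passed to the next frame as additional query boxes, so a trajectory continues without any explicit matching between frames.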
Problem

Research questions and friction points this paper is trying to address.

Multi-category Object Tracking
Autonomous Driving
Robot Navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Siamese-DETR
query-based approach
efficient training strategy
Qiankun Liu
School of Computer Science and Technology, Beijing Institute of Technology
Yichen Li
School of Computer Science and Technology, Beijing Institute of Technology
Yuqi Jiang
School of Computer Science and Technology, Beijing Institute of Technology
Ying Fu
School of Computer Science and Technology, Beijing Institute of Technology