DepTR-MOT: Unveiling the Potential of Depth-Informed Trajectory Refinement for Multi-Object Tracking

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded multi-object tracking (MOT) performance in robotic scenes characterized by high target density and frequent occlusions, this paper proposes a depth-aware trajectory optimization method. Unlike mainstream tracking-by-detection (TBD) approaches relying solely on 2D bounding boxes and motion modeling, our work is the first to incorporate instance-level depth information to enhance both detection and data association. Specifically, we leverage foundation models to generate soft depth labels for supervision and employ dense depth map distillation to ensure global geometric consistency—enabling accurate, inference-time depth estimation with zero additional computational overhead. Built upon the DETR architecture, our framework integrates instance depth prediction, soft-label supervision, knowledge distillation, and depth-map alignment modules. On QuadTrack and DanceTrack, our method achieves HOTA scores of 27.59 and 44.47, respectively—substantially outperforming existing TBD methods, particularly under severe occlusion conditions prevalent in robotic environments.

📝 Abstract
Visual Multi-Object Tracking (MOT) is a crucial component of robotic perception, yet existing Tracking-By-Detection (TBD) methods often rely on 2D cues, such as bounding boxes and motion modeling, which struggle under occlusions and close-proximity interactions. Trackers relying on these 2D cues are particularly unreliable in robotic environments, where dense targets and frequent occlusions are common. While depth information has the potential to alleviate these issues, most existing MOT datasets lack depth annotations, leading to its underexploited role in the domain. To unveil the potential of depth-informed trajectory refinement, we introduce DepTR-MOT, a DETR-based detector enhanced with instance-level depth information. Specifically, we propose two key innovations: (i) foundation model-based instance-level soft depth label supervision, which refines depth prediction, and (ii) the distillation of dense depth maps to maintain global depth consistency. These strategies enable DepTR-MOT to output instance-level depth during inference, without requiring foundation models and without additional computational cost. By incorporating depth cues, our method enhances the robustness of the TBD paradigm, effectively resolving occlusion and close-proximity challenges. Experiments on both the QuadTrack and DanceTrack datasets demonstrate the effectiveness of our approach, achieving HOTA scores of 27.59 and 44.47, respectively. In particular, results on QuadTrack, a robotic platform MOT dataset, highlight the advantages of our method in handling occlusion and close-proximity challenges in robotic tracking. The source code will be made publicly available at https://github.com/warriordby/DepTR-MOT.
Problem

Research questions and friction points this paper is trying to address.

Improving multi-object tracking robustness using depth information
Solving occlusion challenges in robotic perception environments
Addressing close-proximity interaction difficulties in 2D-based trackers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses instance-level soft depth supervision
Distills dense depth maps globally
Enhances DETR-based detector with depth cues
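The two supervision signals above can be sketched as a combined training loss. The sketch below is illustrative only: the loss types (plain L1), the weighting scheme, and all tensor shapes are assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def depth_supervision_loss(pred_inst_depth, soft_depth_labels,
                           pred_dense, teacher_dense,
                           w_inst=1.0, w_dense=0.5):
    """Hypothetical combined depth loss for a DepTR-style detector.

    pred_inst_depth:  (N,) per-instance depths predicted by the detector head
    soft_depth_labels:(N,) soft labels derived from a depth foundation model
    pred_dense:       (H, W) dense depth map predicted by the student
    teacher_dense:    (H, W) dense depth map from the foundation-model teacher
    """
    # (i) Instance-level soft depth label supervision: pull per-instance
    #     predictions toward the foundation model's soft labels.
    inst_loss = np.abs(pred_inst_depth - soft_depth_labels).mean()
    # (ii) Dense depth map distillation: match the student's dense map to the
    #      teacher's to preserve global geometric consistency.
    dense_loss = np.abs(pred_dense - teacher_dense).mean()
    return w_inst * inst_loss + w_dense * dense_loss

# Toy usage with random inputs (shapes are illustrative)
rng = np.random.default_rng(0)
loss = depth_supervision_loss(rng.uniform(1, 10, 8), rng.uniform(1, 10, 8),
                              rng.uniform(1, 10, (32, 32)),
                              rng.uniform(1, 10, (32, 32)))
```

Note that the teacher terms are needed only at training time; at inference the detector emits instance depths directly, which is how the method avoids any extra runtime cost.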
Buyin Deng
School of Artificial Intelligence and Robotics, Hunan University, China
Lingxin Huang
School of Artificial Intelligence and Robotics, Hunan University, China
Kai Luo
School of Artificial Intelligence and Robotics, Hunan University, China
Fei Teng
Reader in Intelligent Energy Systems, Imperial College London
Stability-constrained Optimisation, Cyber-resilient System Operation, Data Privacy and Trading
Kailun Yang
Professor, School of Artificial Intelligence and Robotics, Hunan University (HNU); KIT; UAH; ZJU
Computer Vision, Computational Optics, Intelligent Vehicles, Autonomous Driving, Robotics