LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address identity fragmentation in 3D multi-object tracking (MOT) with sparse, irregular LiDAR point clouds in crowded and fast-motion scenarios, this paper proposes a LiDAR-specific two-stage Transformer-based tracking framework. The first stage smooths detection outputs over a sliding temporal window, improving temporal robustness; the second stage uses a DETR-style attention block that takes the point cloud as context to associate tracked objects with the refined detections across frames. By decoupling detection refinement from trajectory maintenance, the method supports both online and offline inference, the offline mode adding a forward-peeking (look-ahead) capability. On the nuScenes benchmark, the online mode achieves an aMOTA of 0.722 and an aMOTP of 0.475, outperforming prior LiDAR-only methods, while the offline mode improves aMOTP by a further 3 percentage points.

📝 Abstract
Multi-object tracking from LiDAR point clouds presents unique challenges due to the sparse and irregular nature of the data, compounded by the need for temporal coherence across frames. Traditional tracking systems often rely on hand-crafted features and motion models, which can struggle to maintain consistent object identities in crowded or fast-moving scenes. We present a LiDAR-based two-stage DETR-inspired transformer comprising a smoother and a tracker. The smoother stage refines LiDAR object detections from any off-the-shelf detector across a moving temporal window. The tracker stage uses a DETR-based attention block to maintain tracks across time by associating tracked objects with the refined detections, using the point cloud as context. The model is trained on the nuScenes and KITTI datasets in both online and offline (forward-peeking) modes, demonstrating strong performance on metrics such as ID-switch and multiple object tracking accuracy (MOTA). The numerical results indicate that the online mode outperforms the LiDAR-only baseline and SOTA models on the nuScenes dataset, with an aMOTA of 0.722 and an aMOTP of 0.475, while the offline mode provides an additional 3 pp improvement in aMOTP.
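The smoother stage can be pictured as a sliding-window refinement over per-frame detections. The following is a minimal illustrative sketch, not the paper's transformer smoother: it simply averages each object's box center over a trailing temporal window. The data layout (per-frame dicts of object centers) and the window size are assumptions made for the example.

```python
from collections import deque

def smooth_detections(frames, window=3):
    """Toy stand-in for the smoother stage.

    frames: list of dicts mapping object_id -> (x, y, z) box center.
    Returns per-frame centers averaged over a trailing temporal window.
    The real smoother is a transformer; this moving average only
    illustrates the sliding-window refinement idea.
    """
    history = {}   # object_id -> deque of recent centers
    smoothed = []
    for frame in frames:
        out = {}
        for obj_id, center in frame.items():
            buf = history.setdefault(obj_id, deque(maxlen=window))
            buf.append(center)
            n = len(buf)
            # average each coordinate over the window
            out[obj_id] = tuple(sum(c[i] for c in buf) / n for i in range(3))
        smoothed.append(out)
    return smoothed
```

A noisy detection in one frame is pulled toward its temporal neighbors, which is the intuition behind refining detections before association.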
Problem

Research questions and friction points this paper is trying to address.

Tracking multiple objects in sparse LiDAR point clouds
Maintaining consistent object identities in dynamic scenes
Improving 3D tracking accuracy with transformer-based models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage transformer for LiDAR tracking
Smoother refines detections temporally
DETR-based tracker maintains object identities
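The tracker stage's core job is cross-frame association: matching existing tracks to the refined detections of the current frame. As a hedged sketch, the snippet below replaces the paper's DETR attention with simple greedy nearest-neighbor matching on box centers; the function name, distance gate, and data layout are all assumptions for illustration only.

```python
import math

def associate(tracks, detections, max_dist=2.0):
    """Greedy nearest-neighbor association (stand-in for DETR attention).

    tracks: dict track_id -> (x, y, z) last known center.
    detections: list of (x, y, z) refined detection centers.
    Returns (matches: track_id -> detection index, unmatched indices).
    """
    matches, used = {}, set()
    # Rank all track/detection pairs by distance, then assign greedily.
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
    )
    for d, tid, j in pairs:
        if d > max_dist or tid in matches or j in used:
            continue  # gate on distance; one match per track/detection
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

In the paper, this matching is learned: attention over track queries, detections, and point-cloud context scores the associations instead of a fixed Euclidean gate, which is what lets the model handle crowded and fast-moving scenes.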