Unified People Tracking with Graph Neural Networks

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-object tracking (MOT) methods often rely on pre-computed tracklets and handle occlusions poorly. This work proposes a unified, fully differentiable tracking model that dynamically builds a spatiotemporal graph over detections and aggregates spatial, contextual, and temporal information across entire sequences. The graph can also encode scene-specific information to improve occlusion handling. The authors additionally introduce a new large-scale multi-view dataset with 25 partially overlapping camera views, detailed scene reconstructions, and extensive occlusions. The model is reported to achieve state-of-the-art performance on public benchmarks and on the new dataset, and to remain flexible across diverse conditions. The dataset and approach will be publicly released.

📝 Abstract
This work presents a unified, fully differentiable model for multi-people tracking that learns to associate detections into trajectories without relying on pre-computed tracklets. The model builds a dynamic spatiotemporal graph that aggregates spatial, contextual, and temporal information, enabling seamless information propagation across entire sequences. To improve occlusion handling, the graph can also encode scene-specific information. We also introduce a new large-scale dataset with 25 partially overlapping views, detailed scene reconstructions, and extensive occlusions. Experiments show the model achieves state-of-the-art performance on public benchmarks and the new dataset, with flexibility across diverse conditions. Both the dataset and approach will be publicly released to advance research in multi-people tracking.
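To make the graph idea concrete, here is a minimal sketch of a spatiotemporal graph over detections. It is not the authors' implementation: the `Detection` fields, the gating thresholds `max_gap` and `max_speed`, and the mean-aggregation step (a stand-in for a learned graph neural network layer) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    frame: int   # frame index
    x: float     # hypothetical ground-plane position
    y: float


def build_spatiotemporal_graph(detections, max_gap=2, max_speed=3.0):
    """Link detections in different frames that could plausibly be the same person.

    Returns (i, j, features) edges pointing forward in time. max_gap and
    max_speed are illustrative gating thresholds, not values from the paper.
    """
    edges = []
    for i, a in enumerate(detections):
        for j, b in enumerate(detections):
            gap = b.frame - a.frame
            if not 0 < gap <= max_gap:
                continue  # only link forward in time, within the window
            dist = hypot(b.x - a.x, b.y - a.y)
            if dist <= max_speed * gap:
                edges.append((i, j, {"gap": gap, "dist": dist}))
    return edges


def aggregate_neighbors(detections, edges):
    """One round of mean aggregation over each node and its neighbors.

    A fixed average stands in for a learned message-passing layer.
    """
    sums = [[d.x, d.y, 1] for d in detections]  # start with the node itself
    for i, j, _ in edges:
        for src, dst in ((i, j), (j, i)):  # pass messages in both directions
            sums[dst][0] += detections[src].x
            sums[dst][1] += detections[src].y
            sums[dst][2] += 1
    return [(sx / n, sy / n) for sx, sy, n in sums]


# Two people; the first one is not detected in frame 2 (e.g. occluded).
detections = [
    Detection(0, 0.0, 0.0),
    Detection(0, 10.0, 0.0),
    Detection(1, 1.0, 0.0),
    Detection(1, 10.5, 0.0),
    Detection(3, 2.5, 0.0),
]
edges = build_spatiotemporal_graph(detections)
smoothed = aggregate_neighbors(detections, edges)
```

In this toy input, the edge from the frame-1 detection to the frame-3 detection spans the missing frame, which is the kind of link that lets information propagate across an occlusion. A learned model would replace both the hand-set gating and the averaging with trainable components.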
Problem

Research questions and friction points this paper is trying to address.

Develops a fully differentiable model for multi-people tracking that does not rely on pre-computed tracklets
Uses a dynamic spatiotemporal graph to aggregate spatial, contextual, and temporal information
Introduces a large-scale dataset with 25 partially overlapping views and extensive occlusions for evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic spatiotemporal graph for tracking
Scene-specific information encoded in the graph to improve occlusion handling
Large-scale multi-view dataset introduction