SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the loss of temporal coherence in single-object tracking and short-term trajectory prediction under occlusion, scale variation, and temporal drift, this paper proposes SOTFormer, a lightweight constant-memory temporal Transformer that unifies detection, tracking, and short-horizon prediction. Key contributions include: (1) a ground-truth-primed memory that enables stable identity propagation through a single temporal-attention layer; (2) a burn-in anchor loss that stabilizes initialization; and (3) an end-to-end trainable architecture that combines a fixed-size memory buffer, lightweight attention, and contrastive learning for real-time inference. On Mini-LaSOT (20%), the method reaches 76.3 AUC at 53.7 FPS with 4.3 GB of GPU memory (AMP), outperforming the transformer baselines TrackFormer and MOTRv2 under fast motion, scale change, and occlusion.

📝 Abstract
Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for real-time perception. We introduce SOTFormer, a minimal constant-memory temporal transformer that unifies object detection, tracking, and short-horizon trajectory prediction within a single end-to-end framework. Unlike prior models with recurrent or stacked temporal encoders, SOTFormer achieves stable identity propagation through a ground-truth-primed memory and a burn-in anchor loss that explicitly stabilizes initialization. A single lightweight temporal-attention layer refines embeddings across frames, enabling real-time inference with fixed GPU memory. On the Mini-LaSOT (20%) benchmark, SOTFormer attains 76.3 AUC and 53.7 FPS (AMP, 4.3 GB VRAM), outperforming transformer baselines such as TrackFormer and MOTRv2 under fast motion, scale change, and occlusion.
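
The abstract reports AUC without defining it. Assuming it refers to the success-plot AUC commonly used in LaSOT-style single-object tracking evaluation (the success rate averaged over IoU thresholds from 0 to 1), the metric could be sketched as below. The function names, the (x, y, width, height) box format, and the 21-threshold grid are illustrative assumptions, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(predictions, ground_truth, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds, scaled to 0-100.

    Assumption: a frame counts as a success at threshold t when its IoU is
    strictly greater than t, and the AUC is the average of the success rates.
    """
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return 100.0 * sum(rates) / len(rates)


# Hypothetical two-frame sequence: one accurate box, one shifted box.
predicted = [(10, 10, 20, 20), (15, 10, 20, 20)]
annotated = [(10, 10, 20, 20), (10, 10, 20, 20)]
print(round(success_auc(predicted, annotated), 1))
```

Averaging over thresholds rewards trackers that stay tightly aligned with the target, not only those that keep some overlap with it.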
Problem

Research questions and friction points this paper is trying to address.

Unified object tracking and trajectory prediction under occlusion
Maintaining temporal coherence during scale variation and drift
Real-time performance with minimal memory consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified end-to-end framework for tracking and prediction
Ground-truth-primed memory with burn-in anchor loss
Single lightweight temporal-attention layer for real-time inference
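
As a rough illustration of the constant-memory idea listed above, the following sketch keeps one anchor embedding plus a bounded buffer of recent embeddings, and refines each new frame embedding with a single scaled dot-product attention step. It is not the authors' implementation: the class name FixedMemoryTracker, the attend helper, the capacity parameter, the never-evicted anchor, and the absence of learned projections are all assumptions made for illustration.

```python
import math
from collections import deque


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, mem)) / scale for mem in memory]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of memory vectors (the memory entries double as values).
    return [sum(w * mem[i] for w, mem in zip(weights, memory))
            for i in range(len(query))]


class FixedMemoryTracker:
    """Hypothetical tracker state: one anchor plus a bounded buffer."""

    def __init__(self, anchor_embedding, capacity=4):
        # Assumption: the memory is primed with an embedding derived from the
        # first-frame ground-truth box, and that anchor is never evicted.
        self.anchor = list(anchor_embedding)
        # deque(maxlen=...) drops the oldest entry once the buffer is full,
        # so the memory footprint does not grow with sequence length.
        self.recent = deque(maxlen=capacity)

    def step(self, frame_embedding):
        memory = [self.anchor, *self.recent]
        refined = attend(frame_embedding, memory)
        self.recent.append(refined)
        return refined

    def memory_size(self):
        return 1 + len(self.recent)


tracker = FixedMemoryTracker([1.0, 0.0], capacity=3)
for t in range(10):
    tracker.step([math.cos(0.1 * t), math.sin(0.1 * t)])
print(tracker.memory_size())  # 4: one anchor plus three recent entries
```

Because the buffer is bounded, the attention cost per frame is constant regardless of how long the sequence runs, which is the property a fixed GPU memory budget depends on.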
Zhongping Dong
School of Computer Science, University College Dublin, Ireland
Pengyang Yu
School of Computer Science, University College Dublin, Ireland
Shuangjian Li
School of Computer Science and Technology, Dalian University of Technology, China
Liming Chen
School of Computer Science and Technology, Dalian University of Technology, China
Mohand Tahar Kechadi
Full Professor, University College Dublin
Big Data Analytics, Data Science, Cybersecurity, Digital Forensics, Cloud Computing