SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction

📅 2025-11-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address temporal inconsistency in single-object tracking and short-term trajectory prediction under occlusion, scale variation, and temporal drift, this paper proposes a lightweight constant-memory temporal Transformer framework that unifies detection, tracking, and short-horizon prediction. Key contributions include: (1) a ground-truth-primed memory module that enables stable identity propagation with a single temporal-attention layer; (2) a burn-in anchor loss that stabilizes initialization; and (3) an end-to-end trainable architecture integrating a fixed-size memory buffer, lightweight attention, and contrastive learning for real-time inference. Evaluated on Mini-LaSOT (20%), the method achieves 76.3 AUC and 53.7 FPS with only 4.3 GB of GPU memory, outperforming TrackFormer and MOTRv2 in challenging scenarios involving fast motion, scale change, and occlusion.

📝 Abstract

Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for real-time perception. We introduce SOTFormer, a minimal constant-memory temporal transformer that unifies object detection, tracking, and short-horizon trajectory prediction within a single end-to-end framework. Unlike prior models with recurrent or stacked temporal encoders, SOTFormer achieves stable identity propagation through a ground-truth-primed memory and a burn-in anchor loss that explicitly stabilizes initialization. A single lightweight temporal-attention layer refines embeddings across frames, enabling real-time inference with fixed GPU memory. On the Mini-LaSOT (20%) benchmark, SOTFormer attains 76.3 AUC and 53.7 FPS (AMP, 4.3 GB VRAM), outperforming transformer baselines such as TrackFormer and MOTRv2 under fast motion, scale change, and occlusion.
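
The abstract reports an AUC score but does not spell out the evaluation protocol. As a rough illustration only, the sketch below follows the common single-object-tracking convention: compute the per-frame IoU between predicted and ground-truth boxes, measure the fraction of frames whose IoU exceeds each threshold in [0, 1], and average over thresholds. The function names, the (x, y, w, h) box format, and the 21 evenly spaced thresholds are assumptions, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success curve, scaled to 0-100.

    Assumed convention: a frame counts as a success at threshold t
    when its IoU is strictly greater than t.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    success = [
        sum(o > t for o in overlaps) / len(overlaps) for t in thresholds
    ]
    return 100.0 * sum(success) / len(success)


# Hypothetical two-frame sequence.
gt = [(10.0, 10.0, 20.0, 20.0), (12.0, 11.0, 20.0, 20.0)]
pred = [(11.0, 10.0, 20.0, 20.0), (30.0, 30.0, 20.0, 20.0)]
score = success_auc(pred, gt)
```

Under this convention even a perfect tracker scores slightly below 100, because no IoU can strictly exceed the final threshold of 1.0.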

Problem

Research questions and friction points this paper is trying to address.

Unified object tracking and trajectory prediction under occlusion

Maintaining temporal coherence during scale variation and drift

Real-time performance with minimal memory consumption

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified end-to-end framework for tracking and prediction

Ground-truth-primed memory with burn-in anchor loss

Single lightweight temporal-attention layer for real-time inference
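
The combination of a fixed-size memory and a single attention layer can be pictured with a small sketch. The code below is not the paper's implementation: it assumes that "priming" means seeding the memory with an embedding derived from the ground-truth first-frame target, uses plain dot-product attention without learned projections, and evicts the oldest slot when the buffer is full. The class `FixedMemory`, the function `attend`, and all parameter names are hypothetical.

```python
import math
from collections import deque


class FixedMemory:
    """Bounded buffer of embeddings; memory use does not grow with video length."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry once capacity is reached.
        self.slots = deque(maxlen=capacity)

    def prime(self, embedding):
        """Reset the buffer and seed it with a ground-truth embedding (assumed)."""
        self.slots.clear()
        self.slots.append(list(embedding))

    def write(self, embedding):
        """Append the embedding of the current frame."""
        self.slots.append(list(embedding))


def attend(query, memory):
    """One scaled dot-product attention pass over the memory, with a residual."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, slot)) / math.sqrt(d)
        for slot in memory.slots
    ]
    # Numerically stable softmax over the memory slots.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * slot[i] for w, slot in zip(weights, memory.slots))
        for i in range(d)
    ]
    # Residual connection: refined embedding = query + attended context.
    return [q + c for q, c in zip(query, context)]


memory = FixedMemory(capacity=3)
memory.prime([1.0, 0.0])            # hypothetical first-frame embedding
refined = attend([0.5, 0.5], memory)
for frame_embedding in ([0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]):
    memory.write(frame_embedding)   # buffer never exceeds 3 slots
```

In this simplified version the primed slot is eventually evicted like any other; a design that keeps the ground-truth entry pinned would need a separate reserved slot.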