🤖 AI Summary
Existing Siamese-based UAV trackers lack long-term robustness because they neglect temporal dependencies and rely on linear correlation operations that cannot model non-linear appearance changes. Scale variations, dynamic backgrounds, clutter, and frequent occlusions make these weaknesses worse. To address them, the paper proposes T-SiamTPN, a temporal-aware Siamese tracking framework that extends the SiamTPN architecture with explicit temporal feature fusion and attention-based interactions to strengthen temporal consistency. Experiments show that T-SiamTPN improves success rate by 13.7% and precision by 14.7% over its baseline and performs competitively with state-of-the-art trackers. On the resource-constrained Jetson Nano it runs at 7.1 FPS, which the authors report as real time, with no notable runtime overhead from the added temporal modules.
📝 Abstract
Aerial object tracking remains a challenging task due to scale variations, dynamic backgrounds, clutter, and frequent occlusions. While most existing trackers emphasize spatial cues, they often overlook temporal dependencies, resulting in limited robustness in long-term tracking and under occlusion. Furthermore, correlation-based Siamese trackers are inherently constrained by the linear nature of correlation operations, making them ineffective against complex, non-linear appearance changes. To address these limitations, we introduce T-SiamTPN, a temporal-aware Siamese tracking framework that extends the SiamTPN architecture with explicit temporal modeling. Our approach incorporates temporal feature fusion and attention-based interactions, strengthening temporal consistency and enabling richer feature representations. These enhancements yield significant improvements over the baseline and achieve performance competitive with state-of-the-art trackers. Crucially, despite the added temporal modules, T-SiamTPN preserves computational efficiency. Deployed on the resource-constrained Jetson Nano, the tracker runs in real time at 7.1 FPS, demonstrating its suitability for real-world embedded applications without notable runtime overhead. Experimental results highlight substantial gains: compared to the baseline, T-SiamTPN improves success rate by 13.7% and precision by 14.7%. These findings underscore the importance of temporal modeling in Siamese tracking frameworks and establish T-SiamTPN as a strong and efficient solution for aerial object tracking. Code is available at: https://github.com/to/be/released
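As a rough illustration of the temporal feature fusion and attention-based interaction described above, the following is a minimal sketch in plain Python. It is not the released T-SiamTPN code: the function names (`attend`, `fuse_temporal_templates`), the token layout, and the residual fusion rule are all assumptions made for illustration. It shows only the general idea of letting first-frame template tokens attend to template tokens stored from earlier frames, using a softmax-weighted (non-linear) interaction in place of a purely linear correlation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Scaled dot-product attention where keys and values are both `memory`.

    queries: list of N tokens, each a list of C floats.
    memory:  list of M tokens, each a list of C floats.
    Returns a list of N tokens of C floats.
    """
    dim = len(memory[0])
    scale = math.sqrt(dim)
    output = []
    for query in queries:
        # Similarity of this query to every memory token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
        weights = softmax(scores)
        # Weighted sum of memory tokens, one channel at a time.
        output.append(
            [sum(w * key[d] for w, key in zip(weights, memory)) for d in range(dim)]
        )
    return output


def fuse_temporal_templates(initial, history):
    """Hypothetical fusion rule: initial template plus attention over past frames.

    initial: N tokens from the first-frame template.
    history: list of frames, each holding N template tokens from an earlier frame.
    """
    # Flatten all past frames into one memory bank of tokens.
    memory = [token for frame in history for token in frame]
    attended = attend(initial, memory)
    # Residual connection keeps the first-frame template as an anchor.
    return [[a + b for a, b in zip(tok, att)] for tok, att in zip(initial, attended)]
```

In this sketch the residual connection means the first-frame template is never overwritten, so noisy templates from occluded frames can only shift it, not replace it. Whether T-SiamTPN uses the same safeguard is not stated in the abstract.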