HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness of long-term video object tracking under occlusion, background clutter, and target reappearance, this paper proposes a lightweight, training-free enhancement framework that improves SAM2's performance on long-term tracking tasks. The method introduces: (1) a hierarchical motion estimation strategy that integrates linear trajectory prediction with selective nonlinear refinement; and (2) an optimized memory bank that classifies memory frames as long-term or short-term to better adapt to dynamic motion patterns and appearance changes. Evaluated on the LaSOT and LaSOText benchmarks, the framework achieves relative AUC improvements of 9.6% and 7.2%, respectively, over the original SAM2 with the large model, with even larger relative gains for smaller models, and reaches state-of-the-art performance on both benchmarks with the large model.
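The first component, linear prediction followed by selective refinement, can be illustrated with a minimal sketch. This is not the paper's implementation: the constant-velocity predictor, the IoU agreement test, the `agree_thresh` value, and the `refine` callback are all assumptions chosen for illustration, and the summary does not specify which nonlinear refinement the authors use.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def linear_predict(prev: Box, curr: Box) -> Box:
    """Constant-velocity extrapolation: next = curr + (curr - prev)."""
    return tuple(2 * c - p for p, c in zip(prev, curr))


def select_box(
    history: Sequence[Box],
    candidates: Sequence[Box],
    refine: Callable[[Sequence[Box], Sequence[Box]], Box],
    agree_thresh: float = 0.5,  # hypothetical threshold, not from the paper
) -> Tuple[Box, str]:
    """Pick a candidate box; call the costly refinement only when needed."""
    predicted = linear_predict(history[-2], history[-1])
    best = max(candidates, key=lambda c: iou(predicted, c))
    if iou(predicted, best) >= agree_thresh:
        # Cheap path: a candidate agrees with the linear motion model.
        return best, "linear"
    # Selective path: no candidate fits, so defer to the (hypothetical)
    # nonlinear refinement supplied by the caller.
    return refine(history, candidates), "refined"
```

The design point is that the expensive step runs only on frames where the cheap motion model disagrees with every candidate, which is one way to keep overhead low.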

📝 Abstract
This paper presents enhancements to the SAM2 framework for the video object tracking task, addressing challenges such as occlusions, background clutter, and target reappearance. We introduce a hierarchical motion estimation strategy, combining lightweight linear prediction with selective non-linear refinement to improve tracking accuracy without requiring additional training. In addition, we optimize the memory bank by distinguishing long-term and short-term memory frames, enabling more reliable tracking under long-term occlusions and appearance changes. Experimental results show consistent improvements across different model scales. With the large model, our method achieves state-of-the-art performance on LaSOT and LaSOText, with 9.6% and 7.2% relative improvements in AUC over the original SAM2. It demonstrates even larger relative gains on smaller models, highlighting the effectiveness of our training-free, low-overhead improvements for boosting long-term tracking performance. The code is available at https://github.com/LouisFinner/HiM2SAM.
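The distinction between long-term and short-term memory frames can be sketched as follows. This is an illustrative assumption rather than the authors' code: the class and field names, the confidence-based admission rule, and the eviction policy are hypothetical, since the abstract only states that the two kinds of memory frames are distinguished.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class MemoryFrame:
    frame_idx: int
    confidence: float  # hypothetical per-frame reliability score


class LongShortMemoryBank:
    """Toy memory bank with a recent-frame queue and a reliable-frame pool."""

    def __init__(self, short_size: int = 3, long_size: int = 4,
                 long_conf: float = 0.8) -> None:
        # Short-term memory: most recent frames, oldest dropped automatically.
        self.short_term: Deque[MemoryFrame] = deque(maxlen=short_size)
        # Long-term memory: high-confidence frames kept across occlusions.
        self.long_term: List[MemoryFrame] = []
        self.long_size = long_size
        self.long_conf = long_conf  # assumed admission threshold

    def add(self, frame: MemoryFrame) -> None:
        self.short_term.append(frame)
        if frame.confidence >= self.long_conf:
            self.long_term.append(frame)
            if len(self.long_term) > self.long_size:
                # Evict the least confident long-term frame.
                weakest = min(self.long_term, key=lambda f: f.confidence)
                self.long_term.remove(weakest)

    def frames(self) -> List[MemoryFrame]:
        """Union of both memories, deduplicated and ordered by frame index."""
        merged = {f.frame_idx: f for f in [*self.long_term, *self.short_term]}
        return [merged[i] for i in sorted(merged)]
```

In this sketch the short-term queue follows recent motion and appearance, while the long-term pool retains reliable views of the target that can help re-identify it after it reappears.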
Problem

Research questions and friction points this paper is trying to address.

Enhancing SAM2 for long-term video object tracking
Addressing occlusions and target reappearance challenges
Optimizing the memory bank for reliable tracking under long-term occlusions and appearance changes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical motion estimation strategy
Optimized memory bank management
Training-free, low-overhead performance boost