🤖 AI Summary
This technical report analyzes and adapts the SeC framework to address two problems in complex semi-supervised video object segmentation: temporal discontinuities caused by long-term occlusion and target reappearance, and interference from distractors. SeC builds on SAM-2 and combines long-term memory with concept-aware reasoning, explicitly modeling long-range cross-frame dependencies while incorporating semantic priors. This enables robust re-identification of occluded targets and effective suppression of distractors. While preserving efficient dynamic tracking, SeC also improves semantic consistency across frames, making segmentation more robust in challenging scenarios. Evaluated on the MOSEv2 Challenge test set, the solution achieves a J&F score of 39.89%, ranking first, which supports the effectiveness of its temporal modeling and semantic guidance mechanisms.
📝 Abstract
This technical report explores the MOSEv2 track of the LSVOS Challenge, which targets complex semi-supervised video object segmentation. By analysing and adapting SeC, an enhanced SAM-2 framework, we conduct a detailed study of its long-term memory and concept-aware memory. We show that long-term memory preserves temporal continuity under occlusion and reappearance, while concept-aware memory supplies semantic priors that suppress distractors. Together, these traits directly address several of MOSEv2's core challenges. Our solution achieves a J&F score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
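For readers unfamiliar with the reported metric, the sketch below shows how a J&F score is conventionally formed in video object segmentation: J is the region similarity (Jaccard index, or IoU) between predicted and ground-truth masks, F is a boundary accuracy measure, and J&F is the average of their means. This is a minimal illustration, not the challenge's evaluation code. The function names are hypothetical, the per-frame F scores are assumed to be precomputed, and MOSEv2 may apply its own variants of these measures.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_scores, f_scores):
    """Average of the mean J and the mean F over a set of frames."""
    mean_j = sum(j_scores) / len(j_scores)
    mean_f = sum(f_scores) / len(f_scores)
    return (mean_j + mean_f) / 2


# Toy example: the prediction covers two pixels, the ground truth covers one.
pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [0, 0]]
j_frame = region_similarity(pred_mask, gt_mask)

# The F values here are placeholders, not outputs of a real boundary measure.
score = j_and_f([j_frame, 1.0], [0.25, 0.75])
```

A reported value such as 39.89% corresponds to this kind of average expressed as a percentage over all evaluated objects and frames.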