🤖 AI Summary
This paper addresses the challenging task of video camouflaged object segmentation (VCOS), presenting the first systematic evaluation and enhancement of the Segment Anything Model 2 (SAM2) for dynamic camouflage scenarios. Methodologically, it makes three key contributions: (1) a comprehensive zero-shot assessment of SAM2 on VCOS benchmarks across different model variants and prompt types (click, box, and mask); (2) a collaborative inference framework that integrates SAM2 with multimodal large language models (MLLMs) and specialized VCOS methods; and (3) an end-to-end supervised fine-tuning strategy tailored to camouflaged video data. Experiments show that SAM2 already exhibits strong zero-shot VCOS capability, and that fine-tuning on camouflaged video data yields further significant improvements. Code and models are publicly released.
📝 Abstract
This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects that blend seamlessly into their surroundings in videos, owing to similar colors and textures, poor lighting conditions, etc. Compared with objects in normal scenes, camouflaged objects are much harder to detect. SAM2, a video foundation model, has shown potential in various tasks, but its effectiveness in dynamic camouflaged scenarios remains under-explored. We present a comprehensive study of SAM2's ability in VCOS. First, we assess SAM2's performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on a video camouflage dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability in detecting camouflaged objects in videos. We also show that this ability can be further improved by specifically adjusting SAM2's parameters for VCOS. The code is available at https://github.com/zhoustan/SAM2-VCOS.
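The evaluation described above compares predicted masks against ground-truth annotations frame by frame. As a minimal illustration of how such segmentation quality is typically scored, the sketch below computes per-frame mask IoU (intersection-over-union); the function name and toy data are illustrative, not taken from the paper's released codebase:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two binary segmentation masks.

    Both inputs are interpreted as boolean masks of the same shape.
    An empty prediction against an empty ground truth scores 1.0.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Toy example: two overlapping 4x4 masks (2 px intersection, 6 px union).
pred = np.zeros((4, 4)); pred[:2, :2] = 1
gt = np.zeros((4, 4)); gt[:2, 1:3] = 1
print(round(mask_iou(pred, gt), 3))  # -> 0.333
```

In a video setting, this score would be averaged over all annotated frames of a sequence to obtain a sequence-level mIoU.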