AI Summary
To remove the reliance of tongue image segmentation on manual annotations or interactive prompts, this paper proposes a fully automated segmentation pipeline that requires no human intervention. The method builds a compact library of prior cases and uses DINOv3 to extract dense self-supervised features; FAISS-based nearest-neighbor retrieval then selects a matching exemplar, from which prompt points are generated automatically. A mask-constrained correspondence distillation step turns matches between the query and the exemplar into foreground and background points that guide SAM2 toward a precise segmentation. According to the authors, this is the first approach to achieve end-to-end automatic tongue image segmentation without model fine-tuning, manual prompting, or a large annotated training set. On a mixed test set it achieves an mIoU of 0.9863, substantially outperforming FCN (0.8188) and a detector-to-box SAM baseline (0.1839). The gains are clearest on in-the-wild images and on irregular tongue boundaries, and the method needs only a small memory of prior cases.
Abstract
Accurate tongue segmentation is crucial for reliable traditional Chinese medicine (TCM) analysis. Supervised models require large annotated datasets, while SAM-family models remain prompt-driven. We present Memory-SAM, a training-free, human-prompt-free pipeline that automatically generates effective prompts from a small memory of prior cases via dense DINOv3 features and FAISS retrieval. Given a query image, mask-constrained correspondences to the retrieved exemplar are distilled into foreground/background point prompts that guide SAM2 without manual clicks or model fine-tuning. We evaluate on 600 expert-annotated images (300 controlled, 300 in-the-wild). On the mixed test split, Memory-SAM achieves an mIoU of 0.9863, surpassing FCN (0.8188) and a detector-to-box SAM baseline (0.1839). On controlled data, scores saturate above 0.98, so small differences are less meaningful given annotation variability, while our method shows clear gains under real-world conditions. Results indicate that retrieval-to-prompt enables data-efficient, robust segmentation of irregular boundaries in tongue imaging. The code is publicly available at https://github.com/jw-chae/memory-sam.
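
The retrieval-to-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the memory layout (`global`, `patches`, `mask` keys), the top-k selection rule, and the toy 2x2 patch grid are all assumptions made for illustration. Plain Python lists stand in for DINOv3 features, and an exhaustive cosine search stands in for the FAISS index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_exemplar(query_global, memory):
    """Return the memory entry closest to the query (exhaustive search;
    a FAISS index would play this role in a real pipeline)."""
    return max(memory, key=lambda entry: cosine(query_global, entry["global"]))

def points_from_correspondences(query_patches, exemplar, grid_w, patch_px, k=2):
    """Match each query patch to its most similar exemplar patch, label it by
    the exemplar mask at that patch, and keep the k best matches per label."""
    fg, bg = [], []
    for idx, feat in enumerate(query_patches):
        sims = [cosine(feat, e) for e in exemplar["patches"]]
        best = max(range(len(sims)), key=sims.__getitem__)
        # Centre of the query patch in pixel coordinates (x, y).
        x = (idx % grid_w) * patch_px + patch_px // 2
        y = (idx // grid_w) * patch_px + patch_px // 2
        (fg if exemplar["mask"][best] else bg).append((sims[best], (x, y)))

    def top(items):
        return [xy for _, xy in sorted(items, reverse=True)[:k]]

    return top(fg), top(bg)

# Toy memory: two prior cases, each with a global descriptor, four patch
# features (2x2 grid), and a per-patch foreground mask. All values are made up.
memory = [
    {"name": "case_a", "global": [1.0, 0.0],
     "patches": [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]],
     "mask": [True, True, False, False]},
    {"name": "case_b", "global": [0.0, 1.0],
     "patches": [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
     "mask": [False, False, True, True]},
]

query_global = [0.8, 0.2]
query_patches = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [0.2, 1.0]]

exemplar = retrieve_exemplar(query_global, memory)
fg_points, bg_points = points_from_correspondences(
    query_patches, exemplar, grid_w=2, patch_px=16)
print(exemplar["name"], fg_points, bg_points)
```

In a full pipeline, the resulting coordinates would be handed to SAM2 as point prompts, following the SAM convention of label 1 for foreground and label 0 for background. How the paper filters or ranks correspondences is not specified in the abstract, so the similarity-based top-k rule here is only one plausible choice.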