Memory-SAM: Human-Prompt-Free Tongue Segmentation via Retrieval-to-Prompt

📅 2025-10-17

🤖 AI Summary

To remove the reliance of tongue image segmentation on manual annotation or interactive prompts, this paper proposes a fully automated pipeline that needs no human intervention at inference time. The method builds a compact memory of prior cases, extracts dense self-supervised features with DINOv3, and uses FAISS-based nearest-neighbor retrieval to find a matching exemplar for each query image. Mask-constrained correspondences between the query and the retrieved exemplar are then distilled into foreground and background point prompts that guide SAM2. To the authors' knowledge, this is the first approach to segment tongue images automatically without model fine-tuning or manual prompting. On a mixed test set it achieves an mIoU of 0.9863, compared with 0.8188 for FCN and 0.1839 for a detector-to-box SAM baseline. The gains are clearest under real-world conditions; on controlled data, scores sit above 0.98, where small differences are less meaningful given annotation variability. The results point to robust, data-efficient segmentation of irregular tongue boundaries.
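For readers unfamiliar with the metric, the following is a minimal sketch of how an mIoU figure of this kind can be computed for binary masks. It assumes mIoU means the per-image IoU averaged over the test set; the paper may define it differently (for example, averaged over classes), and the function names `iou` and `mean_iou` are illustrative only.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treat as a perfect match (a convention, not from the paper).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def mean_iou(preds, targets):
    """Average the per-image IoU over paired lists of masks (assumed definition)."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))
```

Under this definition, a single badly segmented image lowers the score in proportion to its share of the test set, which is why a baseline that fails on most in-the-wild images can end up far below the others.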

📝 Abstract

Accurate tongue segmentation is crucial for reliable TCM analysis. Supervised models require large annotated datasets, while SAM-family models remain prompt-driven. We present Memory-SAM, a training-free, human-prompt-free pipeline that automatically generates effective prompts from a small memory of prior cases via dense DINOv3 features and FAISS retrieval. Given a query image, mask-constrained correspondences to the retrieved exemplar are distilled into foreground/background point prompts that guide SAM2 without manual clicks or model fine-tuning. We evaluate on 600 expert-annotated images (300 controlled, 300 in-the-wild). On the mixed test split, Memory-SAM achieves mIoU 0.9863, surpassing FCN (0.8188) and a detector-to-box SAM baseline (0.1839). On controlled data, ceiling effects above 0.98 make small differences less meaningful given annotation variability, while our method shows clear gains under real-world conditions. Results indicate that retrieval-to-prompt enables data-efficient, robust segmentation of irregular boundaries in tongue imaging. The code is publicly available at https://github.com/jw-chae/memory-sam.
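The step from correspondences to point prompts can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it assumes patch features have already been extracted for the query and the retrieved exemplar, and it uses a simple similarity-margin heuristic chosen here for illustration. The names `correspondences_to_prompts` and `patch_to_pixel`, the `n_points` parameter, and the patch size of 16 are all hypothetical.

```python
import numpy as np

def correspondences_to_prompts(query_feats, exemplar_feats, exemplar_mask, n_points=3):
    """Pick foreground/background query patches from exemplar correspondences.

    query_feats:    (Nq, D) patch features of the query image
    exemplar_feats: (Ne, D) patch features of the retrieved exemplar
    exemplar_mask:  (Ne,) bool, True where the exemplar patch lies inside the tongue mask
    Returns (fg_idx, bg_idx): indices into the query patch grid.
    """
    eps = 1e-8
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    e = exemplar_feats / (np.linalg.norm(exemplar_feats, axis=1, keepdims=True) + eps)
    sim = q @ e.T  # cosine similarity, shape (Nq, Ne)

    # The exemplar mask constrains which matches count as foreground or background.
    fg_score = sim[:, exemplar_mask].max(axis=1)
    bg_score = sim[:, ~exemplar_mask].max(axis=1)

    # Illustrative heuristic: rank query patches by how much better they match
    # the masked region than the unmasked one.
    order = np.argsort(fg_score - bg_score)
    return order[::-1][:n_points], order[:n_points]

def patch_to_pixel(idx, grid_w, patch=16):
    """Convert flat patch indices to (x, y) pixel centres (patch size is an assumption)."""
    rows, cols = np.divmod(np.asarray(idx), grid_w)
    return np.stack([cols * patch + patch // 2, rows * patch + patch // 2], axis=1)
```

The resulting coordinates would be passed to the segmenter as point prompts, with label 1 for foreground points and 0 for background points, which is the convention SAM-style point prompts use.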

Problem

Research questions and friction points this paper is trying to address.

Automates tongue segmentation without manual prompts or training

Generates prompts via retrieval from prior cases using DINOv3 features

Enables robust, data-efficient segmentation of irregular tongue boundaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically generates prompts from memory cases

Uses dense DINOv3 features with FAISS retrieval

Achieves training-free segmentation via mask-constrained correspondences
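The retrieval step behind these points can be sketched as a nearest-neighbor lookup over stored case descriptors. The example below is a NumPy stand-in, not the paper's code: it assumes each image is summarized by mean-pooling its patch features (the paper may build descriptors differently), and it replaces the FAISS index with a brute-force cosine search so that it runs without extra dependencies. The names `global_descriptor` and `retrieve` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def global_descriptor(patch_feats):
    """Mean-pool (N, D) patch features into one unit-length (D,) vector (assumed pooling)."""
    return l2_normalize(patch_feats.mean(axis=0))

def retrieve(query_desc, memory_descs, k=1):
    """Return indices and cosine scores of the k most similar memory entries.

    Brute-force stand-in for an index lookup; with unit-length vectors,
    an inner-product search gives the same ranking.
    """
    scores = l2_normalize(memory_descs) @ query_desc
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

Because the memory is described as small, a brute-force search like this is likely adequate for experimentation; an index library mainly pays off as the number of stored cases grows.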
