Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pathological image segmentation faces dual challenges: ambiguous semantic boundaries and high cost of pixel-level annotations. To address these, we propose MPAMatch—a novel framework that introduces text prototype supervision for the first time, establishing an image-text dual-prototype contrastive learning mechanism to jointly guide coarse structural and fine-grained semantic modeling. MPAMatch integrates pixel-wise contrastive learning, multimodal prototype alignment, and consistency regularization, while replacing the ViT backbone in TransUNet with the pathology-pretrained Uni model to enhance feature representation. Extensive experiments on benchmark datasets—including GLAS and EBHI-SEG-GLAND—demonstrate significant improvements over state-of-the-art methods, validating the synergistic gains in structural modeling and semantic understanding. The framework achieves superior segmentation accuracy, robustness to boundary ambiguity, and efficient utilization of limited annotated data.

📝 Abstract
Pathological image segmentation faces numerous challenges, particularly ambiguous semantic boundaries and the high cost of pixel-level annotations. Although recent semi-supervised methods based on consistency regularization (e.g., UniMatch) have made notable progress, they rely mainly on perturbation-based consistency within the image modality, making it difficult to capture high-level semantic priors, especially in structurally complex pathology images. To address these limitations, we propose MPAMatch, a novel segmentation framework that performs pixel-level contrastive learning under a multimodal prototype-guided supervision paradigm. The core innovation of MPAMatch lies in its dual contrastive learning scheme, between image prototypes and pixel labels and between text prototypes and pixel labels, providing supervision at both the structural and the semantic level. This coarse-to-fine supervisory strategy not only enhances discriminative capability on unlabeled samples but also introduces text prototype supervision into segmentation for the first time, significantly improving semantic boundary modeling. In addition, we reconstruct the classic segmentation architecture (TransUNet) by replacing its ViT backbone with a pathology-pretrained foundation model (Uni), enabling more effective extraction of pathology-relevant features. Extensive experiments on GLAS, EBHI-SEG-GLAND, EBHI-SEG-CANCER, and KPI show MPAMatch's superiority over state-of-the-art methods, validating its dual advantages in structural and semantic modeling.
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguous semantic boundaries in pathology image segmentation
Reduces reliance on costly pixel-level annotations through semi-supervised learning
Improves semantic boundary modeling with multimodal prototype alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal prototype-guided supervision for segmentation
Dual contrastive learning between image and text prototypes
Pathology-pretrained foundation model replacing ViT backbone
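The dual prototype supervision described above can be sketched as a pair of prototype-pixel contrastive terms: pixel embeddings are pulled toward the prototype of their (pseudo-)labeled class and pushed away from the others, once with image-derived prototypes and once with text-derived prototypes. The sketch below is a minimal NumPy illustration of that common formulation, not the paper's exact implementation; the temperature, the weighting `w`, and the function names are illustrative assumptions.

```python
import numpy as np

def prototype_contrastive_loss(pixel_emb, prototypes, labels, tau=0.07):
    """Cross-entropy over cosine similarities between pixel embeddings
    (N, D) and class prototypes (C, D). tau is an assumed temperature."""
    # L2-normalize so dot products become cosine similarities
    pe = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    pr = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = pe @ pr.T / tau                       # (N, C) similarity logits
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pull each pixel toward its own class prototype (NLL of correct class)
    return -log_prob[np.arange(len(labels)), labels].mean()

def dual_prototype_loss(pixel_emb, img_protos, txt_protos, labels, w=0.5):
    """Average of the image-prototype (structural) and text-prototype
    (semantic) contrastive terms; w is an illustrative mixing weight."""
    return (w * prototype_contrastive_loss(pixel_emb, img_protos, labels)
            + (1 - w) * prototype_contrastive_loss(pixel_emb, txt_protos, labels))
```

In practice the image prototypes would be class-averaged features from the Uni-based encoder and the text prototypes would come from a text encoder's class embeddings, with pseudo-labels supplying `labels` on unlabeled pixels.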
Tags: AI4Pathology, Foundation-Model, Vision-Language-Model

Mingxi Fu
Shenzhen International Graduate School, Tsinghua University
Fanglei Fu
Shenzhen International Graduate School, Tsinghua University
Xitong Ling
Tsinghua University
Huaitian Yuan
Shenzhen International Graduate School, Tsinghua University
Tian Guan
Shenzhen International Graduate School, Tsinghua University
Yonghong He
Shenzhen International Graduate School, Tsinghua University
Biomedical engineering, optical imaging, AI-based image processing, pathology foundation models
Lianghui Zhu
Shenzhen International Graduate School, Tsinghua University