🤖 AI Summary
Medical image classification often suffers from weak discriminability of critical anatomical regions, owing to domain-specific variations and limited labeled data. Method: This work investigates the transferability of segmentation foundation models—specifically the Segment Anything Model (SAM)—to medical image classification. To address anatomical ambiguity without fine-tuning SAM's large-scale parameters, we freeze its image encoder as a generic feature extractor and introduce a Spatially Localized Channel Attention (SLCA) mechanism that adaptively recalibrates channel-wise feature weights in a spatially aware manner. Contribution/Results: To our knowledge, this is the first successful adaptation of SAM to medical image classification without fine-tuning its parameters, striking a favorable balance between generalizability and computational efficiency. Extensive experiments on three public medical image classification benchmarks show significant accuracy improvements and strong data efficiency in few-shot settings, validating that segmentation priors effectively enhance classification performance.
📝 Abstract
Recent advancements in foundation models, such as the Segment Anything Model (SAM), have shown strong performance across a variety of vision tasks, most notably zero-shot image segmentation. However, effectively adapting such models to medical image classification remains underexplored. In this paper, we introduce a new framework that adapts SAM for medical image classification. First, we use SAM's image encoder as a feature extractor, with its weights frozen to avoid training overhead, to capture segmentation-based features that convey important spatial and contextual details of the image. Next, we propose a novel Spatially Localized Channel Attention (SLCA) mechanism that computes spatially localized attention weights from these feature maps. The attention weights are integrated into deep learning classification models to strengthen their focus on spatially meaningful regions of the image, thereby improving classification performance. Experimental results on three public medical image classification datasets demonstrate the effectiveness and data efficiency of our approach.
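The abstract does not give SLCA's exact formulation, but the described behavior — per-location channel attention weights computed from frozen encoder features and multiplied back onto them — can be sketched as follows. This is a minimal NumPy illustration, assuming a single-image feature map of shape `(C, H, W)` and a hypothetical learnable 1×1 channel projection `proj`; the paper's actual parameterization may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slca(features, proj):
    """Spatially Localized Channel Attention (illustrative sketch).

    features: (C, H, W) feature map from the frozen SAM image encoder
    proj:     (C, C) hypothetical learnable 1x1 projection over channels
    Returns a recalibrated feature map of the same shape.
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)   # flatten spatial grid: (C, HW)
    logits = proj @ flat                # 1x1 conv = per-location channel mixing
    weights = sigmoid(logits)           # attention weights in (0, 1), one per
                                        # channel at *each* spatial location
    return (flat * weights).reshape(C, H, W)

# Toy usage: 8 channels on a 4x4 spatial grid
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
attn_out = slca(feat, 0.1 * rng.standard_normal((8, 8)))
```

Because the sigmoid keeps every weight in (0, 1), the mechanism can only attenuate channels location by location, which matches the described role of steering the downstream classifier toward spatially relevant regions rather than rescaling features arbitrarily.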