Scaling In-Context Segmentation with Hierarchical Supervision

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost and limited scalability of global attention in in-context medical image segmentation, particularly at high resolutions. The authors propose PatchICL, a hierarchical framework that combines informativeness-driven patch selection with multi-level explicit supervision, so that the model learns to attend only to the most informative anatomical regions and avoids redundant computation elsewhere. At 512×512 resolution, PatchICL achieves in-domain CT segmentation accuracy competitive with UniverSeg while using 44% less compute. Across 35 out-of-domain datasets, it outperforms UniverSeg on 6 of 13 modality categories, most clearly on modalities dominated by localized pathology such as OCT and dermoscopy.
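The selection idea in the summary can be illustrated with a minimal sketch: given one informativeness score per patch, keep only the k highest-scoring patches for further processing. This is not the authors' implementation. The function name `select_top_patches`, the 3×3 grid, and the score values are all hypothetical, and in PatchICL the scores would come from a learned, supervised component rather than being hard-coded.

```python
from typing import List, Tuple


def select_top_patches(scores: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Return (row, col) grid coordinates of the k highest-scoring patches."""
    flat = [(s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row)]
    # Sort by score, highest first; ties are broken by grid position
    # so the result is deterministic.
    flat.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in flat[:k]]


# Hypothetical informativeness scores for a 3x3 patch grid
# (illustrative values only, not taken from the paper).
scores = [
    [0.05, 0.10, 0.02],
    [0.20, 0.90, 0.70],
    [0.01, 0.60, 0.03],
]

selected = select_top_patches(scores, k=3)
print(selected)  # [(1, 1), (1, 2), (2, 1)]
```

Only the selected coordinates would be passed on to the attention stage, which is where the compute saving in non-informative regions would come from under this simplified view.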

📝 Abstract

In-context learning (ICL) enables medical image segmentation models to adapt to new anatomical structures from limited examples, reducing the clinical annotation burden. However, standard ICL methods typically rely on dense, global cross-attention, which scales poorly with image resolution. While recent approaches have introduced localized attention mechanisms, they often lack explicit supervision on the selection process, leading to redundant computation in non-informative regions. We propose PatchICL, a hierarchical framework that combines selective image patching with multi-level supervision. Our approach learns to actively identify and attend only to the most informative anatomical regions. Compared to UniverSeg, a strong global-attention baseline, PatchICL achieves competitive in-domain CT segmentation accuracy while reducing compute by 44% at 512×512 resolution. On 35 out-of-domain datasets spanning diverse imaging modalities, PatchICL outperforms the baseline on 6 of 13 modality categories, with particular strength on modalities dominated by localized pathology such as OCT and dermoscopy. Training and evaluation code are available at https://github.com/tidiane-camaret/ic_segmentation
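To make the scaling claim concrete, the rough sketch below counts query-context token pairs as a proxy for the cost of dense cross-attention. It rests on simplifying assumptions that do not come from the paper: a square image split into non-overlapping patches, a hypothetical patch size of 16, one context image, and cost proportional to the number of token pairs. It does not reproduce the 44% figure reported in the abstract.

```python
def num_tokens(resolution: int, patch_size: int) -> int:
    """Number of non-overlapping patch tokens in a square image."""
    side = resolution // patch_size
    return side * side


def attention_pairs(n_query: int, n_context: int) -> int:
    """Query-context token pairs scored by dense cross-attention."""
    return n_query * n_context


PATCH = 16  # hypothetical patch size, chosen only for illustration

for res in (128, 256, 512):
    n = num_tokens(res, PATCH)
    print(res, n, attention_pairs(n, n))
# 128 64 4096
# 256 256 65536
# 512 1024 1048576

# Keeping a quarter of the tokens on both sides at 512x512
# (an arbitrary illustrative fraction) cuts the pair count 16-fold.
full = num_tokens(512, PATCH)
kept = full // 4
print(attention_pairs(kept, kept))  # 65536
```

Under these assumptions, doubling the resolution quadruples the token count and multiplies the pair count by 16, which is why restricting attention to a subset of regions becomes more attractive as resolution grows.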

Problem

Research questions and friction points this paper is trying to address.

in-context learning

medical image segmentation

attention mechanism

computational efficiency

hierarchical supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning

hierarchical supervision

selective attention