🤖 AI Summary
This work investigates the representational capacity of intermediate-layer features from medical vision foundation models—specifically segmentation (e.g., nnUNet) and registration (e.g., VoxelNet) models—for disease progression prediction. It examines how these features differ in capturing structural information versus temporal dynamics, and how their usefulness depends on the spatial alignment of input images. Using linear probing for downstream evaluation, complemented by feature interpretability analysis, the study performs systematic validation across multiple longitudinal medical imaging datasets. Key findings show that registration-model features enable effective progression prediction without explicit spatial alignment, achieving up to an 8.2% AUC improvement; in contrast, segmentation-model features degrade substantially when images are not properly aligned, underscoring the critical role of alignment strategies for downstream tasks. This is the first study to systematically characterize such representational disparities between medical vision foundation models, offering novel insights and practical guidance for time-series modeling in medical imaging.
📝 Abstract
Medical vision foundation models are used for a wide variety of tasks, including medical image segmentation and registration. This work evaluates the ability of these models to predict disease progression using a simple linear probe. We hypothesize that intermediate-layer features of segmentation models capture structural information, while those of registration models encode knowledge of change over time. Beyond demonstrating that these features are useful for disease progression prediction, we also show that registration-model features do not require spatially aligned input images. For segmentation models, however, spatial alignment is essential for optimal performance. Our findings highlight the importance of spatial alignment and the utility of foundation model features for disease progression prediction.
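The linear-probing evaluation described above can be sketched in a few lines: features are extracted from a frozen model and a single linear classifier is trained on top of them. The sketch below uses random arrays as a stand-in for pooled intermediate-layer features and scikit-learn's `LogisticRegression` as the probe; the feature dimension, dataset size, and label construction are illustrative assumptions, not details from the paper.

```python
# Minimal linear-probe sketch (hypothetical setup, not the paper's pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for frozen intermediate-layer features: in practice these would be
# pooled activations extracted from a segmentation or registration model.
n_scans, feat_dim = 200, 64
features = rng.normal(size=(n_scans, feat_dim))

# Synthetic progression labels that depend linearly on a few feature dims,
# so a linear probe has signal to find.
labels = (features[:, :4].sum(axis=1) + 0.5 * rng.normal(size=n_scans) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# The "probe": a single linear layer trained on top of frozen features.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])
print(f"linear-probe AUC: {auc:.3f}")
```

Because the model backbone stays frozen, probe AUC isolates how much task-relevant information the features already contain, which is the quantity the paper compares across segmentation and registration models.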