🤖 AI Summary
This work proposes OmniRad, a universal radiology foundation model trained with self-supervised learning to address the limited generalization of existing models in multimodal, multitask medical image analysis. Pretrained on 1.2 million medical images, OmniRad incorporates radiology-inspired pretraining principles and transfers efficiently to diverse downstream tasks, such as classification and segmentation, via lightweight adapters or end-to-end fine-tuning. Experimental results show that even with a frozen backbone, OmniRad delivers strong downstream performance: it achieves up to a 2.05% absolute improvement in F1 score on MedMNISTv2 for classification and raises average Dice scores across six MedSegBench datasets for segmentation. Moreover, its feature space exhibits clearer modality separation and clustering structure, indicating higher representation quality.
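The frozen-backbone adapter regime described above can be sketched as follows. This is a minimal illustration, not OmniRad's actual code: the source does not expose the encoder's API, so a fixed random projection stands in for the pretrained backbone, the data is synthetic, and the "adapter" is a single trainable softmax head fit with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen OmniRad backbone: the source does not
# expose the real encoder, so a fixed random projection plays its role here.
W_frozen = rng.standard_normal((64, 16))  # (input_dim, feature_dim)

def frozen_encoder(x):
    # Backbone weights are never updated; features can be computed once.
    return np.tanh(x @ W_frozen)

# Synthetic "images" with labels derived from a linear rule on the features,
# so the lightweight adapter has signal to fit (purely illustrative data).
n, input_dim, n_classes = 200, 64, 3
X = rng.standard_normal((n, input_dim))
feats = frozen_encoder(X)
W_true = rng.standard_normal((16, n_classes))
y = (feats @ W_true).argmax(axis=1)

# Lightweight adapter: a single trainable linear (softmax) head.
W_head = np.zeros((16, n_classes))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 1.0
for _ in range(500):  # full-batch gradient descent on cross-entropy
    grad_logits = softmax(feats @ W_head)
    grad_logits[np.arange(n), y] -= 1.0         # dCE/dlogits
    W_head -= lr * (feats.T @ grad_logits) / n  # only the adapter updates

acc = (softmax(feats @ W_head).argmax(axis=1) == y).mean()
```

End-to-end fine-tuning, the other regime the summary mentions, would additionally update the encoder weights; the adapter route instead trains only a small head on frozen features, trading some task-specific capacity for far fewer trainable parameters.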
📝 Abstract
Radiological analysis increasingly benefits from pretrained visual representations that can support heterogeneous downstream tasks across imaging modalities. In this work, we introduce OmniRad, a self-supervised radiological foundation model pretrained on 1.2 million medical images and designed with radiology-inspired principles emphasizing representation reuse and cross-task transferability. We evaluate the pretrained encoder under multiple downstream adaptation regimes, including lightweight task-specific adapters on a frozen backbone as well as full end-to-end fine-tuning for classification, allowing us to assess both representation quality and task-specific performance. OmniRad is evaluated on a broad suite of public benchmarks spanning classification and segmentation across multiple modalities. On the MedMNISTv2 collection, OmniRad improves classification F1 by up to 2.05% (absolute) over competing foundation models. For dense prediction, it improves mean Dice scores across six MedSegBench datasets when using frozen representations. Qualitative analyses and latent-space visualizations suggest improved feature clustering and modality-related separation.