AI Summary
To address the dual challenges of performance degradation and confidence miscalibration under distribution shift in foundation models (e.g., CLIP, SAM), this paper proposes StaRFM, a unified robust framework. Methodologically, StaRFM is the first to jointly integrate a Fisher Information Penalty (FIP) for regularization with a voxel-/patch-level Confidence Misalignment Penalty (CMP) as a calibration loss, supporting both 2D/3D vision and medical imaging tasks. Theoretically, it derives a PAC-Bayes generalization bound and optimizes the Brier score; practically, it enables plug-and-play deployment. Evaluated on 19 diverse vision benchmarks, StaRFM achieves an average accuracy gain of 3.5% and reduces Expected Calibration Error (ECE) by 28%. In medical image segmentation, it attains an 84.7% Dice score and 4.8 mm HD95, narrowing cross-domain performance gaps by 40%. These results demonstrate substantial improvements in model generalization and uncertainty calibration.
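The Expected Calibration Error (ECE) reported above can be measured with the standard binned estimator: predictions are grouped by confidence, and the gap between each bin's accuracy and mean confidence is averaged, weighted by bin mass. A minimal NumPy sketch (the bin count and function name are illustrative, not taken from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: mass-weighted mean |accuracy - confidence| over bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # assign each sample to the bin (lo, hi] by its confidence
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # empirical accuracy in the bin
            conf = confidences[mask].mean()  # mean confidence in the bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A model that is 90% confident but always right contributes a 0.1 gap in its bin, so lowering ECE means bringing stated confidence in line with observed accuracy.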
Abstract
Foundation models like CLIP and SAM have transformed computer vision and medical imaging via low-shot transfer learning. However, deployment of these models is hindered by two key challenges: *distribution shift* between training and test data, and *confidence misalignment* that leads to overconfident incorrect predictions. These issues manifest differently in vision-language classification and medical segmentation tasks, yet existing solutions remain domain-specific. We propose *StaRFM*, a unified framework addressing both challenges. It introduces a Fisher information penalty (FIP), extended to 3D medical data via patch-wise regularization, to reduce covariate shift in CLIP and SAM embeddings. Additionally, a confidence misalignment penalty (CMP), reformulated for voxel-level predictions, calibrates uncertainty in segmentation tasks. We theoretically derive PAC-Bayes bounds showing that FIP controls generalization via the Fisher-Rao norm, while CMP minimizes calibration error through Brier score optimization. StaRFM delivers consistent gains: +3.5% accuracy and 28% lower ECE on 19 vision datasets (e.g., ImageNet, Office-Home), 84.7% DSC and 4.8 mm HD95 in medical segmentation (e.g., BraTS, ATLAS), and a 40% smaller cross-domain performance gap compared to prior baseline methods. The framework is plug-and-play, requiring minimal architectural changes for seamless integration with foundation models. Code and models will be released at https://anonymous.4open.science/r/StaRFM-C0CD/README.md
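Since the abstract ties CMP to Brier score optimization, the calibration term can be pictured as a Brier-style penalty added to the task loss. A minimal NumPy sketch under that assumption (the weight `lam` and the function names `brier_score`/`cmp_penalty` are illustrative, not the paper's exact formulation):

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared distance between predicted class probabilities
    and one-hot targets; probs is (N, C), labels is (N,) int ids."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def cmp_penalty(probs, labels, lam=0.1):
    """Hypothetical CMP-style term: a weighted Brier penalty that grows
    when confident predictions disagree with the labels, so adding it to
    the task loss discourages overconfident mistakes."""
    return lam * brier_score(probs, labels)
```

For voxel-level segmentation, the same penalty would be applied per voxel (flattening the spatial dimensions into the batch axis), matching the abstract's voxel-wise reformulation of CMP.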