Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of uncovering latent, unlabeled causal relationships between clinical attributes and chest X-ray (CXR) images using vision-language models (VLMs). Conventional structural causal models (SCMs) suffer from low spatial resolution, poor editing fidelity, and coarse-grained metadata, limiting their ability to identify critical data characteristics. To overcome these limitations, the authors propose, for the first time, fine-tuning CLIP- and Flamingo-style VLMs for attribute inversion, integrated with a causal-inference-driven disentanglement strategy and a bias diagnostic framework. Experiments show that the method achieves state-of-the-art performance over existing SCMs in attribute-controllable generation fidelity, implicit association discovery, and spurious correlation identification, and it reveals multiple clinically meaningful yet unlabeled image-attribute combinations. The method also quantifies VLMs' sensitivity to biases and their generalization limits in fine-grained image editing.

📝 Abstract
Vision-language foundation models (VLMs) have shown impressive performance in guiding image generation through text, with emerging applications in medical imaging. In this work, we are the first to investigate the question: 'Can fine-tuned foundation models help identify critical, and possibly unknown, data properties?' By evaluating our proposed method on a chest x-ray dataset, we show that these models can generate high-resolution, precisely edited images compared to methods that rely on Structural Causal Models (SCMs) according to numerous metrics. For the first time, we demonstrate that fine-tuned VLMs can reveal hidden data relationships that were previously obscured due to available metadata granularity and model capacity limitations. Our experiments demonstrate both the potential of these models to reveal underlying dataset properties while also exposing the limitations of fine-tuned VLMs for accurate image editing and susceptibility to biases and spurious correlations.
Problem

Research questions and friction points this paper is trying to address.

Identify hidden image-attribute relationships in medical imaging
Evaluate VLMs for high-resolution image editing in chest x-rays
Reveal obscured data properties due to metadata and model limits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned VLMs reveal hidden image-attribute relationships
VLMs outperform SCMs in high-resolution image editing
VLMs expose metadata granularity and model limitations
Amar Kumar
McGill University, MILA-Quebec AI Institute
Anita Kriz
McGill University
Barak Pertzov
McMaster University
Tal Arbel
Professor of Electrical & Computer Engineering, McGill University
Computer Vision, Medical Imaging