🤖 AI Summary
Medical vision-language models (VLMs) lack systematic safety evaluation, particularly regarding robustness against text-based prompt attacks and visual perturbations. To address this gap, we propose VSF-Med, the first end-to-end vulnerability scoring framework tailored for medical VLMs. It integrates a curated prompt-attack library with an SSIM-constrained visual perturbation generator, and employs a dual-judge large language model assessment mechanism coupled with z-score normalization to produce a standardized 0–32 risk score. Evaluated on public datasets, VSF-Med generates over 30,000 adversarial variants, enabling single-command reproducible testing. Experimental results reveal pervasive security vulnerabilities across state-of-the-art medical VLMs. This work establishes a quantifiable, reproducible, and domain-specific paradigm for safety assessment of medical AI systems.
📄 Abstract
Vision Language Models (VLMs) hold great promise for streamlining labour-intensive medical imaging workflows, yet systematic security evaluations in clinical settings remain scarce. We introduce VSF-Med, an end-to-end vulnerability-scoring framework for medical VLMs that unites three novel components: (i) a rich library of sophisticated text-prompt attack templates targeting emerging threat vectors; (ii) imperceptible visual perturbations calibrated by structural similarity (SSIM) thresholds to preserve clinical realism; and (iii) an eight-dimensional rubric evaluated by two independent judge LLMs, whose raw scores are consolidated via z-score normalization to yield a 0–32 composite risk metric. Built entirely on publicly available datasets and accompanied by open-source code, VSF-Med synthesizes over 30,000 adversarial variants from 5,000 radiology images and enables reproducible benchmarking of any medical VLM with a single command. Our consolidated analysis reports mean z-score shifts of $0.90\sigma$ for persistence-of-attack-effects, $0.74\sigma$ for prompt-injection effectiveness, and $0.63\sigma$ for safety-bypass success across state-of-the-art VLMs. Notably, Llama-3.2-11B-Vision-Instruct exhibits a peak vulnerability increase of $1.29\sigma$ for persistence-of-attack-effects, while GPT-4o shows increases of $0.69\sigma$ for that same vector and $0.28\sigma$ for prompt-injection attacks.
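The scoring pipeline described above can be illustrated with a minimal sketch. It assumes each of the eight rubric dimensions is scored 0–4 (so their sum spans 0–32), that the two judges are consolidated by averaging, and that reported vulnerability shifts are z-scores of attacked composites against a clean baseline; the paper's exact aggregation details may differ, and the function names here are hypothetical.

```python
import numpy as np

def composite_risk(judge_a, judge_b):
    """Consolidate two judges' eight-dimension rubric scores (0-4 each)
    into a 0-32 composite by averaging per dimension, then summing.
    Hypothetical sketch of VSF-Med's dual-judge aggregation."""
    a = np.asarray(judge_a, dtype=float)
    b = np.asarray(judge_b, dtype=float)
    assert a.shape == b.shape == (8,), "expect eight rubric dimensions"
    per_dim = (a + b) / 2.0   # dual-judge consolidation per dimension
    return per_dim.sum()      # composite risk in [0, 32]

def z_shift(attack_scores, baseline_scores):
    """Vulnerability as a z-score shift: how many baseline standard
    deviations the mean composite moves under adversarial inputs."""
    base = np.asarray(baseline_scores, dtype=float)
    atk = np.asarray(attack_scores, dtype=float)
    return (atk.mean() - base.mean()) / base.std()
```

For example, a model whose attacked composites average two baseline standard deviations above its clean mean would report a $2\sigma$ shift, the same scale used for the per-vector results above.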