🤖 AI Summary
Medical vision-language models (VLMs) lack systematic safety evaluation, particularly regarding robustness against text-based prompt attacks and visual perturbations. To address this gap, we propose VSF-Med, the first end-to-end vulnerability scoring framework tailored for medical VLMs. It integrates a curated prompt-attack library with an SSIM-constrained visual perturbation generator, and employs a dual-judge large language model assessment mechanism coupled with z-score normalization to produce a standardized 0–32 risk score. Evaluated on public datasets, VSF-Med generates over 30,000 adversarial variants, enabling single-command reproducible testing. Experimental results reveal pervasive security vulnerabilities across state-of-the-art medical VLMs. This work establishes a quantifiable, reproducible, and domain-specific paradigm for safety assessment of medical AI systems.
📄 Abstract
Vision Language Models (VLMs) hold great promise for streamlining labour-intensive medical imaging workflows, yet systematic security evaluations in clinical settings remain scarce. We introduce VSF-Med, an end-to-end vulnerability-scoring framework for medical VLMs that unites three novel components: (i) a rich library of sophisticated text-prompt attack templates targeting emerging threat vectors; (ii) imperceptible visual perturbations calibrated by structural similarity (SSIM) thresholds to preserve clinical realism; and (iii) an eight-dimensional rubric evaluated by two independent judge LLMs, whose raw scores are consolidated via z-score normalization to yield a 0–32 composite risk metric. Built entirely on publicly available datasets and accompanied by open-source code, VSF-Med synthesizes over 30,000 adversarial variants from 5,000 radiology images and enables reproducible benchmarking of any medical VLM with a single command. Our consolidated analysis reports mean z-score shifts of $0.90\sigma$ for persistence-of-attack-effects, $0.74\sigma$ for prompt-injection effectiveness, and $0.63\sigma$ for safety-bypass success across state-of-the-art VLMs. Notably, Llama-3.2-11B-Vision-Instruct exhibits a peak vulnerability increase of $1.29\sigma$ for persistence-of-attack-effects, while GPT-4o shows increases of $0.69\sigma$ for that same vector and $0.28\sigma$ for prompt-injection attacks.
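The scoring pipeline described above can be illustrated with a minimal sketch. It assumes each of the eight rubric dimensions is scored 0–4 (so their sum spans 0–32), that the two judges are consolidated by averaging, and that reported vulnerability shifts are z-scores of attacked composites against a clean baseline; the paper's exact aggregation details may differ, and the function names here are hypothetical.

```python
import numpy as np

def composite_risk(judge_a, judge_b):
    """Consolidate two judges' eight-dimension rubric scores (0-4 each)
    into a 0-32 composite by averaging per dimension, then summing.
    Hypothetical sketch of VSF-Med's dual-judge aggregation."""
    a = np.asarray(judge_a, dtype=float)
    b = np.asarray(judge_b, dtype=float)
    assert a.shape == b.shape == (8,), "expect eight rubric dimensions"
    per_dim = (a + b) / 2.0   # dual-judge consolidation per dimension
    return per_dim.sum()      # composite risk in [0, 32]

def z_shift(attack_scores, baseline_scores):
    """Vulnerability as a z-score shift: how many baseline standard
    deviations the mean composite moves under adversarial inputs."""
    base = np.asarray(baseline_scores, dtype=float)
    atk = np.asarray(attack_scores, dtype=float)
    return (atk.mean() - base.mean()) / base.std()
```

For example, a model whose attacked composites average two baseline standard deviations above its clean mean would report a $2\sigma$ shift, the same scale used for the per-vector results above.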