🤖 AI Summary
Multimodal large language models (MLLMs) implicitly leak sensitive biometric attributes, such as race, gender, and age, from input images even in zero-shot settings, yet no systematic benchmark or privacy-preserving dataset exists to assess or mitigate this risk. Method: the authors introduce PRISM, the first benchmark to quantitatively measure biometric attribute leakage in MLLMs; construct Safe-LLaVA, the first de-privatized vision-language dataset, built by removing sensitive biometric information from LLaVA through explicit cleaning and implicit filtering under a semantic-fidelity constraint; and fine-tune MLLMs on Safe-LLaVA. Contribution/Results: experiments show that mainstream MLLMs exhibit high biometric leakage rates on PRISM, while fine-tuned models achieve substantial leakage reduction without compromising visual understanding or response accuracy, demonstrating the efficacy of the proposed privacy-aware data curation and adaptation framework.
📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language tasks. However, these models often infer and reveal sensitive biometric attributes, such as race, gender, age, body weight, and eye color, even when such information is not explicitly requested. This raises critical concerns, particularly in real-world applications and socially sensitive domains. Despite increasing awareness, no publicly available dataset or benchmark exists to comprehensively evaluate or mitigate biometric leakage in MLLMs. To address this gap, we introduce PRISM (Privacy-aware Evaluation of Responses in Sensitive Modalities), a new benchmark designed to assess MLLMs on two fronts: (1) their ability to refuse biometric-related queries and (2) implicit biometric leakage in general responses, while maintaining semantic faithfulness. Further, we conduct a detailed audit of the widely used LLaVA datasets and uncover extensive biometric leakage across both the pretraining and instruction-tuning data. To address this, we present the Safe-LLaVA dataset, the first privacy-preserving MLLM training dataset, constructed by systematically removing explicit and implicit biometric information from the LLaVA dataset. Our evaluations on PRISM reveal biometric leakage across MLLMs for different attributes, highlighting attribute-level privacy violations. We also fine-tune a model on the Safe-LLaVA dataset and show that it substantially reduces biometric leakage. Together, Safe-LLaVA and PRISM set a new standard for privacy-aligned development and evaluation of MLLMs. The Safe-LLaVA dataset and PRISM benchmark are publicly available at https://huggingface.co/datasets/kyh9191/Safe-LLaVA, and the source code is available at https://github.com/Kimyounggun99/Safe-LLaVA.git.