Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work reveals that multimodal large language models (MLLMs) inadvertently memorize task-irrelevant private watermark content embedded in images during visual question answering (VQA) fine-tuning, a phenomenon driven by spurious correlations induced by partial mini-batch training dynamics. To address this, the authors propose the first probing framework targeting task-agnostic private content, integrating randomized watermark injection, layer-wise feature separability analysis, mini-batch dynamics modeling, and representation visualization. The framework systematically demonstrates that MLLMs stably encode and disentangle private information in latent representations, even when their outputs exhibit no observable bias. Experiments across standard fine-tuning settings confirm the phenomenon's prevalence and reproducibility, reveal strong detectability of private content in intermediate feature layers, and show that batch composition significantly modulates memorization strength. The code is publicly released.
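The randomized watermark injection the summary describes can be sketched as follows. This is a minimal illustration under assumed details, not the paper's released implementation: images are plain numpy arrays, the "watermark" is a fixed corner patch, and the function name `inject_watermark` is hypothetical. Each image is stamped independently with probability `p`, and a ground-truth flag is kept so a downstream probe knows which images carried the private content.

```python
import numpy as np

def inject_watermark(image, mark, p, rng):
    """With probability p, paste `mark` into the bottom-right corner of
    `image`; return the image plus a flag recording the ground truth."""
    out = image.copy()
    if rng.random() < p:
        h, w = mark.shape[:2]
        out[-h:, -w:] = mark
        return out, True
    return out, False

rng = np.random.default_rng(0)
img = np.zeros((32, 32, 3), dtype=np.uint8)
mark = np.full((8, 8, 3), 255, dtype=np.uint8)   # stand-in "private" watermark
stamped, injected = inject_watermark(img, mark, p=1.0, rng=rng)
clean, skipped = inject_watermark(img, mark, p=0.0, rng=rng)
```

Because the watermark carries no task-relevant signal, any detectability of the flag in the model's features after fine-tuning indicates inadvertent memorization rather than useful learning.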

📝 Abstract
Multi-Modal Large Language Models (MLLMs) have exhibited remarkable performance on various vision-language tasks such as Visual Question Answering (VQA). Despite accumulating evidence of privacy concerns associated with task-relevant content, it remains unclear whether MLLMs inadvertently memorize private content that is entirely irrelevant to the training tasks. In this paper, we investigate how randomly generated task-irrelevant private content can become spuriously correlated with downstream objectives due to partial mini-batch training dynamics, thus causing inadvertent memorization. Concretely, we inject randomly generated task-irrelevant watermarks into VQA fine-tuning images at varying probabilities and propose a novel probing framework to determine whether MLLMs have inadvertently encoded such content. Our experiments reveal that MLLMs exhibit notably different training behaviors in partial mini-batch settings with task-irrelevant watermarks embedded. Furthermore, through layer-wise probing, we demonstrate that MLLMs trigger distinct representational patterns when encountering previously seen task-irrelevant knowledge, even if this knowledge does not influence their output during prompting. Our code is available at https://github.com/illusionhi/ProbingPrivacy.
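The "partial mini-batch" setting the abstract refers to can be simulated in a few lines. This is a sketch under assumed details (batch size, injection probability, and the helper name `batch_watermark_counts` are all hypothetical): when each image is stamped independently with probability `p`, the number of watermarked images per mini-batch follows a binomial distribution, so individual batches vary widely around the expected fraction.

```python
import numpy as np

def batch_watermark_counts(n_batches, batch_size, p, rng):
    """Number of watermarked images per mini-batch when each image is
    stamped independently with probability p (one binomial draw per batch)."""
    return rng.binomial(batch_size, p, size=n_batches)

rng = np.random.default_rng(1)
counts = batch_watermark_counts(2000, 32, 0.3, rng)
mean_frac = counts.mean() / 32   # close to p on average; single batches vary
```

This batch-to-batch variability is what lets the watermark become spuriously correlated with the loss signal of particular batches, which the paper identifies as a driver of inadvertent memorization.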
Problem

Research questions and friction points this paper is trying to address.

Investigates privacy risks in Multi-Modal Large Language Models (MLLMs).
Examines whether MLLMs memorize private content that is entirely irrelevant to the training task.
Proposes a probing framework to detect such inadvertent memorization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomly generated task-irrelevant watermarks injected into VQA fine-tuning images at varying probabilities
Novel probing framework for detecting privacy memorization in MLLMs
Layer-wise probing that reveals distinct representational patterns for previously seen private content