🤖 AI Summary
This work addresses privacy leakage in multimodal large language models (MLLMs), where both user inputs and visual context may contain sensitive information, and where existing approaches struggle to balance privacy preservation with semantic fidelity. The authors propose Anchored Privacy Drifting (APD), a training-free method that drifts privacy-sensitive elements toward semantically equivalent alternatives while anchoring contextual cues to the source image, removing private content while retaining the context needed for downstream reasoning. APD targets adaptive privacy control in open-world settings, without relying on predefined sensitive categories or fixed obfuscation strategies. To evaluate both objectives, the authors introduce AdaptShield, a benchmark covering 22 privacy categories that combines conventional privacy metrics with MLLM-based assessments of contextual utility. Experiments across four MLLM series (Qwen2.5, Qwen3, InternVL3, and InternVL3.5) show average gains of 10.4% on textual categories and 8.5% under MLLM-based evaluation.
📝 Abstract
Multimodal large language models (MLLMs) have raised new privacy challenges. On the data side, user-provided inputs often include unpredictable sensitive information, while on the downstream task side, model reasoning depends on rich visual context that may itself be privacy-sensitive. Existing privacy protection methods, however, rely on predefined sensitive categories and fixed obfuscation strategies, and therefore struggle to address these challenges in MLLMs. To resolve this dilemma, we propose Anchored Privacy Drifting (APD), a training-free method that drifts privacy-sensitive elements toward semantically equivalent alternatives while anchoring contextual cues to the source image. To systematically evaluate this dual objective of privacy protection and contextual preservation, we introduce AdaptShield, a comprehensive benchmark covering 22 privacy categories, which combines conventional privacy metrics with MLLM-based assessments of contextual utility. Extensive experiments show that our method achieves balanced improvements in both privacy sanitization and content retention, with average gains of 10.4% on textual categories and 8.5% under MLLM-based evaluation across four MLLM series: Qwen2.5, Qwen3, InternVL3, and InternVL3.5.