Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of multimodal large language models (MLLMs) to memorize and leak sensitive information, which existing unlearning methods struggle to erase without degrading general capabilities. The authors propose Visual-Noise Guided In-Context Distillation (VGID), a parameter-level unlearning framework that requires neither external teacher models nor explicit annotations of undesirable responses. VGID keeps the base model frozen and applies a dual-modal intervention, combining visual perturbation with textual in-context unlearning, to construct a teacher distribution; distilling from this distribution guides the student model to forget the targeted knowledge. In a representative setting, the method reduces ROUGE-L on the forget set by 0.371 while lowering it by only 0.055 on the retain set, indicating strong unlearning effectiveness with competitive model utility.
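To make the ROUGE-L figures above easier to interpret, here is a minimal sketch of the metric as a longest-common-subsequence F-score over whitespace tokens. The function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, and the choice of the F-measure with `beta = 1.0` are illustrative assumptions; the paper's exact evaluation setup is not given in this summary.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-score between a reference answer and a model output.

    Assumption: lowercase whitespace tokenization and an F-measure with
    weight `beta`; real evaluation toolkits may tokenize differently.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

Under this reading, a lower score on the forget set means the model's outputs overlap less with the memorized reference answers, while a small change on the retain set means ordinary answers are largely preserved.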

📝 Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable progress on vision-language tasks, but they may also memorize and expose sensitive or restricted knowledge, raising concerns about privacy and broader safety risks. Machine Unlearning (MU) provides a promising way to remove targeted undesirable knowledge from trained models without retraining from scratch while preserving general model utility. Nevertheless, effective unlearning in MLLMs remains particularly challenging. Existing training-based methods often struggle to balance unlearning effectiveness and model utility. In contrast, training-free methods such as in-context unlearning preserve model utility by avoiding parameter updates, but they do not remove memorized knowledge at the parameter level and may remain vulnerable to reverse-engineering attacks. More importantly, in-context unlearning is insufficient in multimodal settings, where visual inputs can provide strong conditioning signals and induce undesirable outputs. To address these challenges, we propose Visual-Noise Guided In-Context Distillation (VGID), a distillation-based framework for MLLM unlearning. VGID dynamically constructs an unlearning-oriented teacher distribution from the frozen base model through dual-modal intervention that combines visual perturbation with textual in-context unlearning. The resulting intervention-induced distribution serves as a teacher signal for distillation, guiding the student model toward parameter-level unlearning without requiring external teacher models or explicit undesirable response annotations. Experimental results show that VGID achieves strong unlearning effectiveness while preserving competitive model utility, reducing forget set ROUGE-L by 0.371 with only a 0.055 drop in retain set ROUGE-L in a representative setting.
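The distillation idea described in the abstract can be illustrated with a toy sketch: a teacher next-token distribution is obtained from the frozen base model under a perturbed image and an unlearning context, and the student is trained to match it. Everything concrete here is an assumption made for illustration, not the paper's implementation: the Gaussian noise model, the KL(teacher || student) direction, the helper names, and the hand-written logits that stand in for real model outputs.

```python
import math
import random


def add_visual_noise(pixels, sigma, rng):
    """Perturb pixel intensities in [0, 1] with Gaussian noise, then clip.

    Assumption: Gaussian noise is one plausible form of visual perturbation.
    """
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one next-token distribution.

    In a real setup the teacher logits would come from the frozen base
    model given the noised image plus an in-context unlearning prompt,
    and the student logits from the trainable copy given the clean input.
    """
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


# Toy usage with hypothetical values standing in for model outputs.
rng = random.Random(0)
clean_image = [0.2, 0.5, 0.9, 0.1]
noised_image = add_visual_noise(clean_image, sigma=0.3, rng=rng)

teacher_logits = [0.1, 0.2, 0.0]   # hypothetical: intervention flattens the answer
student_logits = [4.0, 0.5, -1.0]  # hypothetical: student still recalls the answer
loss = distillation_loss(teacher_logits, student_logits)
```

Minimizing such a loss over forget-set examples would pull the student's output distribution toward the intervention-induced teacher distribution, which is why no external teacher model or annotated undesirable responses would be needed in this picture.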

Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models

Machine Unlearning

Visual-Noise

In-Context Learning

Privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Unlearning

Multimodal Large Language Models

In-Context Learning