When Recovery Matters: The Blind Spot of Surrogate Privacy in MLLM Editing

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing privacy-preserving image editing methods overlook the localized recovery of original private images from their surrogate counterparts, often yielding edits that deviate from user intent. To address this gap, this work introduces SPPE, the first recovery-oriented benchmark for privacy-preserving editing, comprising two complementary tasks: editability assessment of surrogate images and edit recovery from surrogate to source images. We propose ERMA, a model that performs instruction-aware multimodal relational modeling to predict editability, and C2E-S2SER, a method leveraging cycle consistency to achieve faithful edit recovery. Experimental results demonstrate that ERMA improves SRCC and PLCC by 13.9% and 12.3%, respectively, while C2E-S2SER consistently outperforms baseline approaches across all eight integrity and consistency metrics.

📝 Abstract

Multimodal Large Language Models (MLLMs) enable flexible instruction-driven image editing, but privacy risks arise when user images expose diverse and user-specific private content. Canonical privacy protection strategies typically substitute sensitive regions with surrogate content before cloud editing. Yet the resulting output is often an edited surrogate rather than the desired edited source image, and local recovery is neglected in both method design and evaluation. To this end, we introduce SPPE (Surrogate-based Privacy-Preserving Editing), the first recovery-oriented benchmark covering 36 fine-grained privacy categories and 65 editing instructions. It defines two complementary tasks: 1) editability assessment, which estimates before cloud interaction whether a surrogate can induce an edit consistent with the original image; and 2) surrogate-to-source edit recovery, which evaluates whether the edited surrogate can be transferred back to the private source with the edit effect preserved. We address each task with a dedicated method: ERMA predicts surrogate editability through instruction-aware multimodal relation modeling, while C2E-S2SER performs cycle-consistent recovery by using the surrogate editing pair as visual edit evidence and the source image as a source-preserving anchor. Experiments on SPPE and InstructPix2Pix show consistent improvements on both tasks. For editability assessment, ERMA improves over the best-performing baselines by 13.9% in SRCC and 12.3% in PLCC. For surrogate-to-source edit recovery, C2E-S2SER outperforms SOER across all 8 source integrity and edit consistency metrics on SPPE.
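
For readers unfamiliar with the two editability metrics, the sketch below shows how SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) are conventionally computed between predicted scores and reference scores. It is a generic illustration and not the paper's evaluation code: the function names and the toy score lists are hypothetical, and the abstract does not say whether the reported 13.9% and 12.3% gains are relative or absolute.

```python
import math


def pearson(x, y):
    """PLCC: Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of the scores."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: predicted editability vs. reference editability.
predicted = [0.10, 0.40, 0.35, 0.80]
reference = [1.0, 2.0, 3.0, 4.0]

srcc = spearman(predicted, reference)
plcc = pearson(predicted, reference)
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```

SRCC depends only on the ordering of the scores, so it rewards a predictor that ranks surrogates correctly even when its scale is off, whereas PLCC also penalizes a nonlinear relationship between predicted and reference values.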

Problem

Research questions and friction points this paper is trying to address.

privacy-preserving editing

surrogate recovery

multimodal large language models

image editing

private content

Innovation

Methods, ideas, or system contributions that make the work stand out.

recovery-oriented editing

surrogate privacy

multimodal large language models