🤖 AI Summary
Problem: In RAG/MRAG systems, retrieval augmentation and multimodal fusion obscure content provenance. Existing membership inference methods therefore cannot distinguish whether generated content originates from pretraining data, external retrieval, or user input, which severely undermines the traceability of privacy leakage.
Method: We propose SMA, the first source-aware membership auditing framework, which shifts membership inference from “whether memorized” to “where sourced” under a semi-black-box setting. SMA estimates the influence of input tokens on the output through zeroth-order optimization, combining large-scale perturbation sampling with ridge regression modeling. To handle images, it leverages the semantic alignment between text and images in multimodal large language models (MLLMs) to project image inputs into textual descriptions, which enables fine-grained, cross-modal provenance attribution, including text-level attribution of image retrieval traces.
Contribution/Results: SMA achieves high-accuracy leakage auditing for both textual and visual retrieval sources. Experiments demonstrate substantial improvements in data provenance reliability and privacy accountability for complex generative systems.
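The perturbation-sampling and ridge-regression attribution described above can be illustrated with a minimal Python sketch. It assumes the auditor can query a black-box scorer (the hypothetical `score_fn`) that returns a scalar, such as the likelihood of the audited output, when only a masked subset of input tokens is shown to the model. The sampling scheme, `keep_prob`, and `alpha` are illustrative choices rather than the paper's settings. The sketch is a LIME-style linear surrogate: it uses only model queries and no gradients, which is what makes it zeroth-order.

```python
import numpy as np

def zeroth_order_attribution(score_fn, n_tokens, n_samples=2000,
                             keep_prob=0.5, alpha=1.0, seed=0):
    """Estimate per-token influence on a black-box scalar output.

    score_fn (hypothetical): takes a boolean keep-mask of length n_tokens and
    returns a scalar, e.g. the log-likelihood of the audited output when only
    the kept tokens are visible to the model.
    """
    rng = np.random.default_rng(seed)
    # Random perturbations: each token is kept independently with prob keep_prob.
    masks = rng.random((n_samples, n_tokens)) < keep_prob
    # One black-box query per perturbation; no gradients are required.
    y = np.array([score_fn(m) for m in masks], dtype=float)
    X = masks.astype(float)
    # Center features and targets so the implicit intercept is not penalized.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    # Closed-form ridge regression: w = (Xc^T Xc + alpha * I)^{-1} Xc^T yc.
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_tokens), Xc.T @ yc)

# Toy black box whose score is an exact linear function of the kept tokens.
true_w = np.array([2.0, 0.0, -1.0, 0.5])

def toy_score(mask):
    return float(mask.astype(float) @ true_w)

w = zeroth_order_attribution(toy_score, n_tokens=4)
print(np.round(w, 3))
```

Because the toy scorer is exactly linear, the recovered coefficients sit very close to `true_w`, shrunk slightly toward zero by the ridge penalty. With a real model, the coefficients are only a local linear approximation of each token's influence, which is why many perturbation samples are needed for a stable estimate.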
📝 Abstract
Retrieval-Augmented Generation (RAG) and its multimodal extension, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance, so existing membership inference methods cannot reliably attribute generated outputs to pre-training, external retrieval, or user input, which undermines accountability for privacy leakage.
To address these challenges, we propose the first Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities. To cope with the constraints of semi-black-box auditing, we further design an attribution estimation mechanism based on zeroth-order optimization, which robustly approximates the true influence of input tokens on the output through large-scale perturbation sampling and ridge regression modeling. In addition, SMA introduces a cross-modal attribution technique that uses MLLMs to project image inputs into textual descriptions, enabling token-level attribution in the text modality and, for the first time, membership inference on image retrieval traces in MRAG systems. This work shifts the focus of membership inference from 'whether the data has been memorized' to 'where the content is sourced from', offering a novel perspective for auditing data provenance in complex generative systems.
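The abstract describes projecting retrieved images into text and attributing generated content to pre-training, external retrieval, or user input, but it does not specify how token-level scores become a source decision. The Python sketch below assumes one simple, hypothetical rule: caption the image with a stand-in for an MLLM (the hypothetical `describe_image`), sum positive token attributions per source segment, and fall back to pre-training (parametric memory) when no in-context source dominates. The segment labels, the 0.5 threshold, and the scores are illustrative only, not the paper's procedure or data.

```python
import numpy as np

def describe_image(image_id):
    # Hypothetical stand-in for an MLLM captioner: a real audit would ask a
    # multimodal model to describe the retrieved image in text.
    captions = {"img_0": "red brick lighthouse on a rocky cliff"}
    return captions[image_id]

def source_shares(segments, token_scores):
    """Sum positive token attributions per source and normalize to shares."""
    totals, i = {}, 0
    for source, tokens in segments:
        n = len(tokens)
        # Only positive attributions count as evidence the output relied on a token.
        positive = np.clip(token_scores[i:i + n], 0, None).sum()
        totals[source] = totals.get(source, 0.0) + float(positive)
        i += n
    assert i == len(token_scores), "expected exactly one score per token"
    z = sum(totals.values())
    return {s: (v / z if z > 0 else 0.0) for s, v in totals.items()}

def predict_source(shares, threshold=0.5):
    # Hypothetical rule: if no in-context source holds at least `threshold`
    # of the attribution mass, attribute the output to pre-training memory.
    best = max(shares, key=shares.get)
    return best if shares[best] >= threshold else "pretraining"

# Project the retrieved image into text so it can be attributed token by token.
segments = [
    ("user_input", "describe the landmark".split()),
    ("retrieved_text", "coastal tourism guide excerpt".split()),
    ("retrieved_image", describe_image("img_0").split()),
]
# Illustrative per-token scores, e.g. ridge coefficients from a perturbation estimator.
scores = np.array([0.1, 0.0, 0.2,
                   0.05, 0.0, 0.0, 0.05,
                   0.4, 0.3, 0.6, 0.0, 0.1, 0.2, 0.5])
shares = source_shares(segments, scores)
print(shares, predict_source(shares))
```

Here the image-derived tokens carry most of the positive attribution mass, so the sketch labels the output as sourced from the retrieved image. Working in the text modality is what lets the same token-level estimator cover both textual and visual retrieval sources.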