🤖 AI Summary
This work addresses membership inference attacks (MIAs) against multimodal large language models (MLLMs), where MIAs developed for text-only LLMs perform poorly because of cross-modal adaptation and input distribution shift. To tackle this, we propose FiMMIA, a framework that extends perturbation-based membership inference to multimodal settings. FiMMIA applies semantic-level cross-modal input perturbations and jointly models the target model's differential responses to them. It also includes a pipeline for detecting distribution shifts in existing datasets. Its modular architecture supports diverse multimodal inputs (e.g., image–text pairs) and is designed to be easily extended. Extensive experiments across multiple fine-tuned MLLMs show that FiMMIA significantly outperforms existing baselines in identifying training set membership, with robust and practical performance. To our knowledge, FiMMIA is the first framework enabling reliable, scalable assessment of data leakage risks in MLLMs.
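The idea of "differential responses to perturbations" can be illustrated with a minimal sketch. It assumes a hypothetical `loss_fn(image, tokens)` that returns the target model's per-example loss on an image–text pair, and it swaps in two simple perturbations chosen only for illustration (random token dropout for text, Gaussian noise on a flattened image); these are not necessarily the perturbations FiMMIA uses. `make_toy_loss` is a toy stand-in for a model that has memorized a single pair, included only so the example runs end to end.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical interface: per-example loss of the target MLLM on an (image, text) pair.
LossFn = Callable[[Sequence[float], Sequence[str]], float]


def perturb_text(tokens: Sequence[str], rng: random.Random, drop_prob: float = 0.1) -> List[str]:
    """Assumed text perturbation: drop each token independently with probability drop_prob."""
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else list(tokens[:1])  # never return an empty prompt


def perturb_image(pixels: Sequence[float], rng: random.Random, sigma: float = 0.05) -> List[float]:
    """Assumed image perturbation: add i.i.d. Gaussian noise to a flattened image."""
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def differential_features(loss_fn: LossFn, image: Sequence[float], tokens: Sequence[str],
                          n_perturb: int = 8, seed: int = 0) -> List[float]:
    """Return [original loss, mean loss change, std of loss change] under joint perturbation."""
    rng = random.Random(seed)
    base = loss_fn(image, tokens)
    # Perturb both modalities together and record how much the loss moves.
    deltas = [loss_fn(perturb_image(image, rng), perturb_text(tokens, rng)) - base
              for _ in range(n_perturb)]
    return [base, statistics.mean(deltas), statistics.pstdev(deltas)]


def make_toy_loss(mem_image: Sequence[float], mem_tokens: Sequence[str]) -> LossFn:
    """Hypothetical toy model that has memorized exactly one (image, text) pair."""
    mem_set = set(mem_tokens)

    def loss(image: Sequence[float], tokens: Sequence[str]) -> float:
        text_term = 1.0 - len(mem_set & set(tokens)) / len(mem_set)
        image_term = sum((a - b) ** 2 for a, b in zip(image, mem_image))
        return text_term + image_term

    return loss


# Usage: a memorized pair versus an unseen pair.
mem_img = [0.1, 0.5, 0.9, 0.3]
mem_txt = "a cat on a red sofa".split()
loss_fn = make_toy_loss(mem_img, mem_txt)
member = differential_features(loss_fn, mem_img, mem_txt)
non_member = differential_features(loss_fn, [0.9, 0.1, 0.2, 0.7], "two dogs in the park".split())
```

In this toy setting the memorized pair sits at a loss minimum, so any perturbation can only raise its loss, while the unseen pair starts from a high loss. Feature vectors like these are what a downstream attack classifier would consume.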
📝 Abstract
Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to the instabilities introduced by multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: we identify distribution shifts in existing datasets, and we release an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release FiMMIA, a modular Framework for Multimodal MIA. The source code and framework are publicly available under the MIT license at https://github.com/ai-forever/data_leakage_detect, and a video demonstration is available on YouTube at https://youtu.be/a9L4-H80aSg. Our approach trains a neural network to analyze the target model's behavior on perturbed inputs, capturing distributional differences between members and non-members. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal domains.
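A neural network trained on the target model's behavior under perturbation can be sketched as a small binary classifier over per-example feature vectors. Everything here is a hypothetical illustration rather than FiMMIA's actual architecture: the two features (original loss and mean loss increase under perturbation), the synthetic generator `make_synthetic_features`, and the one-hidden-layer network `AttackMLP`. The sketch also assumes that labeled member and non-member examples are available for training the attack model.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class AttackMLP:
    """Hypothetical one-hidden-layer MLP mapping perturbation features to P(member)."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict_proba(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def fit(self, xs: List[List[float]], ys: List[int], epochs: int = 200,
            lr: float = 0.1, seed: int = 0) -> "AttackMLP":
        """Per-example SGD on binary cross-entropy."""
        rng = random.Random(seed)
        order = list(range(len(xs)))
        for _ in range(epochs):
            rng.shuffle(order)
            for k in order:
                x, y = xs[k], ys[k]
                h = self._hidden(x)
                p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
                dz = p - y  # gradient of BCE with respect to the output logit
                # Backpropagate into the hidden layer before w2 is updated.
                da = [dz * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * dz * hj
                self.b2 -= lr * dz
                for j, g in enumerate(da):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * g * xi
                    self.b1[j] -= lr * g
        return self


def make_synthetic_features(n: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Synthetic [original loss, mean loss increase] pairs; members: low loss, large increase."""
    rng = random.Random(seed)
    xs, ys = [], []
    for k in range(n):
        member = k % 2 == 0
        if member:
            xs.append([rng.gauss(0.2, 0.05), rng.gauss(1.0, 0.1)])
        else:
            xs.append([rng.gauss(1.0, 0.05), rng.gauss(0.0, 0.1)])
        ys.append(1 if member else 0)
    return xs, ys


def accuracy(model: AttackMLP, xs: List[List[float]], ys: List[int], threshold: float = 0.5) -> float:
    hits = sum((model.predict_proba(x) >= threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


# Usage: train on one synthetic split, evaluate on another.
train_xs, train_ys = make_synthetic_features(100, seed=1)
model = AttackMLP(n_in=2, seed=0).fit(train_xs, train_ys)
test_xs, test_ys = make_synthetic_features(50, seed=2)
```

Thresholding `predict_proba` at 0.5 gives a hard member/non-member decision. MIA results are, however, commonly reported with threshold-free metrics such as ROC-AUC, which the probability output supports directly.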