🤖 AI Summary
Data contamination, particularly hard-to-detect cross-modal leakage between text and images, affects the training of multimodal large language models (MLLMs), inflating benchmark performance and undermining evaluation reliability. To address this, we propose MM-Detect, the first contamination detection framework tailored for MLLMs. It distinguishes contamination sources across two stages: LLM pretraining and MLLM fine-tuning. Our method introduces a cross-modal sensitivity detection mechanism that leverages multimodal embedding alignment, cross-modal similarity distillation, and stage-wise attribution analysis to quantify contamination’s impact on performance. In controlled contamination-injection experiments, MM-Detect identifies significant training-set leakage in multiple state-of-the-art MLLMs on benchmarks including MMBench and OCRBench; up to 37% of certain models’ performance gains are attributable to contamination. MM-Detect thus provides a tool and an empirical foundation for fair, rigorous MLLM evaluation.
📝 Abstract
The rapid progress of multimodal large language models (MLLMs) has yielded superior performance on various multimodal benchmarks. However, data contamination during training complicates performance evaluation and comparison. While numerous methods exist for detecting dataset contamination in large language models (LLMs), they are less effective for MLLMs because of their multiple modalities and multiple training phases. In this study, we introduce MM-Detect, a multimodal data contamination detection framework designed for MLLMs. Our experimental results indicate that MM-Detect is sensitive to varying degrees of contamination and can highlight significant performance improvements caused by leakage of the training sets of multimodal benchmarks. Furthermore, we explore the possibility that contamination originates from the pre-training phase of the LLMs used by MLLMs and from the fine-tuning phase of the MLLMs themselves, offering new insights into the stages at which contamination may be introduced.