MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses critical limitations of multimodal large language models (MLLMs) in realistic, sustained dialogues—including memory decay, delayed knowledge updating, error accumulation in reasoning, and weak refusal capability. To systematically evaluate these issues, we introduce MMRC, the first benchmark specifically designed for open-ended, multi-turn multimodal interaction. MMRC is grounded in real-world scenarios and comprises 5,120 multi-turn dialogues and 28,720 human-annotated questions, enabling comprehensive assessment across six core capabilities: information extraction, multi-turn reasoning, knowledge updating, image management, memory recall, and refusal. Extensive experiments reveal significant performance degradation across 20 state-of-the-art MLLMs in long-horizon interactions and identify four prevalent failure patterns. We further propose NOTE-TAKING, a lightweight memory-augmentation strategy, which yields an average 12.3% improvement in key capabilities across six models. MMRC establishes a new standard for evaluating MLLMs in practical deployment and provides an extensible pathway for optimization.

📝 Abstract
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs in MMRC indicate an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to say no. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which can record key information from the conversation and remind the model during its responses, enhancing conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
Problem

Research questions and friction points this paper is trying to address.

evaluating multimodal large language models
addressing real-world conversation challenges
improving memory and reasoning in MLLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

NOTE-TAKING strategy
Multi-Modal Real-world Conversation
manually labeled questions
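The NOTE-TAKING strategy records key information from the conversation and reminds the model of it when responding. A minimal sketch of that loop is below; all names (`NoteTakingChat`, `extract_notes`, the keyword cues) are illustrative assumptions, not the paper's implementation, and the fact extraction here is a naive heuristic where the actual strategy would use the MLLM itself.

```python
# Hypothetical sketch of a note-taking memory augmentation for multi-turn chat.
# Assumed names throughout; not the paper's code.

def extract_notes(user_msg: str, reply: str) -> list[str]:
    """Pull out sentences worth remembering from one dialogue turn.
    Naive keyword heuristic standing in for model-driven note extraction."""
    facts = []
    text = user_msg + " " + reply
    cues = ("my name is", "i live", "remember", "is now")
    for sentence in text.split(". "):
        if any(cue in sentence.lower() for cue in cues):
            facts.append(sentence.strip().rstrip("."))
    return facts

class NoteTakingChat:
    def __init__(self, model):
        self.model = model          # callable: prompt string -> reply string
        self.notes: list[str] = []  # running record of key conversation facts

    def ask(self, user_msg: str) -> str:
        # Remind the model of recorded notes before it answers.
        reminder = ("Notes so far: " + "; ".join(self.notes) + "\n") if self.notes else ""
        reply = self.model(f"{reminder}User: {user_msg}")
        # Record new key information from this turn for later reminders.
        self.notes.extend(extract_notes(user_msg, reply))
        return reply
```

In use, a later question arrives at the model with earlier facts prepended, which is the mechanism the paper credits for mitigating long-term memory degradation in sustained interactions.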
👥 Authors
Haochen Xue — Shanghai Artificial Intelligence Laboratory, Xi'an Jiaotong-Liverpool University
Feilong Tang — Shanghai Artificial Intelligence Laboratory, Monash University, MBZUAI
Ming Hu — Shanghai Artificial Intelligence Laboratory, Monash University
Yexin Liu — The Hong Kong University of Science and Technology
Qidong Huang — Qwen Team, Alibaba Cloud
Yulong Li — Shanghai Artificial Intelligence Laboratory
Chengzhi Liu — UC Santa Barbara
Zhongxing Xu — Monash University
Chong Zhang — Xi'an Jiaotong-Liverpool University
Chun-Mei Feng — University College Dublin, Ireland
Yutong Xie — MBZUAI
Imran Razzak — MBZUAI, Abu Dhabi
Zongyuan Ge — Monash University
Jionglong Su — Xi'an Jiaotong-Liverpool University
Junjun He — Shanghai Jiao Tong University
Yu Qiao — Shanghai Artificial Intelligence Laboratory