When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

📅 2025-05-30
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Large language and multimodal models (LLMs/LMMs) face critical challenges in dynamic knowledge injection, suffering severe catastrophic forgetting and degraded instruction-following ability; existing work lacks a systematic investigation of continual learning for evolving multimodal knowledge in LMMs. This paper introduces EVOKE, the first benchmark specifically designed to evaluate multimodal evolving knowledge injection. Our evaluation reveals widespread failures in knowledge injection and a 47% drop in instruction-following performance across current methods. We find that textual augmentation significantly improves accuracy (+19.3%), whereas visual augmentation yields no gains. Furthermore, we demonstrate that replay mechanisms and MoELoRA synergistically mitigate forgetting, with MoELoRA alone reducing forgetting by 62%. This work establishes the first systematic evaluation framework for dynamic knowledge updating in LMMs and identifies effective technical pathways, particularly modular parameter-efficient adaptation combined with replay, to support continual multimodal learning.
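
To make the replay finding concrete, here is a minimal sketch (not the paper's code) of how a replay mechanism typically works during knowledge-injection fine-tuning: each batch of new evolving-knowledge samples is mixed with a small number of earlier instruction-tuning samples so the model keeps rehearsing old behavior. The names `new_knowledge_data`, `instruction_buffer`, and `replay_ratio` are illustrative assumptions, not from the paper.

```python
# Minimal replay-batch sketch, assuming list-like datasets of samples.
import random

def build_replay_batch(new_knowledge_data, instruction_buffer,
                       batch_size=8, replay_ratio=0.25):
    """Compose one training batch: mostly new knowledge, plus replayed samples."""
    n_replay = int(batch_size * replay_ratio)              # e.g. 2 of 8 samples
    n_new = batch_size - n_replay
    batch = random.sample(new_knowledge_data, n_new)       # evolving knowledge
    batch += random.sample(instruction_buffer, n_replay)   # old instruction data
    random.shuffle(batch)                                  # interleave old and new
    return batch
```

Tuning the replay ratio trades off between how quickly new knowledge is absorbed and how well instruction-following ability is preserved.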

📝 Abstract
Large language/multimodal models (LLMs/LMMs) store extensive pre-trained knowledge but struggle to stay consistent with real-world updates, making it difficult to acquire evolving knowledge without catastrophic forgetting. Previous work focused on constructing textual knowledge datasets and exploring knowledge injection in LLMs, leaving multimodal evolving knowledge injection in LMMs largely unexplored. To address this, we propose the EVOKE benchmark to evaluate LMMs' ability to inject multimodal evolving knowledge in real-world scenarios. A comprehensive evaluation of multimodal evolving knowledge injection reveals two challenges: (1) existing knowledge injection methods perform poorly on evolving knowledge, and (2) supervised fine-tuning causes catastrophic forgetting, with instruction-following ability especially severely compromised. We additionally provide pathways and find that: (1) text knowledge augmentation during the training phase improves performance, while image augmentation does not, and (2) continual learning methods, especially Replay and MoELoRA, effectively mitigate forgetting. Our findings indicate that current knowledge injection methods have substantial limitations on evolving knowledge, motivating further research on more efficient and stable knowledge injection methods.
Problem

Research questions and friction points this paper is trying to address.

LMMs struggle to stay current with real-world knowledge updates and suffer catastrophic forgetting
Injection of multimodal evolving knowledge in LMMs is largely unexplored
Existing knowledge injection methods fail on evolving knowledge and cause forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

EVOKE benchmark for evaluating multimodal evolving knowledge injection
Text augmentation during training improves injection performance; image augmentation does not
Replay and MoELoRA mitigate catastrophic forgetting (a structural sketch follows this list)
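
As a structural sketch of the MoELoRA pathway, the module below combines a frozen pre-trained linear layer with several low-rank LoRA experts whose updates are mixed per token by a learned softmax router. This follows the commonly assumed mixture-of-LoRA-experts formulation and is not the paper's implementation; class and parameter names (`MoELoRALinear`, `num_experts`, `rank`, `alpha`) are illustrative.

```python
# Minimal MoELoRA-style layer sketch in PyTorch, assuming the standard
# mixture-of-LoRA-experts formulation over a frozen nn.Linear.
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts=4, rank=8, alpha=16):
        super().__init__()
        self.base = base                          # frozen pre-trained layer
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Expert e contributes the low-rank update B_e @ A_e; B starts at zero
        # so the module initially reproduces the frozen base layer exactly.
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)  # token-wise gating
        self.scale = alpha / rank

    def forward(self, x):                            # x: (..., d_in)
        gates = torch.softmax(self.router(x), dim=-1)         # (..., E)
        low = torch.einsum("erd,...d->...er", self.A, x)      # (..., E, r)
        upd = torch.einsum("eor,...er->...eo", self.B, low)   # (..., E, d_out)
        mix = torch.einsum("...e,...eo->...o", gates, upd)    # (..., d_out)
        return self.base(x) + self.scale * mix
```

Because only the small expert matrices and router are trained while the base weights stay frozen, different experts can specialize on new knowledge without overwriting the behavior encoded in the pre-trained parameters, which is the intuition behind its forgetting reduction.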
👥 Authors
Kailin Jiang
School of Information Science and Technology, University of Science and Technology of China
Yuntao Du
Purdue University
Yukai Ding
State Key Laboratory of General Artificial Intelligence, BIGAI
Yuchen Ren
Renmin University of China
Ning Jiang
College of Computer and Control Engineering, Northeast Forestry University
Zhi Gao
State Key Laboratory of General Artificial Intelligence, BIGAI
Zilong Zheng
State Key Laboratory of General Artificial Intelligence, BIGAI
Lei Liu
School of Information Science and Technology, University of Science and Technology of China
Bin Li
School of Information Science and Technology, University of Science and Technology of China
Qing Li
State Key Laboratory of General Artificial Intelligence, BIGAI