MLLMRec: Exploring the Potential of Multimodal Large Language Models in Recommender Systems

📅 2025-08-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing multimodal recommendation systems face two key bottlenecks: (1) user multimodal representation initialization is vulnerable to behavioral blind spots or noisy interactions; and (2) item graphs constructed via k-nearest neighbors (k-NN) contain low-similarity noisy edges and neglect audience co-occurrence patterns. To address these, we propose MLLMRecβ€”the first recommendation framework integrating Multimodal Large Language Models (MLLMs). MLLMRec leverages image-to-text generation to model fine-grained behavioral semantics, enabling semantic purification of user preferences. It further reconstructs a high-confidence, co-occurrence-aware item graph via joint threshold-based denoising and topology-aware enhancement. Extensive experiments on three public benchmarks demonstrate that MLLMRec significantly outperforms state-of-the-art baselines, achieving an average performance gain of 38.53%. This validates the effectiveness of MLLM-driven semantic alignment coupled with collaborative graph structural optimization.

πŸ“ Abstract
Multimodal recommendation typically combines user behavioral data with the modal features of items to reveal user preferences, achieving superior performance over conventional recommendation methods. However, existing methods still suffer from two key problems: (1) the initialization of user multimodal representations is either behavior-unaware or noise-contaminated, and (2) the KNN-based item-item graph contains noisy low-similarity edges and lacks audience co-occurrence relationships. To address these issues, we propose MLLMRec, a novel MLLM-driven multimodal recommendation framework with two item-item graph refinement strategies. On the one hand, item images are first converted into high-quality semantic descriptions using an MLLM, which are then fused with the items' textual metadata. A behavioral description list is then constructed for each user and fed into the MLLM to reason about the purified user preference, including interaction motivations. On the other hand, we design threshold-controlled denoising and topology-aware enhancement strategies to refine the suboptimal item-item graph, thereby improving item representation learning. Extensive experiments on three publicly available datasets demonstrate that MLLMRec achieves state-of-the-art performance with an average improvement of 38.53% over the best baselines.
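The two graph-refinement ideas from the abstract, dropping low-similarity edges from a k-NN item-item graph and adding edges for items that share audiences, can be sketched as follows. This is a minimal illustration under assumed choices (cosine similarity, dense matrices, an illustrative threshold); the function names and parameters are not from the paper.

```python
import numpy as np

def build_knn_graph(feat, k=10):
    """Cosine-similarity k-NN item-item graph (dense, for illustration)."""
    f = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops from top-k
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]   # top-k most similar items per row
    for i, nbrs in enumerate(idx):
        adj[i, nbrs] = sim[i, nbrs]
    return adj

def denoise(adj, tau=0.5):
    """Threshold-controlled denoising: zero out low-similarity edges."""
    return np.where(adj >= tau, adj, 0.0)

def add_cooccurrence(adj, interactions, min_co=2, w=1.0):
    """Topology-aware enhancement (sketch): connect items whose audiences
    overlap in at least `min_co` users. `interactions` is a binary
    user-item matrix."""
    co = interactions.T @ interactions      # item-item co-occurrence counts
    np.fill_diagonal(co, 0)
    return adj + w * (co >= min_co).astype(adj.dtype)
```

The threshold `tau`, the co-occurrence cutoff `min_co`, and the edge weight `w` stand in for whatever tuned values the paper uses; a real implementation would also row-normalize the refined graph before message passing.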
Problem

Research questions and friction points this paper is trying to address.

Refining user multimodal representations to remove noise and incorporate behavior
Enhancing item-item graphs by eliminating low-similarity edges and adding co-occurrence relationships
Improving multimodal recommendation accuracy through MLLM-driven framework and graph refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MLLM to generate semantic item descriptions
Refines item graphs with denoising and enhancement strategies
Purifies user preferences through behavioral description analysis
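The last point, reasoning about purified user preferences from a behavioral description list, amounts to assembling per-item descriptions into a prompt for the MLLM. The paper's actual prompt template is not reproduced here, so the format below is purely illustrative:

```python
def build_preference_prompt(user_history, max_items=10):
    """Turn a user's behavioral description list into a preference-reasoning
    prompt (hypothetical template; only the idea matches the paper)."""
    recent = user_history[-max_items:]  # keep only the most recent items
    lines = [f"{i + 1}. {title}: {desc}" for i, (title, desc) in enumerate(recent)]
    return (
        "The user interacted with the following items:\n"
        + "\n".join(lines)
        + "\nSummarize the user's underlying preferences and likely "
          "interaction motivations, ignoring items that look like noise."
    )
```

The returned string would be sent to an MLLM, and its response used as the purified textual preference representation.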
🔎 Similar Papers
2024-08-08 · International Workshop on Semantic and Social Media Adaptation and Personalization · Citations: 13
Yuzhuo Dang
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, China
Xin Zhang
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, China
Zhiqiang Pan
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, China
Yuxiao Duan
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Wanyu Chen
National University of Defense Technology
Fei Cai
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, China
Honghui Chen