MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems

📅 2025-04-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of insufficient user preference modeling in conversational recommender systems (CRS) caused by short and sparse dialogue contexts, this paper proposes a multimodal semantic modeling approach that jointly leverages textual and visual modalities. Specifically, we construct modality-specific semantic graphs and—novelly—integrate multimodal graph-structured modeling with large language model (LLM) prompt learning. User preference representation is enhanced through high-order collaborative, textual, and visual modality associations. Our method comprises four components: multimodal feature extraction, modality-specific graph neural networks, cross-modal graph alignment and fusion, and prompt-based fine-tuning. Extensive experiments demonstrate significant improvements across multiple benchmarks: Recall@10 increases by 12.6%, BLEU-4 by 9.3%, and BERTScore by 8.1%. To foster reproducibility and further research, we publicly release both our source code and an extended multimodal CRS benchmark dataset.

Technology Category

Application Category

📝 Abstract
Conversational Recommender Systems (CRSs) aim to provide personalized recommendations by interacting with users through conversations. Most existing studies of CRS focus on extracting user preferences from conversational contexts. However, due to the short and sparse nature of conversational contexts, it is difficult to fully capture user preferences by conversational contexts only. We argue that multi-modal semantic information can enrich user preference expressions from diverse dimensions (e.g., a user preference for a certain movie may stem from its magnificent visual effects and compelling storyline). In this paper, we propose a multi-modal semantic graph prompt learning framework for CRS, named MSCRS. First, we extract textual and image features of items mentioned in the conversational contexts. Second, we capture higher-order semantic associations within different semantic modalities (collaborative, textual, and image) by constructing modality-specific graph structures. Finally, we propose an innovative integration of multi-modal semantic graphs with prompt learning, harnessing the power of large language models to comprehensively explore high-dimensional semantic relationships. Experimental results demonstrate that our proposed method significantly improves accuracy in item recommendation, as well as generates more natural and contextually relevant content in response generation. We have released the code and the expanded multi-modal CRS datasets to facilitate further exploration in related researchfootnote{https://github.com/BIAOBIAO12138/MSCRS-main}.
Problem

Research questions and friction points this paper is trying to address.

Enhance user preference capture in CRS using multi-modal data
Integrate semantic graphs with prompt learning for better recommendations
Improve recommendation accuracy and response relevance in CRS
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts textual and image features from items
Constructs modality-specific graph structures for semantics
Integrates multi-modal graphs with prompt learning
🔎 Similar Papers
No similar papers found.
Y
Yibiao Wei
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
J
Jie Zou
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Weikang Guo
Weikang Guo
Ghent University, KTH Royal Institute of Technology
Artificial IntelligenceOperations ResearchOptimizationProject Scheduling
G
Guoqing Wang
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
X
Xing Xu
University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Y
Yang Yang
University of Electronic Science and Technology of China, Chengdu, Sichuan, China