MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of insufficient user preference modeling in conversational recommender systems (CRS) caused by short and sparse dialogue contexts, this paper proposes a multimodal semantic modeling approach that jointly leverages textual and visual modalities. Specifically, we construct modality-specific semantic graphs and—novelly—integrate multimodal graph-structured modeling with large language model (LLM) prompt learning. User preference representation is enhanced through high-order collaborative, textual, and visual modality associations. Our method comprises four components: multimodal feature extraction, modality-specific graph neural networks, cross-modal graph alignment and fusion, and prompt-based fine-tuning. Extensive experiments demonstrate significant improvements across multiple benchmarks: Recall@10 increases by 12.6%, BLEU-4 by 9.3%, and BERTScore by 8.1%. To foster reproducibility and further research, we publicly release both our source code and an extended multimodal CRS benchmark dataset.

Technology Category

Application Category

📝 Abstract

Conversational Recommender Systems (CRSs) aim to provide personalized recommendations by interacting with users through conversations. Most existing studies of CRS focus on extracting user preferences from conversational contexts. However, due to the short and sparse nature of conversational contexts, it is difficult to fully capture user preferences by conversational contexts only. We argue that multi-modal semantic information can enrich user preference expressions from diverse dimensions (e.g., a user preference for a certain movie may stem from its magnificent visual effects and compelling storyline). In this paper, we propose a multi-modal semantic graph prompt learning framework for CRS, named MSCRS. First, we extract textual and image features of items mentioned in the conversational contexts. Second, we capture higher-order semantic associations within different semantic modalities (collaborative, textual, and image) by constructing modality-specific graph structures. Finally, we propose an innovative integration of multi-modal semantic graphs with prompt learning, harnessing the power of large language models to comprehensively explore high-dimensional semantic relationships. Experimental results demonstrate that our proposed method significantly improves accuracy in item recommendation, as well as generates more natural and contextually relevant content in response generation. We have released the code and the expanded multi-modal CRS datasets to facilitate further exploration in related researchfootnote{https://github.com/BIAOBIAO12138/MSCRS-main}.

Problem

Research questions and friction points this paper is trying to address.

Enhance user preference capture in CRS using multi-modal data

Integrate semantic graphs with prompt learning for better recommendations

Improve recommendation accuracy and response relevance in CRS

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts textual and image features from items

Constructs modality-specific graph structures for semantics

Integrates multi-modal graphs with prompt learning

🔎 Similar Papers

No similar papers found.

Authors to Follow