Intent Representation Learning with Large Language Model for Recommendation

📅 2025-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of fine-grained multimodal intent modeling, insufficient cross-modal alignment, and severe noise interference in recommender systems, this paper proposes a large language model (LLM)-based multimodal intent representation framework. It designs a dual-tower neural architecture that jointly encodes user behavioral sequences and heterogeneous textual signals (e.g., reviews and item descriptions), and introduces a synergistic alignment mechanism that combines pairwise contrastive learning with translation-style cross-modal mapping, augmented by momentum-based knowledge distillation to mitigate modality-specific representation heterogeneity and noise. Experiments on three public benchmarks show significant gains in recommendation accuracy (an average +3.2% improvement in NDCG@10) and improved interpretability of the learned user intents. The implementation code is publicly available.

📝 Abstract
Intent-based recommender systems have garnered significant attention for uncovering latent fine-grained preferences. Intents, as underlying factors of interactions, are crucial for improving recommendation interpretability. Most methods define intents as learnable parameters updated alongside interactions. However, existing frameworks often overlook textual information (e.g., user reviews, item descriptions), which is crucial for alleviating the sparsity of interaction intents. Exploring these multimodal intents, especially the inherent differences in representation spaces, poses two key challenges: i) How to align multimodal intents and effectively mitigate noise issues; ii) How to extract and match latent key intents across modalities. To tackle these challenges, we propose a model-agnostic framework, Intent Representation Learning with Large Language Model (IRLLRec), which leverages large language models (LLMs) to construct multimodal intents and enhance recommendations. Specifically, IRLLRec employs a dual-tower architecture to learn multimodal intent representations. Next, we propose pairwise and translation alignment to eliminate inter-modal differences and enhance robustness against noisy input features. Finally, to better match textual and interaction-based intents, we employ momentum distillation to perform teacher-student learning on fused intent representations. Empirical evaluations on three datasets show that our IRLLRec framework outperforms baselines. The implementation is available at https://github.com/wangyu0627/IRLLRec.
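The pairwise alignment described in the abstract can be sketched as a symmetric contrastive objective between the two towers' intent embeddings. The following is a minimal NumPy illustration, not the paper's implementation; the function name, batch-wise InfoNCE formulation, and temperature value are assumptions for exposition.

```python
import numpy as np

def pairwise_contrastive_loss(z_interaction, z_text, temperature=0.2):
    """Symmetric InfoNCE aligning interaction-based and text-based intent
    embeddings: matched rows are positives, all other rows in the batch
    act as negatives."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    zi, zt = normalize(z_interaction), normalize(z_text)
    logits = zi @ zt.T / temperature  # (B, B) scaled cosine similarities

    def cross_entropy_diag(lg):
        # cross-entropy with the diagonal (matched pair) as the target class
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # average the interaction->text and text->interaction directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Well-aligned modalities place each user's two intent vectors close together, so the diagonal dominates the similarity matrix and the loss is small; mismatched or noisy pairs raise it.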
Problem

Research questions and friction points this paper is trying to address.

Aligning multimodal intents effectively
Extracting latent key intents across modalities
Enhancing recommendation interpretability using LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large language models
Employs dual-tower architecture
Uses momentum distillation for teacher-student learning on fused intents
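The momentum distillation idea listed above pairs an exponential-moving-average (EMA) teacher with a KL-divergence loss on softened predictions. A minimal NumPy sketch follows; the function names, momentum coefficient, and temperature are illustrative assumptions, not details from the paper.

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.995):
    """Move teacher parameters toward the student via an exponential moving
    average; the slowly changing teacher provides denoised soft targets."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """Mean KL divergence from the teacher's softened distribution
    to the student's."""
    def softmax(x):
        e = np.exp((x - x.max(axis=-1, keepdims=True)) / tau)
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return np.mean(np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1))
```

In training, the student is updated by gradient descent on the combined recommendation and distillation losses, while the teacher is refreshed only through `ema_update` after each step.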
🔎 Similar Papers
No similar papers found.
Yu Wang
Anhui University, Hefei, China
Lei Sang
Anhui University
Recommender Systems, Data Mining
Yi Zhang
Anhui University, Hefei, China
Yiwen Zhang
Anhui University, Hefei, China