🤖 AI Summary
Cross-domain sequential recommendation (CDSR) faces the key challenge of jointly modeling intra-sequence and inter-sequence item interactions to accurately capture users’ dynamic cross-domain preferences. To address this, we propose the first CDSR framework that incorporates a frozen CLIP image encoder, introducing an image-enhanced multi-attention mechanism. Specifically, CLIP’s visual embeddings capture fine-grained visual semantics of items; image–text feature fusion aligns visual semantics across domains; and a hierarchical cross-domain attention module jointly models intra-sequence local dependencies and inter-sequence cross-domain associations. Extensive experiments on four re-partitioned e-commerce datasets show that the method significantly outperforms state-of-the-art approaches, achieving a 12.6% improvement in Recall@10. These results confirm the effectiveness of leveraging visual information to model cross-domain user preferences.
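The idea of enriching item embeddings with frozen image features can be illustrated with a minimal sketch. It assumes the image embeddings were already produced by a frozen CLIP image encoder and are stored as constant vectors. The names `fuse_item_embedding`, `proj`, and `alpha`, and the additive fusion rule itself, are hypothetical choices for illustration; they are not the fusion operator used by IFCDSR.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def l2_normalize(x: Vector) -> Vector:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = sum(v * v for v in x) ** 0.5
    return [v / norm for v in x] if norm > 0 else list(x)


def fuse_item_embedding(item_emb: Vector, clip_img_emb: Vector,
                        proj: Matrix, alpha: float = 0.5) -> Vector:
    """Hypothetical fusion: item_emb + alpha * proj(normalize(clip_img_emb)).

    clip_img_emb is assumed to come from a frozen CLIP image encoder, so it is
    treated as a constant input; in training, only item_emb and proj (and
    possibly alpha) would be updated.
    """
    img = matvec(proj, l2_normalize(clip_img_emb))
    return [e + alpha * v for e, v in zip(item_emb, img)]


def fuse_sequence(seq_item_embs: List[Vector], seq_img_embs: List[Vector],
                  proj: Matrix, alpha: float = 0.5) -> List[Vector]:
    """Apply the fusion to every item of a user's interaction sequence."""
    return [fuse_item_embedding(e, i, proj, alpha)
            for e, i in zip(seq_item_embs, seq_img_embs)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # The image vector [3, 4] normalizes to [0.6, 0.8], giving roughly [1.3, 0.4].
    print(fuse_item_embedding([1.0, 0.0], [3.0, 4.0], identity))
```

Keeping the image vectors fixed mirrors the "frozen" setting: the visual signal adds information to each item without the cost of fine-tuning the image encoder.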
📝 Abstract
Cross-Domain Sequential Recommendation (CDSR) aims to predict future user interactions based on historical interactions across multiple domains. The key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. In this paper, we propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture users' visual preferences. Our approach uses a frozen CLIP model to generate image embeddings, enriching the original item embeddings with visual information from both intra-sequence and inter-sequence interactions. Additionally, we employ a multi-attention layer to capture cross-domain interests, enabling joint learning of single-domain and cross-domain user preferences. To validate the effectiveness of IFCDSR, we re-partitioned four e-commerce datasets and conducted extensive experiments. The results demonstrate that IFCDSR significantly outperforms existing methods.
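The joint learning of single-domain and cross-domain preferences can be pictured as two attention passes: one over the user's sequence in the target domain and one over the merged cross-domain sequence. The following is a minimal sketch under that assumption, using plain scaled dot-product attention with the most recent item as the query. The functions `attend`, `user_representation`, and `score`, as well as the mixing weight `beta`, are hypothetical. The sketch also omits learned query/key/value projections, multiple heads, positional embeddings, and causal masking, so it should not be read as IFCDSR's actual multi-attention layer.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is a convex combination of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def user_representation(domain_seq: List[Vector], cross_seq: List[Vector],
                        beta: float = 0.5) -> Vector:
    """Hypothetical mix of single-domain and cross-domain interest vectors.

    domain_seq: (fused) item embeddings from the target domain only.
    cross_seq:  (fused) item embeddings from all domains, in time order.
    """
    query = domain_seq[-1]  # the most recent target-domain item acts as the query
    single = attend(query, domain_seq, domain_seq)
    cross = attend(query, cross_seq, cross_seq)
    return [(1 - beta) * s + beta * c for s, c in zip(single, cross)]


def score(user_vec: Vector, candidates: List[Vector]) -> List[float]:
    """Dot-product relevance of each candidate item for the next interaction."""
    return [sum(u * c for u, c in zip(user_vec, cand)) for cand in candidates]


if __name__ == "__main__":
    domain_seq = [[1.0, 0.0]]
    cross_seq = [[1.0, 0.0], [0.0, 1.0]]
    user = user_representation(domain_seq, cross_seq)
    print(score(user, [[1.0, 0.0], [0.0, 1.0]]))
```

With `beta = 0`, the representation reduces to a purely single-domain model. Raising `beta` lets items from other domains in the merged sequence shape the prediction.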