AI Summary
This work proposes MuSTRec, a novel framework that addresses the insufficient integration of multimodal information and user behavior sequences in sequential recommendation. MuSTRec is the first to deeply incorporate multimodal semantics into sequential recommendation, constructing an item-item graph from textual and visual features and designing a frequency-aware self-attention mechanism within a Transformer architecture to jointly model collaborative filtering signals and users' long- and short-term preferences. The framework introduces a multimodal-driven data partitioning strategy and reveals that user embeddings significantly enhance short-term performance in low-data regimes. Extensive experiments demonstrate that MuSTRec outperforms state-of-the-art methods across multiple Amazon datasets, achieving up to a 33.5% improvement in overall performance and up to a 200% gain in short-term metrics on small-scale datasets.
Abstract
We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies the multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted textual and visual features. A frequency-based self-attention module additionally captures short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to a 33.5% improvement) over state-of-the-art multimodal and sequential baselines. Finally, we detail several notable facets of this new recommendation paradigm, including the need for a new data partitioning regime and a demonstration that integrating user embeddings into sequential recommendation drastically improves short-term metrics (up to a 200% improvement) on smaller datasets. Our code is available at https://anonymous.4open.science/r/MuSTRec-D32B/ and will be made publicly available.