🤖 AI Summary
Existing Transformer-based sequential recommendation systems (SRS) struggle to capture high-frequency, bursty patterns in user interests due to the inherent low-pass filtering property of self-attention. This work introduces discrete wavelet transform (DWT) into SRS for the first time, proposing an adaptive time-frequency filtering framework. It explicitly models both low-frequency long-term preferences and high-frequency short-term interests via learnable multi-scale DWT decomposition, and replaces self-attention with a linear-complexity time-frequency feature fusion encoder. The approach overcomes Transformer’s limitations in frequency-domain modeling, achieving significant improvements over state-of-the-art methods across multiple public benchmarks—especially on long sequences—while accelerating inference by 37% and reducing memory consumption by 42%.
📝 Abstract
Sequential Recommender Systems (SRS) aim to model sequential behaviors of users to capture their interests which usually evolve over time. Transformer-based SRS have achieved distinguished successes recently. However, studies reveal self-attention mechanism in Transformer-based models is essentially a low-pass filter and ignores high frequency information potentially including meaningful user interest patterns. This motivates us to seek better filtering technologies for SRS, and finally we find Discrete Wavelet Transform (DWT), a famous time-frequency analysis technique from digital signal processing field, can effectively process both low-frequency and high-frequency information. We design an adaptive time-frequency filter with DWT technique, which decomposes user interests into multiple signals with different frequency and time, and can automatically learn weights of these signals. Furthermore, we develop DWTRec, a model for sequential recommendation all based on the adaptive time-frequency filter. Thanks to fast DWT technique, DWTRec has a lower time complexity and space complexity theoretically, and is Proficient in modeling long sequences. Experiments show that our model outperforms state-of-the-art baseline models in datasets with different domains, sparsity levels and average sequence lengths. Especially, our model shows great performance increase in contrast with previous models when the sequence grows longer, which demonstrates another advantage of our model.