MARS: Multi-rate Aggregation of Recency Signals for Sequential Recommendation across Sparse and Dense Regimes

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing sequential recommenders do not make the multi-scale temporal dynamics of user behavior explicit, which limits performance under both sparse and dense data regimes. This work proposes MARS, an encoder-agnostic aggregation operator that builds K recency summaries at distinct time scales from real timestamps and fuses them through a context-adaptive gate. Based on the average sequence length of the training set, MARS automatically selects one of two encoder instantiations: MARS-T (Transformer-based) for sparse data or MARS-M (Mamba-based) for dense data. This dual-instantiation design lets one method cover both sparse and dense scenarios. MARS attains the best HR@10 on all five benchmarks, with a relative gain of up to 36.2% (on Games) over the strongest content-only Transformer baseline on sparse data, and +3.2% HR@10 over SIGMA on dense ML-1M at 42% fewer MFLOPs, placing it on the accuracy-efficiency Pareto frontier.
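The density-based choice between the two instantiations can be pictured with a minimal sketch. The function name `select_encoder` and the cutoff `AVG_LEN_THRESHOLD` are hypothetical; the summary only says that the decision depends on the average sequence length of the training set, and it gives no threshold value.

```python
from statistics import mean

# Hypothetical cutoff: the source does not state the threshold value.
AVG_LEN_THRESHOLD = 50.0


def select_encoder(train_sequences, threshold=AVG_LEN_THRESHOLD):
    """Pick an encoder label from the average training sequence length.

    Short average histories are treated as the sparse regime (MARS-T),
    long ones as the dense regime (MARS-M).
    """
    avg_len = mean(len(seq) for seq in train_sequences)
    return "MARS-T" if avg_len < threshold else "MARS-M"


# Toy data: two short user histories versus two long ones.
sparse_choice = select_encoder([[1, 2, 3], [4, 5]])
dense_choice = select_encoder([list(range(200)), list(range(100))])
```

Because the rule reads only training-set statistics, the choice would be made once before training rather than per user.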

📝 Abstract

Sequential recommenders weight historical interactions either through positional self-attention as in Transformers or through a single implicit decay schedule as in State-Space Models. Neither makes the multi-scale temporal structure of real user behaviour explicit. We propose MARS, an encoder-agnostic aggregation operator that consumes real timestamps and produces K summaries emphasising distinct recency scales, fused by a context-adaptive gate. MARS adds at most 6% parameters and runs in $\mathcal{O}(LdK)$ time. MARS adapts to data density by automatically selecting between two encoder instantiations: MARS-T (Transformer) for sparse data and MARS-M (Mamba) for dense data, based on the average sequence length of the training set. On five public benchmarks against ten Transformer- and Mamba-based baselines under a unified RecBole protocol, MARS attains the best HR@10 on every benchmark, with mean relative gain +19.7% over the strongest content-only Transformer baseline on sparse data (reaching +36.2% on Games) and +3.2% HR@10 / +0.9% NDCG over SIGMA on dense ML-1M at 42% fewer MFLOPs, occupying the accuracy-efficiency Pareto frontier across the data-density spectrum. A backbone-only ablation isolates the marginal contribution of MARS at +4% to +19% HR@10 on sparse data and motivates the dual-instantiation design. The code is included in the supplementary material.
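To make the operator described in the abstract concrete, here is a minimal sketch of one way to build K recency summaries from real timestamps and fuse them with a gate. Several parts are assumptions not taken from the paper: the exponential decay kernel, the time constants in `taus`, the softmax gate, the use of the last item embedding as context, and the names `recency_summaries`, `gated_fusion` and `gate_weights`. The paper's actual kernel and gate may differ.

```python
import math


def recency_summaries(embeddings, timestamps, taus):
    """Return one summary vector per time constant in `taus`.

    embeddings: L vectors of size d; timestamps: L floats in ascending order.
    Summary k is a weighted mean of the embeddings with weights
    exp(-(t_last - t_i) / tau_k), an assumed decay kernel.
    Cost is O(L * d * K) for K = len(taus).
    """
    t_last = timestamps[-1]
    d = len(embeddings[0])
    summaries = []
    for tau in taus:
        weights = [math.exp(-(t_last - t) / tau) for t in timestamps]
        total = sum(weights)
        summaries.append([
            sum(w * e[j] for w, e in zip(weights, embeddings)) / total
            for j in range(d)
        ])
    return summaries


def gated_fusion(summaries, context, gate_weights):
    """Fuse K summaries with a softmax gate computed from `context`.

    gate_weights: K vectors of size d, standing in for learned parameters.
    Returns the fused vector and the K gate values.
    """
    logits = [sum(w * c for w, c in zip(row, context)) for row in gate_weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    gates = [x / norm for x in exps]
    d = len(summaries[0])
    fused = [sum(g * s[j] for g, s in zip(gates, summaries)) for j in range(d)]
    return fused, gates


# Toy history: three interactions at 0 s, 1 h and 24 h.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ts = [0.0, 3600.0, 86400.0]
summaries = recency_summaries(emb, ts, taus=[3600.0, 604800.0])  # 1 hour, 1 week
fused, gates = gated_fusion(summaries, context=emb[-1],
                            gate_weights=[[0.0, 0.0], [0.0, 0.0]])
```

In this toy run the one-hour summary is dominated by the most recent item, while the one-week summary blends all three. The gate then decides, per sequence, how much each time scale contributes.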

Problem

Research questions and friction points this paper is trying to address.

sequential recommendation

recency signals

multi-scale temporal structure

data sparsity

timestamp-aware modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-scale temporal modeling

recency-aware aggregation

data-density adaptation