FOSTER: First-order Dataset Distillation for Text-based Sequential Recommendation

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high training cost of text-based sequential recommender systems. Dataset distillation could reduce that cost, but it is hard to apply in this setting because the item pool is large and discrete, and because language model-based item encoding makes the commonly used bi-level optimization prohibitively expensive. The authors propose FOSTER, a first-order dataset distillation framework with three components: stochastic item subset sampling, which replaces full-corpus embedding extraction at each distillation step; first-order optimization with trajectory-anchored parameter reset, which avoids bi-level gradient computation; and a regularizer that promotes co-occurrence of semantically similar items in the synthetic sequences. Experiments on three benchmarks show that FOSTER consistently outperforms existing dataset distillation and coreset selection baselines and approximates full-dataset performance with as few as 20 synthetic interaction sequences.
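To make the item subset sampling idea concrete, here is a minimal sketch of encoding only a random subset of items per step instead of the whole corpus. It is an illustration under stated assumptions, not the authors' implementation: uniform sampling without replacement is assumed, and `toy_encode`, `sample_item_subset`, and `embed_subset` are hypothetical names, with `toy_encode` standing in for a real language model encoder.

```python
import random


def toy_encode(text, dim=4):
    """Hypothetical stand-in for a language-model item encoder."""
    # Deterministic pseudo-embedding derived from character codes.
    total = sum(ord(c) for c in text)
    return [((total * (i + 1)) % 97) / 97.0 for i in range(dim)]


def sample_item_subset(num_items, subset_size, rng):
    """Draw item indices uniformly without replacement (an assumption)."""
    return sorted(rng.sample(range(num_items), subset_size))


def embed_subset(item_texts, indices, encoder=toy_encode):
    """Encode only the sampled items; returns {item_index: embedding}."""
    return {i: encoder(item_texts[i]) for i in indices}


# One hypothetical distillation step over a 1000-item corpus.
item_texts = [f"item title {i}" for i in range(1000)]
rng = random.Random(0)
subset = sample_item_subset(len(item_texts), subset_size=32, rng=rng)
embeddings = embed_subset(item_texts, subset)
```

In this sketch the encoder runs 32 times per step rather than 1000, which is the kind of saving the summary attributes to subset sampling. How the actual method chooses the subset is not stated here.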

📝 Abstract

Text-based sequential recommender systems, while greatly improving recommendation accuracy by incorporating item contexts, are undeniably more expensive to train. By condensing a large dataset into a compact set of synthetic samples for model training, dataset distillation offers a promising solution. However, its adoption in text-based sequential recommendation is non-trivial given the large pool of discrete items. This challenge is further compounded by language model-based item encoding, which makes bi-level optimization commonly used in dataset distillation prohibitively expensive. To this end, we propose First-order dataset distillation for Text-based Sequential Recommendation (FOSTER), which facilitates effectiveness and efficiency via three novel components: (1) stochastic item subset sampling that replaces costly full-corpus embedding extraction at each distillation step; (2) first-order optimization with trajectory-anchored parameter reset to avoid expensive bi-level gradient computation; and (3) regularization that explicitly promotes co-occurrence between semantically similar items in the synthetic sequences. Extensive experiments on three benchmarks show that FOSTER consistently outperforms existing dataset distillation and coreset selection baselines, approximating full-dataset performance using as few as 20 synthetic interaction sequences.
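The third component, a regularizer promoting co-occurrence of semantically similar items, can be illustrated with a small sketch. The abstract does not give its exact form, so everything below is an assumption: synthetic sequences are represented as per-position probability vectors over items, similarity is cosine similarity of item embeddings, and `cosine` and `cooccurrence_penalty` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cooccurrence_penalty(seq_probs, item_embs):
    """Negative similarity-weighted co-occurrence of one soft synthetic sequence.

    seq_probs: list of per-position probability vectors over items.
    item_embs: list of item embedding vectors, one per item.
    """
    num_items = len(item_embs)
    # Expected presence of each item, averaged over sequence positions.
    presence = [
        sum(pos[i] for pos in seq_probs) / len(seq_probs)
        for i in range(num_items)
    ]
    reward = 0.0
    for i in range(num_items):
        for j in range(i + 1, num_items):
            sim = cosine(item_embs[i], item_embs[j])
            reward += sim * presence[i] * presence[j]
    # Lower is better, so this can be added to a loss being minimized.
    return -reward


# Items 0 and 1 are semantically close; item 2 is orthogonal to item 0.
item_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
similar_pair = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # items 0 then 1
dissimilar_pair = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # items 0 then 2

penalty_similar = cooccurrence_penalty(similar_pair, item_embs)
penalty_dissimilar = cooccurrence_penalty(dissimilar_pair, item_embs)
```

Under these assumptions, the sequence pairing similar items receives the lower penalty, so minimizing the term nudges synthetic sequences toward grouping semantically related items.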

Problem

Research questions and friction points this paper is trying to address.

dataset distillation

text-based sequential recommendation

large-scale discrete items

bi-level optimization

training efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset distillation

text-based sequential recommendation

first-order optimization