🤖 AI Summary
This work addresses three limitations of autoregressive decoding in generative recommendation: it struggles to capture global dependencies among the multi-dimensional features encoded at different semantic ID positions, it imposes a single fixed decoding order that assumes all users attend to item attributes in the same sequence, and it is inefficient at inference time. The authors propose MDGR, a Masked Diffusion Generative Recommendation framework that combines a parallel codebook, adaptive masking along both the temporal and sample dimensions during training, and a warm-up-based two-stage parallel decoding strategy at inference. Experiments on multiple public and industrial-scale datasets show that MDGR outperforms ten state-of-the-art baselines by up to 10.78%. Deployed on a large-scale online advertising platform, it yielded a 1.20% increase in revenue.
📝 Abstract
Generative recommendation (GR) typically first quantizes continuous item embeddings into multi-level semantic IDs (SIDs), and then generates the next item via autoregressive decoding. Although existing methods are already competitive in terms of recommendation performance, directly inheriting the autoregressive decoding paradigm from language models leaves three key limitations: (1) autoregressive decoding struggles to jointly capture global dependencies among the multi-dimensional features associated with different SID positions; (2) using a unified, fixed decoding path for the same item implicitly assumes that all users attend to item attributes in the same order; (3) autoregressive decoding is inefficient at inference time and struggles to meet real-time requirements. To tackle these challenges, we propose MDGR, a Masked Diffusion Generative Recommendation framework that reshapes the GR pipeline from three perspectives: codebook, training, and inference. (1) We adopt a parallel codebook to provide a structural foundation for diffusion-based GR. (2) During training, we adaptively construct masking supervision signals along both the temporal and sample dimensions. (3) During inference, we develop a warm-up-based two-stage parallel decoding strategy for efficient generation of SIDs. Extensive experiments on multiple public and industrial-scale datasets show that MDGR outperforms ten state-of-the-art baselines by up to 10.78%. Furthermore, by deploying MDGR on a large-scale online advertising platform, we achieve a 1.20% increase in revenue, demonstrating its practical value.
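The abstract does not spell out the decoding procedure, so the following is only a generic illustration of masked parallel decoding over SID positions, not MDGR's actual algorithm. It assumes one possible reading of "warm-up-based two-stage": a short warm-up that commits the most confident position one step at a time, followed by a single step that fills all remaining positions in parallel. The names `decode_sid`, `toy_scorer`, `warmup_steps`, and the toy probabilities are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence

MASK = None  # marks an SID position that has not been decoded yet

# Hypothetical model interface: given a partially decoded SID, return one
# probability distribution over codebook entries for every position.
Scorer = Callable[[Sequence[Optional[int]]], List[List[float]]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_sid(scorer: Scorer, num_positions: int, warmup_steps: int = 1) -> List[int]:
    sid: List[Optional[int]] = [MASK] * num_positions

    # Stage 1 (assumed warm-up): commit only the single most confident
    # masked position per step, in whatever order the scores suggest.
    for _ in range(min(warmup_steps, num_positions)):
        probs = scorer(sid)
        masked = [i for i, code in enumerate(sid) if code is MASK]
        best = max(masked, key=lambda i: max(probs[i]))
        sid[best] = argmax(probs[best])

    # Stage 2 (assumed): fill every remaining position in one parallel step,
    # conditioned on the positions fixed during warm-up.
    if any(code is MASK for code in sid):
        probs = scorer(sid)
        sid = [argmax(probs[i]) if code is MASK else code
               for i, code in enumerate(sid)]
    return sid


def toy_scorer(sid: Sequence[Optional[int]]) -> List[List[float]]:
    """Toy stand-in for a trained model: 3 positions, 3 codes each."""
    probs = [
        [0.4, 0.6, 0.0],  # position 0: uncertain on its own
        [0.1, 0.1, 0.8],  # position 1: confident from the start
        [0.5, 0.3, 0.2],  # position 2
    ]
    if sid[1] == 2:
        # Once position 1 is known, position 0 becomes confident in code 0.
        probs[0] = [0.9, 0.1, 0.0]
    return probs


with_warmup = decode_sid(toy_scorer, num_positions=3, warmup_steps=1)
without_warmup = decode_sid(toy_scorer, num_positions=3, warmup_steps=0)
```

In this toy setup the two results differ at position 0: with one warm-up step, position 1 is fixed first and the parallel step can condition on it, whereas fully parallel decoding from an all-masked input cannot. The decoding order is chosen by confidence rather than fixed left to right, which is the kind of flexibility limitation (2) refers to.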