🤖 AI Summary
Conventional sequential recommendation methods struggle to capture global, list-level dependencies within user behavior sequences, leading to inaccurate multi-step trajectory predictions.
Method: This paper introduces the novel task of User Behavior Trajectory Prediction (UBTP) and pioneers the application of diffusion models to structured preference learning. We propose Listwise Preference Diffusion Optimization (LPDO), a list-level preference diffusion framework that uses the Plackett-Luce distribution as a supervision signal and derives a variational lower bound on the listwise ranking likelihood, thereby overcoming the independent-token assumption of prior diffusion methods.
Contribution/Results: We design a dual-metric evaluation protocol, Sequential Match (SeqMatch) and Perplexity (PPL), and demonstrate significant improvements over state-of-the-art methods across multiple real-world benchmarks. Our results empirically validate the effectiveness of modeling long-horizon, structured user preferences and establish diffusion models as a new paradigm for list-level sequential recommendation.
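The Plackett-Luce supervision signal used by LPDO corresponds to a standard listwise ranking likelihood: the probability of a full ranking factorizes into a product of softmax choices over the items not yet placed. The sketch below is a minimal, illustrative negative log-likelihood in this form, not the paper's implementation; the per-item `scores` and index-based item encoding are assumptions.

```python
import math

def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of a full ranking under the Plackett-Luce model.

    scores: real-valued utility per item, indexed by item id
    ranking: item indices ordered from most to least preferred
    """
    nll = 0.0
    remaining = list(ranking)
    for _ in range(len(ranking)):
        # Log-sum-exp over the items not yet placed (numerically stable).
        m = max(scores[i] for i in remaining)
        log_z = m + math.log(sum(math.exp(scores[i] - m) for i in remaining))
        # Softmax choice of the next item among those remaining.
        nll -= scores[remaining[0]] - log_z
        remaining.pop(0)
    return nll
```

With uniform scores every ranking of n items is equally likely, so the NLL reduces to log(n!); rankings that place high-score items first receive lower NLL, which is what makes this usable as a listwise training loss.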
📝 Abstract
Forecasting multi-step user behavior trajectories requires reasoning over structured preferences across future actions, a challenge overlooked by traditional sequential recommendation. This problem is critical for applications such as personalized commerce and adaptive content delivery, where anticipating a user's complete action sequence enhances both satisfaction and business outcomes. We identify a fundamental limitation of existing paradigms: their inability to capture global, listwise dependencies among sequence items. To address this, we formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models long-term user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. LPDO incorporates a Plackett-Luce supervision signal and derives a tight variational lower bound aligned with listwise ranking likelihoods, enabling coherent preference generation across denoising steps and overcoming the independent-token assumption of prior diffusion methods. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity. Extensive experiments on real-world user behavior benchmarks demonstrate that LPDO consistently outperforms state-of-the-art baselines, establishing a new benchmark for structured preference learning with diffusion models.
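The two evaluation metrics above admit simple reference forms. The sketch below is one plausible reading only (the paper's exact definitions may differ): SeqMatch as the fraction of predicted trajectories that match the ground truth exactly, and PPL as the exponentiated mean negative log-likelihood of the observed items.

```python
import math

def seq_match(pred_trajs, true_trajs):
    """Exact-match rate over trajectories: a prediction counts only if the
    entire item sequence agrees with the ground truth, step for step."""
    hits = sum(p == t for p, t in zip(pred_trajs, true_trajs))
    return hits / len(true_trajs)

def perplexity(log_probs):
    """Perplexity from per-item log-probabilities: exp of the mean NLL.
    Lower values indicate higher probabilistic fidelity."""
    return math.exp(-sum(log_probs) / len(log_probs))
```

SeqMatch is deliberately strict: a trajectory that is right at every step but one scores zero, which is what makes it a list-level metric rather than an averaged per-step accuracy.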