Listwise Preference Diffusion Optimization for User Behavior Trajectory Prediction

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional sequential recommendation methods struggle to capture global, list-level dependencies within user behavior sequences, leading to inaccurate multi-step trajectory predictions. Method: This paper introduces the novel task of "user behavior trajectory prediction" and pioneers the application of diffusion models to structured preference learning. We propose a list-level preference diffusion optimization framework that uses the Plackett-Luce distribution as a supervision signal and derives a variational lower bound on the listwise ranking likelihood, thereby relaxing the independent-token assumption of prior diffusion-based approaches. Contribution/Results: We design a dual-metric evaluation protocol, SeqMatch and PPL, and demonstrate significant improvements over state-of-the-art methods across multiple real-world benchmarks. Our results empirically validate the effectiveness of modeling long-horizon, structured user preferences and establish diffusion models as a new paradigm for list-level sequential recommendation.
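The Plackett-Luce distribution mentioned above factorizes the probability of a full ranking into a sequence of softmax choices over the not-yet-ranked items. As a minimal sketch (not the paper's implementation), its log-likelihood for a single ordered list can be computed like this:

```python
import math

def plackett_luce_log_likelihood(scores):
    """Log-likelihood of an ordered list under the Plackett-Luce model.

    `scores` are model utilities for the items in their observed order;
    the list probability is the product, over positions i, of
    softmax(scores[i:]) evaluated at position i.
    """
    total = 0.0
    for i in range(len(scores)):
        # numerically stable log-sum-exp over the remaining items
        m = max(scores[i:])
        lse = m + math.log(sum(math.exp(s - m) for s in scores[i:]))
        total += scores[i] - lse
    return total
```

With equal scores, every ordering of n items is equally likely, so the likelihood reduces to 1/n! (e.g. -log 6 for three tied items), which is a quick sanity check on the implementation.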

📝 Abstract
Forecasting multi-step user behavior trajectories requires reasoning over structured preferences across future actions, a challenge overlooked by traditional sequential recommendation. This problem is critical for applications such as personalized commerce and adaptive content delivery, where anticipating a user's complete action sequence enhances both satisfaction and business outcomes. We identify an essential limitation of existing paradigms: their inability to capture global, listwise dependencies among sequence items. To address this, we formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models long-term user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. LPDO incorporates a Plackett-Luce supervision signal and derives a tight variational lower bound aligned with listwise ranking likelihoods, enabling coherent preference generation across denoising steps and overcoming the independent-token assumption of prior diffusion methods. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity. Extensive experiments on real-world user behavior benchmarks demonstrate that LPDO consistently outperforms state-of-the-art baselines, establishing a new benchmark for structured preference learning with diffusion models.
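The two evaluation metrics can be illustrated with a minimal sketch. The paper's exact definitions (e.g. whether SeqMatch is averaged over prefixes or whole trajectories) are not reproduced here, so treat these as one plausible reading:

```python
import math

def seq_match(pred_trajs, true_trajs):
    """Fraction of predicted trajectories that exactly match the ground
    truth, position by position (one plausible reading of SeqMatch)."""
    hits = sum(p == t for p, t in zip(pred_trajs, true_trajs))
    return hits / len(true_trajs)

def perplexity(log_probs):
    """Standard perplexity: exp of the mean negative log-likelihood the
    model assigns to the ground-truth items."""
    return math.exp(-sum(log_probs) / len(log_probs))
```

Under these readings, SeqMatch rewards only fully correct trajectories, while PPL gives partial credit for placing high probability on the true items even when the argmax trajectory is wrong.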
Problem

Research questions and friction points this paper is trying to address.

Predicting multi-step user behavior trajectories with structured preferences
Overcoming limitations in capturing global listwise dependencies among items
Optimizing coherent preference generation across entire action sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based training optimizes structured preferences over sequences
Plackett-Luce supervision aligns with listwise ranking likelihoods
Tight variational bound enables coherent preference generation steps
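For context, the bound referenced above presumably extends the standard DDPM-style variational lower bound, which for clean data \(x_0\), noised latents \(x_1, \dots, x_T\), forward process \(q\), and denoiser \(p_\theta\) reads:

```latex
\log p_\theta(x_0) \;\ge\;
\mathbb{E}_q\!\left[\log p_\theta(x_0 \mid x_1)\right]
\;-\; \sum_{t=2}^{T} \mathbb{E}_q\!\left[
  \mathrm{KL}\!\left(q(x_{t-1} \mid x_t, x_0)\,\middle\|\,p_\theta(x_{t-1} \mid x_t)\right)
\right]
\;-\; \mathrm{KL}\!\left(q(x_T \mid x_0)\,\middle\|\,p(x_T)\right)
```

In LPDO, the per-token reconstruction term would be replaced by a Plackett-Luce listwise likelihood over the entire trajectory \(x_0\); the paper's exact derivation is not reproduced here.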