TADT-CSA: Temporal Advantage Decision Transformer with Contrastive State Abstraction for Generative Recommendation

📅 2025-07-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Decision Transformers (DTs) face two key challenges in sequential recommendation: difficulty in trajectory stitching and inefficient modeling of high-dimensional user states. To address these, we propose TADT-CSA, a Temporal Advantage Decision Transformer with Contrastive State Abstraction. The method introduces three core innovations: (1) joint modeling of Return-to-Go and a temporal advantage (TA) signal to enhance long-horizon reward awareness and trend sensitivity; (2) a TA-conditioned state vector quantization mechanism that compresses the state space while preserving task-relevant semantics; and (3) a contrastive state abstraction module that jointly optimizes reward prediction and state transition modeling. Extensive experiments on multiple public benchmarks and a real-world online system demonstrate that TADT-CSA significantly outperforms state-of-the-art DT baselines, achieving average improvements of 12.6% in Recall@10 and 9.8% in NDCG@10. These results validate its effectiveness in both high-fidelity trajectory generation and compact, discriminative state representation learning.

📝 Abstract
With the rapid advancement of Transformer-based Large Language Models (LLMs), generative recommendation has shown great potential in enhancing both the accuracy and semantic understanding of modern recommender systems. Compared to LLMs, the Decision Transformer (DT) is a lightweight generative model applied to sequential recommendation tasks. However, DT faces challenges in trajectory stitching, often producing suboptimal trajectories. Moreover, due to the high dimensionality of user states and the vast state space inherent in recommendation scenarios, DT can incur significant computational costs and struggle to learn effective state representations. To overcome these issues, we propose a novel Temporal Advantage Decision Transformer with Contrastive State Abstraction (TADT-CSA) model. Specifically, we combine the conventional Return-To-Go (RTG) signal with a novel temporal advantage (TA) signal that encourages the model to capture both long-term returns and their sequential trend. Furthermore, we integrate a contrastive state abstraction module into the DT framework to learn more effective and expressive state representations. Within this module, we introduce a TA-conditioned State Vector Quantization (TAC-SVQ) strategy, where the TA score guides the state codebooks to incorporate contextual token information. Additionally, a reward prediction network and a contrastive transition prediction (CTP) network are employed to ensure the state codebook preserves both the reward information of the current state and the transition information between adjacent states. Empirical results on both public datasets and an online recommendation system demonstrate the effectiveness of the TADT-CSA model and its superiority over baseline methods.
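The abstract's pairing of the conventional Return-To-Go signal with a temporal advantage signal can be illustrated with a small sketch. This is a minimal, hedged interpretation: the function names and the exact TA definition below (the step-wise change in RTG along the trajectory) are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch: computing Return-to-Go (RTG) and a temporal-advantage-style
# trend signal from a reward trajectory. The TA definition here (stepwise RTG
# difference) is an illustrative assumption, not the paper's exact formula.

def return_to_go(rewards):
    """RTG_t = sum of rewards from step t to the end of the trajectory."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def temporal_advantage(rewards):
    """One plausible trend signal: the step-wise change in RTG, i.e. how the
    long-term return is evolving at each position (padded with 0.0 at the end)."""
    rtg = return_to_go(rewards)
    return [rtg[t + 1] - rtg[t] for t in range(len(rtg) - 1)] + [0.0]

rewards = [1.0, 0.0, 2.0, 1.0]
print(return_to_go(rewards))       # [4.0, 3.0, 3.0, 1.0]
print(temporal_advantage(rewards))  # [-1.0, 0.0, -2.0, 0.0]
```

In a DT-style model, both signals would be tokenized and prepended to each state-action pair, letting the policy condition on the long-term return and its local trend rather than RTG alone.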
Problem

Research questions and friction points this paper is trying to address.

Improves trajectory stitching in Decision Transformer for recommendations
Reduces computational costs in high-dimensional user state spaces
Enhances state representation learning with contrastive abstraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines RTG with temporal advantage signal
Integrates contrastive state abstraction module
Uses TA-conditioned State Vector Quantization
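The state vector quantization idea above can be sketched as a nearest-neighbor codebook lookup. This is a simplified assumption-laden sketch: the codebook size, dimensionality, and the TA-conditioning step are placeholders, and the paper's TAC-SVQ additionally lets the TA score guide how codebooks incorporate contextual token information, which is omitted here.

```python
# Minimal sketch of state vector quantization: map a continuous user-state
# vector to its nearest codebook entry, compressing the state space to a
# discrete index. The TA-conditioning of the real TAC-SVQ is omitted.

def quantize(state, codebook):
    """Return (index, code) of the codebook entry nearest to `state`."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: sqdist(state, codebook[i]))
    return idx, codebook[idx]

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
idx, code = quantize([0.9, 1.2], codebook)
print(idx, code)  # 1 [1.0, 1.0]
```

In the full model, auxiliary reward-prediction and contrastive transition-prediction losses would be attached to these codes so that the discrete representation preserves both reward and transition information.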
Xiang Gao
Kuaishou Technology, Beijing, China
Tianyuan Liu
Donghua University
Welding Automation, Computer Vision, Deep Learning, and Intelligent Manufacturing Systems
Yisha Li
Kuaishou Technology, Beijing, China
Jingxin Liu
Kuaishou Technology, Beijing, China
Lexi Gao
Kuaishou Technology, Beijing, China
Xin Li
Kuaishou Technology, Beijing, China
Haiyang Lu
Kuaishou Technology, Beijing, China
Liyin Hong
Kuaishou Technology, Beijing, China