🤖 AI Summary
This work addresses a limitation of current large language models (LLMs): constrained by fixed context windows, they struggle to model ultra-long customer shopping trajectories that span multiple years. To overcome this, the authors propose a tool-augmented Customer Agent Framework. It stores trajectory data in external memory (local files) and lets the agent retrieve and parse that data autonomously through code-interpreter interactions such as SQL queries. The agent is trained with Reinforcement Learning with Verifiable Rewards (RLVR), and the design sidesteps the LLM's context-length limit. The authors also introduce ShopTrajQA, a long-context benchmark built from real-world product information and simulated shopping trajectories, with variants of up to 32k and 64k tokens. Experiments show that the framework performs strongly on ShopTrajQA, where frontier LLMs exhibit critical performance gaps, and that it generalizes to other complex reasoning tasks.
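The summary does not say how the verifiable reward is computed. As a minimal sketch, assume each ShopTrajQA question has a single gold answer string and the agent must wrap its final answer in a hypothetical `<answer>...</answer>` tag. A binary, programmatically checkable reward could then look like this. The tag format, the SQuAD-style normalization, and all function names are illustrative assumptions, not the authors' implementation.

```python
import re
import string
from typing import Optional

# Hypothetical output format: the agent ends with <answer>...</answer>.
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the last <answer> tag, or None if there is none."""
    matches = ANSWER_TAG.findall(completion)
    return matches[-1].strip() if matches else None


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 for a normalized exact match, 0.0 otherwise.

    A missing answer tag also earns 0.0, so the format is enforced for free.
    """
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if normalize(answer) == normalize(gold) else 0.0


if __name__ == "__main__":
    rollout = "I queried the purchases table. <answer>The Kindle Paperwhite</answer>"
    print(verifiable_reward(rollout, "kindle paperwhite"))
```

Because the reward is a deterministic check rather than a learned judge, it cannot be gamed by fluent but wrong answers. The trade-off is that free-form answers need a normalization step like the one above.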
📝 Abstract
Understanding customer shopping trajectories is essential for enabling personalized shopping experiences. However, shopping records (e.g., customers' searches, clicks, and purchases) often span multiple years, producing extremely long trajectories that pose significant challenges for existing large language models (LLMs). Despite the importance of this problem, existing benchmarks are limited to short customer trajectories, while real-world trajectories from large e-commerce platforms are rarely accessible due to data privacy constraints. To address this gap, we introduce ShopTrajQA, a long-context evaluation benchmark constructed from real-world product information and simulated shopping trajectories. The dataset includes variants of up to 32k and 64k tokens, enabling systematic evaluation of model robustness under varying context lengths. Through comprehensive benchmarking of frontier LLMs, we identify critical performance gaps in reasoning over long shopping trajectory data. To tackle these challenges, we propose a Customer Agent Framework for ultra-long context management. Our approach stores trajectories as external local files and, using a Reinforcement Learning with Verifiable Rewards (RLVR) agentic training paradigm, trains the agent to autonomously retrieve and parse them through code-interpreter interactions (e.g., SQL queries), effectively bypassing the fixed context window of LLMs. Experimental results demonstrate that our framework achieves strong performance on ShopTrajQA and generalizes to other complex reasoning tasks.
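To make the "external local file plus SQL query" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `events` table (timestamp, event type, product, search query, price). The schema, the `build_trajectory_store` and `sql_tool` names, and the row cap are illustrative choices, not the framework's actual file format or tool interface. The point is that the full trajectory lives on disk and only the small query result enters the model's context.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema; the benchmark's real trajectory format is not specified.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    ts TEXT NOT NULL,          -- ISO-8601 timestamp (sorts chronologically as text)
    event_type TEXT NOT NULL,  -- 'search', 'click', or 'purchase'
    product TEXT,              -- product title, NULL for searches
    query TEXT,                -- search string, NULL otherwise
    price REAL                 -- purchase price, NULL unless a purchase
)
"""


def build_trajectory_store(path: str, events: Iterable[Tuple]) -> sqlite3.Connection:
    """Write one customer's trajectory to a SQLite file instead of the prompt."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", events)
    conn.commit()
    # Make the connection read-only from here on; the agent may only inspect data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def sql_tool(conn: sqlite3.Connection, query: str, max_rows: int = 20) -> str:
    """Hypothetical agent tool: run a read-only query, return a compact text table.

    Capping the rows keeps each observation small, so the agent's context grows
    with the number of queries, not with the length of the trajectory.
    """
    if not query.lstrip().lower().startswith(("select", "with")):
        return "error: only SELECT queries are allowed"
    try:
        cursor = conn.execute(query)
    except sqlite3.Error as exc:
        # Errors are returned as text so the agent can read them and retry.
        return f"error: {exc}"
    header = [col[0] for col in cursor.description]
    rows = cursor.fetchmany(max_rows)
    lines = [" | ".join(header)] + [" | ".join(map(str, row)) for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    events = [
        ("2021-03-02T10:15:00", "search", None, "running shoes", None),
        ("2021-03-02T10:17:00", "click", "Trail Runner X", None, None),
        ("2021-03-02T10:20:00", "purchase", "Trail Runner X", None, 89.99),
        ("2023-11-20T08:05:00", "purchase", "Wool Socks 3-Pack", None, 14.5),
    ]
    conn = build_trajectory_store(":memory:", events)
    print(sql_tool(conn, "SELECT product, price FROM events "
                         "WHERE event_type = 'purchase' ORDER BY ts"))
```

In an RLVR loop of this kind, the agent would emit `sql_tool` calls, read the returned tables, and finally produce an answer that is scored by a verifiable reward. Returning SQL errors as plain text rather than raising them lets the agent learn to recover from malformed queries.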