Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks

📅 2026-03-16
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the challenge that existing large language models struggle to capture users’ long-term preferences in e-commerce scenarios and lack a unified end-to-end optimization framework and evaluation benchmark. To this end, we propose Shopping Companion, a lightweight, memory-augmented agent architecture that integrates long-term preference modeling with shopping assistance tasks while supporting user intervention. The framework employs a dual-reward reinforcement learning strategy incorporating tool-level rewards to effectively handle sparse and discontinuous reward signals in multi-turn interactions. Evaluated on the first e-commerce benchmark specifically designed for long-term preference modeling—comprising over one million real product records—our method significantly outperforms strong baselines. Notably, even advanced models such as GPT-5 achieve less than 70% success on this benchmark, underscoring both the task’s difficulty and the efficacy of our approach.

📝 Abstract
In e-commerce, LLM agents show promise for shopping tasks such as recommendations, budgeting, and bundle deals, where accurately capturing user preferences from long-term conversations is critical. However, two challenges hinder realizing this potential: (1) the absence of benchmarks for evaluating long-term preference-aware shopping tasks, and (2) the lack of end-to-end optimization due to existing designs that treat preference identification and shopping assistance as separate components. In this paper, we introduce a novel benchmark with a long-term memory setup, spanning two shopping tasks over 1.2 million real-world products, and propose Shopping Companion, a unified framework that jointly tackles memory retrieval and shopping assistance while supporting user intervention. To train such capabilities, we develop a dual-reward reinforcement learning strategy with tool-wise rewards to handle the sparse and discontinuous rewards inherent in multi-turn interactions. Experimental results demonstrate that even state-of-the-art models (such as GPT-5) achieve success rates under 70% on our benchmark, highlighting the significant challenges in this domain. Notably, our lightweight LLM, trained with Shopping Companion, consistently outperforms strong baselines, achieving better preference capture and task performance, which validates the effectiveness of our unified design.
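The dual-reward idea in the abstract (sparse, end-of-episode task success densified with per-step tool-wise rewards) can be illustrated with a minimal sketch. The function names, episode structure, and weighting scheme below are assumptions for illustration only, not the paper's actual formulation.

```python
# Hypothetical sketch of a dual-reward signal for multi-turn agent training:
# a sparse task-level reward (success known only at episode end) is combined
# with dense per-turn tool-level rewards (did each tool call succeed).
# All names and the linear weighting are illustrative assumptions.

def tool_reward(tool_calls):
    """Per-turn reward: fraction of tool calls that executed successfully."""
    if not tool_calls:
        return 0.0
    return sum(1.0 for c in tool_calls if c["ok"]) / len(tool_calls)

def dual_reward(episode, task_success, alpha=0.5):
    """Combine the sparse task reward with averaged tool-wise rewards.

    episode: list of turns, each a dict with a "tool_calls" list.
    task_success: 1.0 if the final shopping task succeeded, else 0.0.
    alpha: weight on the task-level component (illustrative default).
    """
    tool_component = (
        sum(tool_reward(t["tool_calls"]) for t in episode) / max(len(episode), 1)
    )
    return alpha * task_success + (1 - alpha) * tool_component

episode = [
    {"tool_calls": [{"ok": True}, {"ok": False}]},  # turn 1: one failed call
    {"tool_calls": [{"ok": True}]},                 # turn 2: all calls fine
]
print(dual_reward(episode, task_success=1.0))  # 0.5*1.0 + 0.5*0.75 = 0.875
```

Even when the episode fails (task reward 0), the tool component still provides a non-zero learning signal at every turn, which is one common way to mitigate sparse and discontinuous rewards in multi-turn interactions.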
Problem

Research questions and friction points this paper is trying to address.

e-commerce
LLM agent
long-term preference
benchmark
shopping tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-augmented LLM
unified framework
long-term preference
dual-reward reinforcement learning
e-commerce benchmark
Zijian Yu
Alibaba International Digital Commercial Group
Kejun Xiao
Alibaba International Digital Commercial Group
Huaipeng Zhao
Alibaba Inc
Natural Language Processing, Machine Learning
Tao Luo
Alibaba International Digital Commercial Group
Xiaoyi Zeng
Alibaba International Digital Commercial Group