ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?

📅 2025-07-07
🤖 AI Summary
Problem: Existing large language model (LLM)-based agents lack systematic, multimodal evaluation frameworks tailored to real-world e-commerce customer service scenarios. Method: We introduce ECom-Bench, the first multimodal agent benchmark designed specifically for e-commerce customer service, constructed from millions of real-world dialogues. It features a user-profile-driven dynamic simulation mechanism and a high-difficulty composite task suite covering cross-modal understanding, multi-turn reasoning, and real-time decision-making. Contribution/Results: ECom-Bench makes agent evaluation both more realistic and more demanding. Experiments show that even advanced multimodal models (e.g., GPT-4o) reach only 10–20% on the pass^3 metric, exposing critical gaps in operational capability. This work shifts e-commerce AI agent evaluation from isolated skill assessment toward end-to-end problem-solving performance. The benchmark code and data will be released publicly upon publication.

📝 Abstract
In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions and a realistic task dataset derived from authentic e-commerce dialogues. These tasks, covering a wide range of business scenarios, are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o achieve only 10-20% on the pass^3 metric in our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. Upon publication, the code and data will be open-sourced to facilitate further research and development in this domain.
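
The abstract reports a pass^3 figure without defining it. As a rough illustration only, the sketch below assumes pass^k means "the probability that all k independent trials of a task succeed, averaged over tasks", estimated from n recorded trials per task as C(c, k) / C(n, k), where c is the number of successful trials. This definition, the function name `pass_hat_k`, and the sample counts are assumptions for illustration and are not taken from the paper. Under this reading, pass^k differs from the more familiar pass@k, which rewards at least one success in k attempts.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k under the assumed definition.

    successes_per_task: number of successful trials for each task.
    num_trials: trials run per task (n); k: trials that must all succeed.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not successes_per_task:
        raise ValueError("at least one task is required")

    total = 0.0
    for c in successes_per_task:
        if not 0 <= c <= num_trials:
            raise ValueError("success count must lie in [0, num_trials]")
        # Fraction of k-subsets of the n trials in which every trial succeeded.
        total += comb(c, k) / comb(num_trials, k)
    return total / len(successes_per_task)


# Hypothetical data: four tasks, three trials each.
example_counts = [3, 2, 0, 3]
score_k3 = pass_hat_k(example_counts, num_trials=3, k=3)
score_k1 = pass_hat_k(example_counts, num_trials=3, k=1)
```

With these made-up counts, only the two tasks that succeeded in all three trials contribute at k = 3, so a task that an agent solves two times out of three earns no credit. That strictness is one plausible reason scores on such a metric can be low even for strong models.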
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM agents in e-commerce customer support
Assessing multimodal capabilities in real-world scenarios
Addressing complex e-commerce business challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmark for LLM agents in e-commerce support
Dynamic user simulation with real personas
Open-source benchmark with complex tasks
Authors
Haoxin Wang
Xiaoduo AI Lab, Shanghai Jiao Tong University
Xianhan Peng
Xiaoduo AI Lab, Shanghai Jiao Tong University
Xucheng Huang
Xiaoduo AI Lab
Yizhe Huang
Xiaoduo AI Lab
Ming Gong
Key Laboratory of Quantum Information, USTC
quantum information, quantum dot, topological quantum phase transition, ultracold atoms, FFLO
Chenghan Yang
Xiaoduo AI Lab, Shanghai Jiao Tong University
Yang Liu
Xiaoduo AI Lab
Ling Jiang
Xiaoduo AI Lab