🤖 AI Summary
Evaluation of large language model (LLM)-based agents currently lacks a systematic, multimodal framework tailored to real-world e-commerce customer service scenarios.
Method: We introduce ECom-Bench—the first multimodal agent benchmark specifically designed for e-commerce customer service—constructed from millions of real-world dialogues. It features a user-profile-driven dynamic simulation mechanism and a high-difficulty composite task suite encompassing cross-modal understanding, multi-turn reasoning, and real-time decision-making.
Contribution/Results: ECom-Bench significantly enhances evaluation authenticity and challenge. Experiments reveal that even state-of-the-art multimodal models (e.g., GPT-4o) achieve only 10–20% on the pass^3 metric, exposing critical gaps in operational capability. This work shifts e-commerce AI agent evaluation from isolated skill assessment toward end-to-end problem-solving performance. The benchmark code and data will be publicly released.
📝 Abstract
In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions, along with a realistic task dataset derived from authentic e-commerce dialogues. These tasks, covering a wide range of business scenarios, are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o achieve only a 10–20% pass^3 score on our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. Upon publication, the code and data will be open-sourced to facilitate further research and development in this domain.
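For readers unfamiliar with the pass^k notation, a common way to compute it (introduced by τ-bench, which ECom-Bench's metric notation follows) is as an unbiased estimate of the probability that k independent attempts at a task all succeed. The sketch below assumes that definition; the function names and the example numbers are illustrative, not taken from the paper.

```python
from math import comb

def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate pass^k for one task: the probability that k i.i.d.
    attempts all succeed, given `successes` out of `trials` observed runs.
    Uses the combinatorial estimator C(c, k) / C(n, k)."""
    if trials < k:
        raise ValueError("need at least k trials per task")
    # math.comb(c, k) is 0 when c < k, so tasks with fewer than
    # k successes contribute 0, as expected.
    return comb(successes, k) / comb(trials, k)

def benchmark_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass^k over all tasks; each entry is (successes, trials)."""
    return sum(pass_hat_k(c, n, k) for c, n in results) / len(results)

# Hypothetical example: 3 tasks, 4 trials each.
# Task 1 always succeeds, task 2 succeeds twice, task 3 never.
score = benchmark_pass_hat_k([(4, 4), (2, 4), (0, 4)], k=3)
print(f"pass^3 = {score:.3f}")
```

Because pass^k requires *every* one of k attempts to succeed, it penalizes inconsistent agents much more harshly than a single-attempt pass rate, which is part of why the reported 10–20% pass^3 scores are so low.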