MindFlow+: A Self-Evolving Agent for E-Commerce Customer Service

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional intent recognition systems in e-commerce customer service exhibit weak contextual understanding and poor adaptability in multi-turn dynamic dialogues. Method: This paper proposes a continuously evolving dialogue agent framework that integrates large language models (LLMs), imitation learning, and offline reinforcement learning. It introduces tool-augmented demonstration construction, reward-conditioned data modeling, and an AI contribution quantification mechanism, combined with ReAct-style tool invocation and domain knowledge enhancement to enable task-driven response generation and policy optimization. Contribution/Results: Evaluated on real-world e-commerce dialogue data, the framework achieves significant improvements in context relevance (+12.3%), adaptability (+15.7%), and task accuracy (+18.1%), demonstrating its effectiveness and scalability in complex service scenarios.

📝 Abstract
High-quality dialogue is crucial for e-commerce customer service, yet traditional intent-based systems struggle with dynamic, multi-turn interactions. We present MindFlow+, a self-evolving dialogue agent that learns domain-specific behavior by combining large language models (LLMs) with imitation learning and offline reinforcement learning (RL). MindFlow+ introduces two data-centric mechanisms to guide learning: tool-augmented demonstration construction, which exposes the model to knowledge-enhanced and agentic (ReAct-style) interactions for effective tool use; and reward-conditioned data modeling, which aligns responses with task-specific goals using reward signals. To evaluate the model's role in response generation, we introduce the AI Contribution Ratio, a novel metric quantifying AI involvement in dialogue. Experiments on real-world e-commerce conversations show that MindFlow+ outperforms strong baselines in contextual relevance, flexibility, and task accuracy. These results demonstrate the potential of combining LLMs, tool reasoning, and reward-guided learning to build domain-specialized, context-aware dialogue systems.
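Reward-conditioned data modeling, as described in the abstract, can be pictured as prefixing each scored demonstration with a reward control tag so the model learns to condition generation on a target quality level. The paper does not publish code; the tag names, thresholds, and `build_example` helper below are hypothetical, a minimal sketch of the general technique:

```python
# Hypothetical sketch of reward-conditioned data construction.
# Each scored dialogue turn becomes a (prompt, completion) pair whose
# prompt carries a discretized reward tag; at inference time, prepending
# the high-reward tag steers the model toward high-quality responses.

def reward_bucket(reward: float) -> str:
    """Map a scalar reward in [0, 1] to a coarse control tag (assumed thresholds)."""
    if reward >= 0.8:
        return "<reward:high>"
    if reward >= 0.5:
        return "<reward:mid>"
    return "<reward:low>"

def build_example(context: str, response: str, reward: float) -> dict:
    """Turn one scored dialogue turn into a reward-conditioned training pair."""
    return {
        "prompt": f"{reward_bucket(reward)} {context}",
        "completion": response,
    }

example = build_example(
    "Customer: Where is my order?",
    "Let me check the tracking status for you.",
    0.92,
)
```

This mirrors return-conditioned sequence modeling (as in Decision Transformer-style approaches): the reward signal is moved into the data itself, so standard supervised fine-tuning can absorb it offline.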
Problem

Research questions and friction points this paper is trying to address.

Enhancing dynamic multi-turn e-commerce customer service dialogues
Combining LLMs with imitation and offline RL for domain-specific learning
Measuring AI involvement in dialogues with a novel contribution metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with imitation and offline RL
Uses tool-augmented demonstration construction
Implements reward-conditioned data modeling
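The ReAct-style tool invocation mentioned above follows a standard thought/action/observation loop. The paper gives no implementation details; the `llm` callable, tool names, and step format below are illustrative assumptions, not the authors' code:

```python
# Minimal ReAct-style agent loop (illustrative sketch).
# The model alternates between emitting an "Action: <tool> <arg>" line,
# whose tool result is appended as an "Observation:", and a final
# "Answer:" line that ends the loop.

def react_agent(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model emits an Action or a final Answer
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            result = tools.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {result}\n"
    return "Escalate to a human agent."

# Demo with a scripted stand-in for the LLM and one hypothetical tool.
scripted = iter([
    "Action: lookup_order 123",
    "Answer: Your order 123 has shipped.",
])
answer = react_agent(
    lambda transcript: next(scripted),
    {"lookup_order": lambda order_id: "status: shipped"},
    "Where is order 123?",
)
```

The `max_steps` cap and human-escalation fallback are practical guards for a customer-service setting, where an agent that loops on tool calls must eventually hand off.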
Ming Gong
Key Laboratory of Quantum Information, USTC
Xucheng Huang
Xiaoduo AI Lab, Shanghai, China
Ziheng Xu
Xiaoduo AI Lab, Shanghai, China
Vijayan K. Asari
University of Dayton, Dayton, Ohio, United States