MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents

📅 2025-07-07
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Problem: Large language models (LLMs) exhibit limited reasoning capabilities in complex e-commerce customer service scenarios involving multimodal inputs (e.g., image-text queries). Method: This paper proposes a modular multimodal-LLM-as-tool (MLLM-as-Tool) framework built on the CoALA architecture. It unifies joint vision-language understanding, tool invocation, memory management, and autonomous decision-making within a single agent system, enabling end-to-end multimodal interactive reasoning. Contribution/Results: Presented as the first open-source, e-commerce-specific multimodal LLM agent, it shows substantial improvements in online A/B testing and ablation studies: a 93.53% increase in complex query resolution rate, higher user satisfaction, and lower operational costs.

📝 Abstract
Recent advances in large language models (LLMs) have enabled new applications in e-commerce customer service. However, their capabilities remain constrained in complex, multimodal scenarios. We present MindFlow, the first open-source multimodal LLM agent tailored for e-commerce. Built on the CoALA framework, it integrates memory, decision-making, and action modules, and adopts a modular "MLLM-as-Tool" strategy for effective visual-textual reasoning. Evaluated via online A/B testing and simulation-based ablation, MindFlow demonstrates substantial gains in handling complex queries, improving user satisfaction, and reducing operational costs, with a 93.53% relative improvement observed in real-world deployments.
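
The abstract reports its headline result as a relative improvement. As a reading aid, the short sketch below shows how such a figure is conventionally computed from a control metric and a treatment metric in an A/B test. The function name and the example rates are hypothetical and are not taken from the paper, which does not state the underlying values here.

```python
def relative_improvement(control: float, treatment: float) -> float:
    """Return the relative improvement of treatment over control, in percent."""
    if control == 0:
        raise ValueError("control metric must be non-zero")
    return (treatment - control) / control * 100.0


# Hypothetical rates for illustration only; NOT results from the paper.
baseline_rate = 0.20  # assumed control-group metric
agent_rate = 0.35     # assumed treatment-group metric
print(f"{relative_improvement(baseline_rate, agent_rate):.2f}%")
```

A relative figure depends heavily on the baseline: the same absolute gain yields a much larger percentage when the control metric is small.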

Problem

Research questions and friction points this paper is trying to address.

Handling complex multimodal e-commerce queries
Improving user satisfaction in customer support
Reducing operational costs with AI agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLM agent for e-commerce
Modular MLLM-as-Tool strategy
Integrates memory, decision, action modules
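
To make the three points above concrete, here is a minimal sketch of an agent loop in which a decision step picks a tool, the multimodal model is just one tool among others, and each result is written to memory. It is an illustration under stated assumptions, not MindFlow's implementation: every name in it (`Query`, `Memory`, `mllm_tool`, `order_lookup_tool`, `decide`, `run_agent`) is hypothetical, the tools are stubs, and the rule-based `decide` stands in for whatever LLM-driven policy the actual system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Query:
    text: str
    image: Optional[bytes] = None  # raw image bytes, if the customer attached one


@dataclass
class Memory:
    """Working memory: an append-only log of tool results for one conversation."""
    events: List[str] = field(default_factory=list)

    def add(self, event: str) -> None:
        self.events.append(event)


def mllm_tool(query: Query) -> str:
    # Stub standing in for a multimodal model call (hypothetical).
    return f"[image description for: {query.text}]"


def order_lookup_tool(query: Query) -> str:
    # Stub standing in for a backend order lookup (hypothetical).
    return "[order status]"


# Action module: a registry of callable tools, with the MLLM as one entry.
TOOLS: Dict[str, Callable[[Query], str]] = {
    "mllm": mllm_tool,
    "order_lookup": order_lookup_tool,
}


def decide(query: Query, memory: Memory) -> Optional[str]:
    """Decision module: pick the next tool, or None when nothing is left to do."""
    used = {event.split(":", 1)[0] for event in memory.events}
    if query.image is not None and "mllm" not in used:
        return "mllm"
    if "order" in query.text.lower() and "order_lookup" not in used:
        return "order_lookup"
    return None


def run_agent(query: Query, max_steps: int = 4) -> str:
    """Run the decide-act-remember loop and return the collected tool results."""
    memory = Memory()
    for _ in range(max_steps):
        tool_name = decide(query, memory)
        if tool_name is None:
            break
        memory.add(f"{tool_name}:{TOOLS[tool_name](query)}")
    return " | ".join(memory.events) or "no tool needed"


print(run_agent(Query("Where is my order? The box looks damaged.", image=b"\x89PNG")))
```

Treating the multimodal model as a registry entry rather than the core of the loop is what keeps the design modular: the vision component can be swapped or skipped for text-only queries without touching the decision logic.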

Authors
Ming Gong
Key laboratory of quantum information, USTC
quantum information, quantum dot, topological quantum phase transition, ultracold atoms, FFLO
Xucheng Huang
Xiaoduo AI
Chenghan Yang
Xiaoduo AI
Xianhan Peng
Xiaoduo AI
Haoxin Wang
Xiaoduo AI
Yang Liu
Xiaoduo AI
Ling Jiang
Xiaoduo AI