🤖 AI Summary
In e-commerce customer service scenarios, large language model (LLM) agents suffer from cross-session memory decay, repeated errors, and the absence of any mechanism for continual self-improvement. Method: This paper proposes MemOrb, a lightweight, plug-and-play memory layer centered on a shared, structured strategy-reflection memory bank. It enables cross-session experience reuse without model fine-tuning: reflections are distilled from multi-turn interactions, and historical strategies are dynamically retrieved via vector search to guide real-time decision-making. Contribution/Results: Experiments demonstrate up to a 63-percentage-point improvement in multi-turn task success rate. The method also significantly improves decision stability and consistency across repeated trials, offering an efficient and practical route to long-term reliability for LLM agents in dynamic service environments.
📝 Abstract
Large language model (LLM)-based agents are increasingly deployed in customer service, yet they often forget across sessions, repeat errors, and lack mechanisms for continual self-improvement. This makes them unreliable in dynamic settings where stability and consistency are critical. To better evaluate these properties, we emphasize two indicators: task success rate as a measure of overall effectiveness, and consistency metrics such as Pass$^k$ to capture reliability across multiple trials. To address the limitations of existing approaches, we propose MemOrb, a lightweight, plug-and-play verbal reinforcement memory layer that distills multi-turn interactions into compact strategy reflections. These reflections are stored in a shared memory bank and retrieved to guide decision-making, without requiring any fine-tuning. Experiments show that MemOrb significantly improves both success rate and stability, achieving up to a 63-percentage-point gain in multi-turn success rate and delivering more consistent performance across repeated trials. Our results demonstrate that structured reflection is a powerful mechanism for enhancing the long-term reliability of frozen LLM agents in customer service scenarios.
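The store-and-retrieve loop described above (distill a reflection, embed it into a shared bank, retrieve the most similar past strategies by vector search) can be sketched minimally as below. This is an illustrative sketch, not the authors' implementation: the class name `MemOrbSketch`, its methods, and the toy 2-dimensional embeddings are all assumptions; a real system would use an embedding model and a vector database instead of brute-force cosine similarity over a list.

```python
import math


class MemOrbSketch:
    """Hypothetical sketch of a shared strategy-reflection memory bank.

    Stores (embedding, reflection) pairs and retrieves the most similar
    past reflections by cosine similarity, mimicking vector-search
    retrieval of historical strategies. Illustrative only.
    """

    def __init__(self):
        # Shared bank of (embedding, reflection_text) pairs,
        # accumulated across sessions without any model fine-tuning.
        self.bank = []

    def add_reflection(self, embedding, reflection):
        # A compact strategy reflection distilled from a finished
        # multi-turn interaction.
        self.bank.append((list(embedding), reflection))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(self, query_embedding, k=2):
        # Return the k stored reflections most similar to the current
        # situation, to be injected into the agent's prompt/context.
        scored = sorted(
            self.bank,
            key=lambda item: self._cosine(item[0], query_embedding),
            reverse=True,
        )
        return [text for _, text in scored[:k]]


bank = MemOrbSketch()
bank.add_reflection([1.0, 0.0], "Verify the order ID before issuing a refund.")
bank.add_reflection([0.0, 1.0], "Escalate shipping disputes after two failed checks.")
# A refund-like query retrieves the refund strategy first.
print(bank.retrieve([0.9, 0.1], k=1))
# → ['Verify the order ID before issuing a refund.']
```

Because the bank is plain data shared across sessions, the frozen LLM never changes; reliability gains come entirely from conditioning each new decision on retrieved reflections.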