🤖 AI Summary
To address factual hallucinations and the lack of verifiable sourcing in Large Language Model (LLM)-based Conversational Shopping Agents (CSAs) for e-commerce, this paper proposes a production-oriented "citation experience" paradigm. The method combines In-Context Learning (ICL) with Multi-UX-Inference (MUI) to attribute responses to their sources automatically and render clickable knowledge-source citations, without modifying existing interaction logic. The authors further design automated evaluation metrics and a scalable benchmark, validated on real-world e-commerce data. Results show a 13.83% improvement in the grounding of LLM responses, enhancing information transparency, verifiability, and user trust. To their knowledge, this is the first work to systematically introduce a lightweight, deployable citation-generation mechanism into e-commerce dialogue systems, offering a practical path toward trustworthy, production-grade CSAs.
📝 Abstract
With the advancement of conversational large language models (LLMs), several LLM-based Conversational Shopping Agents (CSAs) have been developed to help customers answer questions and smooth their shopping journey in the e-commerce domain. The primary objective in building a trustworthy CSA is to ensure the agent's responses are accurate and factually grounded, which is essential for building customer trust and encouraging continued engagement. However, two challenges remain. First, LLMs produce hallucinated or unsupported claims; such inaccuracies risk spreading misinformation and diminishing customer trust. Second, without knowledge-source attribution in CSA responses, customers struggle to verify LLM-generated information. To address these challenges, we present an easily productionized solution that enables a "citation experience" using In-Context Learning (ICL) and Multi-UX-Inference (MUI) to generate responses with citations attributing their original sources, without interfering with other existing UX features. With proper UX design, these citation marks can be linked to the related product information and display the source to our customers. In this work, we also build auto-metrics and scalable benchmarks to holistically evaluate an LLM's grounding and attribution capabilities. Our experiments demonstrate that incorporating this citation-generation paradigm can substantially enhance the grounding of LLM responses, by 13.83% on real-world data. As such, our solution not only addresses the immediate challenges of LLM grounding but also adds transparency to conversational AI.
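As an illustration of the idea, the citation experience described above can be sketched in two parts: an ICL-style prompt that numbers the knowledge snippets and asks the model to cite them, and a simple auto-metric over the citation marks in the response. This is a minimal sketch under our own assumptions; the function names, prompt wording, and metric are illustrative, not the paper's actual implementation or metrics.

```python
import re

def build_cited_prompt(question, snippets):
    """Format numbered knowledge snippets plus an instruction to cite them,
    in the spirit of the in-context-learning setup described in the abstract.
    (Prompt wording is illustrative, not the paper's prompt.)"""
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        f"Knowledge sources:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer using only the sources above; mark each claim with its "
        "source number, e.g. [1]."
    )

def citation_coverage(response, num_snippets):
    """Naive auto-metric (our assumption, not the paper's): the fraction of
    sentences carrying at least one citation mark that points to a real
    snippet index."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    valid = {str(i + 1) for i in range(num_snippets)}
    cited = sum(
        1 for s in sentences
        if any(m in valid for m in re.findall(r"\[(\d+)\]", s))
    )
    return cited / len(sentences) if sentences else 0.0

response = "The blender has a 1200 W motor [1]. It ships with two jars [2]."
print(citation_coverage(response, 2))  # → 1.0
```

In a production UX, the `[n]` marks parsed this way would be rendered as clickable links back to the corresponding product information, which is what makes the responses verifiable by the customer.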