🤖 AI Summary
Contemporary in-vehicle voice assistants retain user preferences only briefly and inaccurately, and they offer insufficient privacy protection and transparency. To address these challenges, this paper proposes a long-term memory system tailored to automotive environments, introducing the first category-constrained memory architecture. The system uses large language models (LLMs) for structured preference extraction, category boundary modeling, resolution of redundant and contradictory preferences, and semantic-consistency-aware retrieval, while ensuring compliance with the GDPR and other privacy regulations. Key contributions include: (1) releasing CarMem, the first industrial-scale synthetic multi-turn, multi-session in-vehicle dialogue dataset; (2) achieving dual controllability over both memory maintenance and retrieval; and (3) attaining preference extraction F1 scores of 0.78–0.95 depending on category granularity, reducing redundant and contradictory preferences by 95% and 92%, respectively, and reaching a retrieval accuracy of up to 0.87.
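To make the category-constrained maintenance idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the LLM extraction step has already produced `(category, subcategory, value)` triples. The schema, the class names `Preference` and `CategoryMemory`, and the "one value per slot, newest wins" rule for contradictions are all hypothetical illustrations.

```python
from dataclasses import dataclass, field

# Hypothetical schema (NOT the CarMem categories): category -> allowed subcategories.
SCHEMA: dict[str, set[str]] = {
    "climate": {"temperature", "seat_heating"},
    "navigation": {"route_type", "charging_provider"},
    "entertainment": {"music_genre", "podcast"},
}


@dataclass(frozen=True)
class Preference:
    category: str
    subcategory: str
    value: str


@dataclass
class CategoryMemory:
    # Hypothetical policy: at most one stored value per (category, subcategory) slot.
    slots: dict[tuple[str, str], Preference] = field(default_factory=dict)

    def add(self, pref: Preference) -> str:
        """Store an extracted preference and report which maintenance action was taken."""
        if pref.subcategory not in SCHEMA.get(pref.category, set()):
            return "rejected"  # outside the predefined categories, so never stored
        key = (pref.category, pref.subcategory)
        existing = self.slots.get(key)
        if existing == pref:
            return "redundant"  # identical preference already in memory
        self.slots[key] = pref  # new slot, or newer value overrides a contradiction
        return "added" if existing is None else "updated"

    def retrieve(self, category: str) -> list[Preference]:
        """Return every stored preference belonging to one category."""
        return [p for (cat, _), p in self.slots.items() if cat == category]


mem = CategoryMemory()
print(mem.add(Preference("climate", "temperature", "21C")))  # added
print(mem.add(Preference("climate", "temperature", "21C")))  # redundant
print(mem.add(Preference("climate", "temperature", "23C")))  # updated
print(mem.add(Preference("finance", "bank", "X")))           # rejected
```

Letting the newest statement win is only one simple contradiction policy; a production system might instead confirm with the user or keep a dated history, and restricting storage to a fixed schema is what keeps the stored data inspectable.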
📝 Abstract
In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle to retain user preferences, which forces users to repeat their requests and leads to disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant privacy and trust concerns, especially in regions with stringent regulations such as Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce CarMem, a synthetic multi-turn, multi-session conversation dataset grounded in real industry data and tailored to an in-car voice assistant setting. Benchmarked on this dataset, our system achieves an F1-score of .78 to .95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while optimal retrieval reaches an accuracy of .87. Collectively, the results demonstrate the system's suitability for industrial applications.
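The dependence of extraction F1 on category granularity can be illustrated with a set-based F1 in which comparing fewer fields of each triple stands in for a coarser granularity. This is a hypothetical sketch: the `(category, subcategory, value)` format, the `project` and `f1` helpers, and exact-match scoring are assumptions, not the paper's evaluation protocol.

```python
from collections.abc import Iterable

# Hypothetical preference format: (category, subcategory, value).
Pref = tuple[str, str, str]


def project(prefs: Iterable[Pref], level: int) -> set[tuple[str, ...]]:
    """Keep only the first `level` fields, i.e. compare at a coarser granularity."""
    return {p[:level] for p in prefs}


def f1(predicted: Iterable[Pref], gold: Iterable[Pref], level: int = 3) -> float:
    """Set-based F1 between predicted and gold preferences at the given level."""
    pred_set, gold_set = project(predicted, level), project(gold, level)
    tp = len(pred_set & gold_set)  # true positives: exact matches after projection
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = [("climate", "temperature", "21C"), ("navigation", "route_type", "scenic")]
pred = [("climate", "temperature", "22C"), ("navigation", "route_type", "scenic")]
print(f1(pred, gold, level=3))  # 0.5  (value-level: one wrong temperature)
print(f1(pred, gold, level=2))  # 1.0  (subcategory-level: both slots found)
```

A wrong value only hurts the finest-grained comparison, which is one plausible reason scores rise as categories become coarser.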