🤖 AI Summary
This work addresses the insufficient modeling of user diversity in task-oriented dialogue systems by proposing a novel end-to-end LLM-based user simulation framework. The framework unifies user persona generation, goal specification, multi-turn interaction simulation, and dialogue success evaluation. Using GPT-4o and GPT-o1, it constructs synthetic user populations that vary in demographic attributes, knowledge levels, and conversational styles. The analysis finds that GPT-o1 produces more heterogeneous distributions across most user attributes, while GPT-4o produces more skewed ones. The generated profiles are then used to simulate dialogue sessions with a task-oriented dialogue system, with the aim of enabling more realistic and representative benchmarking of such systems.
📝 Abstract
In this study, we explore the application of Large Language Models (LLMs) to generating synthetic users and simulating user conversations with a task-oriented dialogue system, and we present detailed results and their analysis. We propose a comprehensive, novel user simulation approach that uses LLMs to create diverse user profiles, set goals, engage in multi-turn dialogues, and evaluate conversation success. We employ two proprietary LLMs, namely GPT-4o and GPT-o1 (Achiam et al., 2023), to generate a heterogeneous base of user profiles characterized by varied demographics, multiple user goals, different conversational styles, initial knowledge levels, interests, and conversational objectives. We perform a detailed analysis of the LLM-generated user profiles to assess their diversity, their consistency, and the potential biases inherent in them. We find that GPT-o1 generates a more heterogeneous user distribution across most user attributes, while GPT-4o generates more skewed user attributes. The generated set of user profiles is then used to simulate dialogue sessions by interacting with a task-oriented dialogue system.
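One way to make "heterogeneous" versus "skewed" concrete for a single user attribute is normalized Shannon entropy, sketched below. This is an illustrative assumption, not necessarily the metric used in the study; the function name `normalized_entropy` and the sample attribute values are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Return the Shannon entropy of `values`, scaled to the range [0, 1].

    1.0 means the observed categories are perfectly balanced;
    values near 0.0 mean one category dominates.
    Assumption: normalization uses only the categories that actually
    appear, so categories a model never generates are not penalized.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if len(counts) <= 1:
        # Zero or one distinct category carries no diversity.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# Hypothetical "knowledge level" attribute from two sets of generated profiles.
balanced_profiles = ["beginner", "intermediate", "expert"] * 4
skewed_profiles = ["beginner"] * 10 + ["intermediate", "expert"]

print(normalized_entropy(balanced_profiles))
print(normalized_entropy(skewed_profiles))
```

Under this measure, a profile set whose attribute values are spread evenly scores close to 1, while a set dominated by one value scores markedly lower, which gives a simple per-attribute way to compare the outputs of two generators.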