Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language Models

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the insufficient modeling of user diversity in task-oriented dialogue systems by proposing an end-to-end LLM-based user simulation framework. The framework combines user profile generation, goal setting, multi-turn dialogue simulation, and evaluation of conversation success. Using GPT-4o and GPT-o1, it builds a synthetic user population that varies in demographics, initial knowledge levels, interests, and conversational styles. Analysis of the generated profiles shows that GPT-o1 produces a more heterogeneous distribution across most user attributes, while GPT-4o produces more skewed attributes. The generated profiles are then used to simulate dialogue sessions with a task-oriented dialogue system, with the aim of more representative evaluation.

📝 Abstract

In this study, we explore the application of Large Language Models (LLMs) to generating synthetic users and simulating their conversations with a task-oriented dialogue system, and we present detailed results and analysis. We propose a comprehensive, novel user simulation technique that uses LLMs to create diverse user profiles, set goals, engage in multi-turn dialogues, and evaluate conversation success. We employ two proprietary LLMs, GPT-4o and GPT-o1 (Achiam et al., 2023), to generate a heterogeneous base of user profiles characterized by varied demographics, multiple user goals, different conversational styles, initial knowledge levels, interests, and conversational objectives. We analyze the LLM-generated user profiles in detail to assess their diversity, consistency, and potential biases. We find that GPT-o1 generates a more heterogeneous user distribution across most user attributes, while GPT-4o generates more skewed user attributes. The generated set of user profiles is then used to simulate dialogue sessions with a task-oriented dialogue system.
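The four stages named in the abstract (profile creation, goal setting, multi-turn dialogue, success evaluation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `UserProfile` fields, the attribute values, the `toy_system` stand-in, and the keyword-based success check are hypothetical, and random sampling and canned replies take the place of the LLM calls the paper relies on.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class UserProfile:
    # Hypothetical fields loosely mirroring attributes named in the abstract.
    age_group: str
    knowledge_level: str
    style: str
    goal: str


def sample_profile(rng: random.Random) -> UserProfile:
    """Stand-in for an LLM call that would generate a user profile."""
    return UserProfile(
        age_group=rng.choice(["18-29", "30-49", "50+"]),
        knowledge_level=rng.choice(["novice", "intermediate", "expert"]),
        style=rng.choice(["terse", "chatty", "formal"]),
        goal=rng.choice(["book a table", "find a hotel", "reserve a taxi"]),
    )


def user_turn(profile: UserProfile, history: List[Tuple[str, str]]) -> str:
    """Stand-in for an LLM-generated user utterance conditioned on the profile."""
    if not history:
        return f"Hi, I want to {profile.goal}."
    return "Yes, please confirm."


def toy_system(utterance: str) -> str:
    """Placeholder for the task-oriented dialogue system under test."""
    if "confirm" in utterance:
        return "Your booking is confirmed."
    return "Sure, shall I go ahead?"


def simulate(
    profile: UserProfile,
    system: Callable[[str], str],
    max_turns: int = 5,
) -> Tuple[List[Tuple[str, str]], bool]:
    """Run a multi-turn session; success here is a simple keyword check."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        user_msg = user_turn(profile, history)
        sys_msg = system(user_msg)
        history.append((user_msg, sys_msg))
        if "confirmed" in sys_msg:
            return history, True
    return history, False
```

In a real setup, `sample_profile` and `user_turn` would each wrap a prompt to the chosen LLM, and the success check would likely be another model call rather than a keyword match.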

Problem

Research questions and friction points this paper is trying to address.

Simulating diverse user conversations

Generating synthetic user profiles

Evaluating dialogue system effectiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate diverse user profiles

Multi-turn dialogues with task-oriented systems

Assess diversity and biases in user simulations
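One way to quantify how heterogeneous or skewed a generated attribute distribution is can be sketched with normalized Shannon entropy. This is an illustrative choice, not the metric reported by the authors (the summary does not state which measures they use); the function name and the optional `num_categories` parameter are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def normalized_entropy(
    values: Iterable[Hashable],
    num_categories: Optional[int] = None,
) -> float:
    """Shannon entropy of the value counts, scaled to the range [0, 1].

    1.0 means the values are spread evenly over the categories;
    values near 0.0 mean one category dominates (a skewed attribute).
    If num_categories is omitted, only observed categories are counted,
    so categories the generator never produced are not penalized.
    """
    counts = Counter(values)
    total = sum(counts.values())
    k = num_categories if num_categories is not None else len(counts)
    if total == 0 or k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)
```

Computing this score per attribute (for example, age group or conversational style) for the profiles produced by each model gives a simple side-by-side comparison of how evenly each model covers the attribute values.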
