A Framework for Generating Conversational Recommendation Datasets from Behavioral Interactions

📅 2025-06-14
🤖 AI Summary
Recommender systems face a trade-off: collaborative filtering models long-term user preferences but cannot capture real-time, interactive intent, while conversational recommendation systems (CRS) elicit immediate needs yet personalize weakly because they lack collaborative signals. A key bottleneck in unifying the two paradigms is the scarcity of large-scale conversational datasets grounded in real user behavior. This paper proposes ConvRecStudio, an LLM-driven, three-stage synthetic data generation framework: (1) temporal profiling of users and of community-level item sentiment over fine-grained aspects, (2) semantic dialog planning over a DAG of flexible super-nodes, and (3) multi-turn simulation with paired user and system LLM agents, constrained by executional and behavioral fidelity checks. Applied to MobileRec, Yelp, and Amazon Electronics, it produces over 12K multi-turn dialogs per dataset. Human and automatic evaluations confirm naturalness, coherence, and behavioral grounding. A cross-attention transformer that jointly encodes user history and dialog context improves Hit@K and NDCG@K over baselines, including a 10.9% improvement in Hit@1 on Yelp over the strongest baseline.
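To make the DAG-based planning stage concrete, here is a minimal sketch of a dialog plan expressed as a directed acyclic graph and linearized into one valid turn order. The super-node names (`greet`, `elicit_need`, and so on) and the `plan_order` helper are hypothetical illustrations, not the paper's actual plan schema; the ordering uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical super-nodes (assumed for illustration, not from the paper).
# Each key maps to the set of nodes that must be completed before it.
PLAN = {
    "greet": set(),
    "elicit_need": {"greet"},
    "probe_aspects": {"elicit_need"},
    "recall_history": {"elicit_need"},
    "recommend": {"probe_aspects", "recall_history"},
    "close": {"recommend"},
}


def plan_order(plan):
    """Return one ordering of the plan that respects every dependency.

    Raises graphlib.CycleError if the plan is not a DAG.
    """
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    print(plan_order(PLAN))
```

Because `probe_aspects` and `recall_history` do not depend on each other, more than one ordering is valid, which is one way a plan can stay structured while leaving the simulated conversation some flexibility.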

📝 Abstract
Modern recommendation systems typically follow two complementary paradigms: collaborative filtering, which models long-term user preferences from historical interactions, and conversational recommendation systems (CRS), which interact with users in natural language to uncover immediate needs. Each captures a different dimension of user intent. While CRS models lack collaborative signals, leading to generic or poorly personalized suggestions, traditional recommenders lack mechanisms to interactively elicit immediate needs. Unifying these paradigms promises richer personalization but remains challenging due to the lack of large-scale conversational datasets grounded in real user behavior. We present ConvRecStudio, a framework that uses large language models (LLMs) to simulate realistic, multi-turn dialogs grounded in timestamped user-item interactions and reviews. ConvRecStudio follows a three-stage pipeline: (1) Temporal Profiling, which constructs user profiles and community-level item sentiment trajectories over fine-grained aspects; (2) Semantic Dialog Planning, which generates a structured plan using a DAG of flexible super-nodes; and (3) Multi-Turn Simulation, which instantiates the plan using paired LLM agents for the user and system, constrained by executional and behavioral fidelity checks. We apply ConvRecStudio to three domains -- MobileRec, Yelp, and Amazon Electronics -- producing over 12K multi-turn dialogs per dataset. Human and automatic evaluations confirm the naturalness, coherence, and behavioral grounding of the generated conversations. To demonstrate utility, we build a cross-attention transformer model that jointly encodes user history and dialog context, achieving gains in Hit@K and NDCG@K over baselines using either signal alone or naive fusion. Notably, our model achieves a 10.9% improvement in Hit@1 on Yelp over the strongest baseline.
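The Hit@K and NDCG@K metrics named in the abstract can be sketched as follows for the common setting of one ground-truth item per test case. That single-target setting, and the function names `hit_at_k` and `ndcg_at_k`, are assumptions made for illustration; the abstract does not describe the paper's exact evaluation protocol.

```python
import math
from typing import Sequence


def hit_at_k(ranked_items: Sequence[str], target: str, k: int) -> float:
    """1.0 if the target item appears among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: Sequence[str], target: str, k: int) -> float:
    """NDCG@k with a single relevant item.

    The ideal DCG is 1 (target at rank 1), so NDCG reduces to
    1 / log2(rank + 1) when the target is in the top k, else 0.
    """
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based rank of the target
    return 1.0 / math.log2(rank + 1)


if __name__ == "__main__":
    ranking = ["item_a", "item_b", "item_c"]
    print(hit_at_k(ranking, "item_b", 1), hit_at_k(ranking, "item_b", 2))
    print(ndcg_at_k(ranking, "item_b", 3))
```

Dataset-level scores are the mean of these per-case values. Hit@1 is the strictest case, since it rewards only a correct top-ranked item.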
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale conversational datasets from real user behavior
Difficulty unifying collaborative filtering and conversational recommendation systems
Generic or poorly personalized suggestions in current CRS models
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-simulated multi-turn dialog generation
Temporal profiling of user preferences and community-level item sentiment
Cross-attention transformer for joint encoding of user history and dialog context
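As a rough illustration of the cross-attention idea, the sketch below computes single-head scaled dot-product attention in which dialog-turn vectors act as queries over user-history vectors. The direction of attention, the absence of learned projections, and the function names are all assumptions made for brevity; the paper's actual architecture is not specified in this summary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention without learned weights.

    queries: dialog-context vectors (assumed role), each of dimension d
    keys, values: user-history vectors, one key and one value per history item
    Returns one attended vector per query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this dialog vector to every history item.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the history values.
        attended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(attended)
    return outputs
```

Each output row mixes history items in proportion to their similarity to a dialog turn, which is the basic mechanism by which a model can condition on both signals at once rather than concatenating them.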
Vinaik Chhetri
Louisiana State University, United States
Yousaf Reza
Independent Researcher, Pakistan
Moghis Fereidouni
University of Kentucky
Natural language processing, Reinforcement learning, Machine learning
Srijata Maji
University of Kentucky, United States
Umar Farooq
Louisiana State University, United States
A.B. Siddique
University of Kentucky, United States