A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism

📅 2025-05-14

📈 Citations: 0

✨ Influential: 0

career value

223K/year

🤖 AI Summary

User queries in tourism domains often exhibit high ambiguity and implicit intent, while existing LLMs lack capabilities for proactive clarification and deep intent elicitation. Method: We propose an LLM-driven dual-agent collaborative framework and construct SynPT-Dialog—a high-quality, Chinese tourism-oriented dialogue dataset—using seed data from Chinese travel websites. Contribution/Results: (1) The first domain-adaptive data synthesis framework tailored for tourism; (2) Explicit modeling of intent utility and user sentiment to mitigate initial query incompleteness and contextual redundancy; (3) Support for cross-lingual transfer between Chinese and English. Generated dialogues include explicit reasoning chains. After fine-tuning, the resulting model significantly outperforms baselines on implicit intent recognition and proactive question generation, validated via both human and LLM-based evaluation. Code and dataset are fully open-sourced.

Technology Category

Application Category

📝 Abstract

In the tourism domain, Large Language Models (LLMs) often struggle to mine implicit user intentions from tourists' ambiguous inquiries and lack the capacity to proactively guide users toward clarifying their needs. A critical bottleneck is the scarcity of high-quality training datasets that facilitate proactive questioning and implicit intention mining. While recent advances leverage LLM-driven data synthesis to generate such datasets and transfer specialized knowledge to downstream models, existing approaches suffer from several shortcomings: (1) lack of adaptation to the tourism domain, (2) skewed distributions of detail levels in initial inquiries, (3) contextual redundancy in the implicit intention mining module, and (4) lack of explicit thinking about tourists' emotions and intention values. Therefore, we propose SynPT (A Data Synthesis Method Driven by LLMs for Proactive Mining of Implicit User Intentions in the Tourism), which constructs an LLM-driven user agent and assistant agent to simulate dialogues based on seed data collected from Chinese tourism websites. This approach addresses the aforementioned limitations and generates SynPT-Dialog, a training dataset containing explicit reasoning. The dataset is utilized to fine-tune a general LLM, enabling it to proactively mine implicit user intentions. Experimental evaluations, conducted from both human and LLM perspectives, demonstrate the superiority of SynPT compared to existing methods. Furthermore, we analyze key hyperparameters and present case studies to illustrate the practical applicability of our method, including discussions on its adaptability to English-language scenarios. All code and data are publicly available.

Problem

Research questions and friction points this paper is trying to address.

LLMs struggle to mine implicit user intentions in tourism inquiries

Lack of high-quality datasets for proactive questioning and intention mining

Existing methods have domain adaptation and contextual redundancy issues

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven user and assistant agent simulation

Generates SynPT-Dialog with explicit reasoning

Fine-tunes LLM for proactive intention mining

🔎 Similar Papers

No similar papers found.