ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

📅 2024-12-18
🤖 AI Summary
Existing travel planning benchmarks are poorly aligned with real user needs and do not adequately evaluate the generation of complex, multi-day, multi-destination itineraries. To address this, we introduce the first open-ended travel planning benchmark grounded in authentic requirements collected from 1,154 Chinese users. We propose a composable domain-specific language (DSL) tailored to Chinese travel scenarios that supports evaluation along three dimensions: feasibility verification, multi-constraint satisfaction, and preference alignment. We further design a neuro-symbolic agent that couples realistic user-demand modeling with a constraint-driven planning framework. Experiments show that this approach achieves a 37.0% constraint satisfaction rate on human queries, ten times that of purely neural baselines, significantly improving reliability and interpretability for complex real-world planning. Our key contributions are: (1) the first empirically grounded Chinese travel planning benchmark; (2) an extensible, semantics-rich DSL; and (3) a validated neuro-symbolic planning paradigm.

📝 Abstract
Recent advances in LLMs, particularly in language reasoning and tool integration, have rapidly spurred the development of Language Agents for real-world applications. Among these, travel planning is a prominent domain, combining complex multi-objective planning challenges with practical deployment demands. However, existing benchmarks often oversimplify real-world requirements by focusing on synthetic queries and limited constraints. We address the gap in evaluating language agents in multi-day, multi-POI travel planning scenarios with diverse and open human needs. Specifically, we introduce ChinaTravel, the first open-ended benchmark grounded in authentic Chinese travel requirements collected from 1,154 human participants. We design a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning, which achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over purely neural models. These findings highlight ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.
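The abstract describes a composable DSL that evaluates plans for feasibility, constraint satisfaction, and preferences, but does not show its syntax. As a rough illustration of the general idea only, the following hypothetical Python sketch treats each constraint as a predicate over a plan and combines predicates with logical combinators. Every name in it (`Activity`, `Plan`, `visits`, `budget_at_most`, `days_at_most`, `all_of`, `any_of`, `negate`, `feasible`, `prefer_cheaper`) is an assumption made for illustration and is not ChinaTravel's actual DSL or API; the POIs, times, and costs are made up.

```python
# Hypothetical sketch of composable travel-plan constraints; NOT the ChinaTravel DSL.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Activity:
    day: int      # 1-based day index within the trip
    city: str
    poi: str      # point of interest (attraction, restaurant, hotel, ...)
    start: int    # start time, minutes after midnight
    end: int      # end time, minutes after midnight
    cost: float   # cost in yuan (illustrative values only)

@dataclass
class Plan:
    activities: List[Activity]

# A constraint is a predicate over a whole plan, so constraints compose freely.
Constraint = Callable[[Plan], bool]

def total_cost(plan: Plan) -> float:
    return sum(a.cost for a in plan.activities)

def visits(poi: str) -> Constraint:
    return lambda plan: any(a.poi == poi for a in plan.activities)

def budget_at_most(limit: float) -> Constraint:
    return lambda plan: total_cost(plan) <= limit

def days_at_most(n: int) -> Constraint:
    return lambda plan: max((a.day for a in plan.activities), default=0) <= n

def all_of(*cs: Constraint) -> Constraint:
    return lambda plan: all(c(plan) for c in cs)

def any_of(*cs: Constraint) -> Constraint:
    return lambda plan: any(c(plan) for c in cs)

def negate(c: Constraint) -> Constraint:
    return lambda plan: not c(plan)

def feasible(plan: Plan) -> bool:
    """Assumed feasibility rule: no activity ends before it starts and,
    within each day, activities do not overlap in time."""
    by_day: Dict[int, List[Activity]] = {}
    for a in plan.activities:
        if a.end < a.start:
            return False
        by_day.setdefault(a.day, []).append(a)
    for acts in by_day.values():
        acts.sort(key=lambda a: a.start)
        for prev, nxt in zip(acts, acts[1:]):
            if nxt.start < prev.end:
                return False
    return True

def prefer_cheaper(p1: Plan, p2: Plan) -> Plan:
    """Assumed preference: the cheaper plan wins; ties go to p1."""
    return p1 if total_cost(p1) <= total_cost(p2) else p2

# Usage: a made-up two-day query and candidate plan.
query = all_of(
    visits("Museum A"),
    budget_at_most(500),
    days_at_most(2),
    negate(visits("Museum Z")),
)
plan = Plan([
    Activity(1, "Nanjing", "Museum A", 540, 660, 60),
    Activity(1, "Nanjing", "Restaurant B", 690, 750, 80),
    Activity(2, "Nanjing", "Lake C", 600, 720, 0),
])
print(feasible(plan), query(plan))  # True True
```

Representing constraints as plain predicates is what makes them composable in this sketch: a query's hard requirements collapse into one combined predicate, while feasibility and preference stay as separate checks, loosely mirroring the three evaluation dimensions the abstract lists.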
Problem

Research questions and friction points this paper is trying to address.

Evaluating language agents in multi-day, multi-POI travel planning
Addressing diverse and open human needs in travel planning
Developing a scalable benchmark for real-world Chinese travel requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-ended benchmark for Chinese travel planning
Domain-specific language for scalable evaluation
Neuro-symbolic agents improve constraint satisfaction
Jie-Jing Shao
Nanjing University
Machine Learning, Neuro-Symbolic Learning, Reinforcement Learning
Xiao-Wen Yang
PHD student, Nanjing University
neural-symbolic learning, weak-supervised learning, large language model
Bo-Wen Zhang
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Intelligence Science and Technology, Nanjing University, Nanjing, China
Baizhi Chen
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Wen-Da Wei
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Lan-Zhe Guo
LAMDA Group, Nanjing University
Machine Learning
Yu-feng Li
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China