ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

📅 2024-12-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing travel planning benchmarks are misaligned with real user needs and do not adequately evaluate complex, multi-day, multi-destination itinerary generation. To address this, we introduce the first open-ended travel planning benchmark grounded in authentic requirements from 1,154 Chinese users. We propose a composable domain-specific language (DSL) tailored to Chinese travel scenarios that supports evaluation along three dimensions: feasibility verification, multi-constraint satisfaction, and preference alignment. We also design a neuro-symbolic agent that combines realistic user demand modeling with a constraint-driven planning framework. Experiments show that this method achieves a 37.0% constraint satisfaction rate on human queries, a tenfold improvement over purely neural baselines, which improves reliability and interpretability for complex real-world planning. Our key contributions are: (1) the first empirically grounded Chinese travel planning benchmark; (2) an extensible, semantics-rich DSL; and (3) a validated neuro-symbolic planning paradigm.

📝 Abstract

Recent advances in LLMs, particularly in language reasoning and tool integration, have rapidly spurred the development of Language Agents for real-world use. Among their applications, travel planning is a prominent domain that combines complex multi-objective planning challenges with practical deployment demands. However, existing benchmarks often oversimplify real-world requirements by focusing on synthetic queries and limited constraints. We address the gap in evaluating language agents on multi-day, multi-POI travel planning scenarios with diverse and open human needs. Specifically, we introduce ChinaTravel, the first open-ended benchmark grounded in authentic Chinese travel requirements collected from 1,154 human participants. We design a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning, which achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over purely neural models. These findings highlight ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.
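To make the idea of compositional constraint checking more concrete, the following is a minimal Python sketch of how small constraint predicates could be composed and scored over generated plans. It is not the ChinaTravel DSL: every name here (Activity, total_cost_at_most, visits, all_of, satisfaction_rate) and every value in the example plan is a hypothetical assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    """One step of a hypothetical itinerary (illustrative schema, not the benchmark's)."""
    day: int
    kind: str    # e.g. "attraction", "restaurant", "hotel"
    name: str
    cost: float


Plan = List[Activity]
Constraint = Callable[[Plan], bool]


def total_cost_at_most(budget: float) -> Constraint:
    """Constraint: the summed cost of all activities stays within the budget."""
    return lambda plan: sum(a.cost for a in plan) <= budget


def visits(name: str) -> Constraint:
    """Constraint: the plan includes an activity with the given name."""
    return lambda plan: any(a.name == name for a in plan)


def all_of(*constraints: Constraint) -> Constraint:
    """Compose several constraints into one that requires all of them."""
    return lambda plan: all(c(plan) for c in constraints)


def satisfaction_rate(plans: List[Plan], constraints: List[Constraint]) -> float:
    """Fraction of (plan, constraint) pairs in which the plan satisfies its constraint."""
    if not plans:
        return 0.0
    passed = sum(1 for plan, c in zip(plans, constraints) if c(plan))
    return passed / len(plans)


# Hypothetical two-day plan; names and prices are made up.
example_plan = [
    Activity(day=1, kind="attraction", name="Old Town Museum", cost=60.0),
    Activity(day=1, kind="restaurant", name="Noodle House", cost=40.0),
    Activity(day=2, kind="hotel", name="Riverside Inn", cost=300.0),
]

relaxed = all_of(total_cost_at_most(500.0), visits("Old Town Museum"))
strict = all_of(total_cost_at_most(300.0), visits("Old Town Museum"))

rate = satisfaction_rate([example_plan, example_plan], [relaxed, strict])
```

Because each constraint is an ordinary predicate over a plan, new requirements can be assembled from existing pieces without changing the scoring function, which is the property the word "compositional" is meant to suggest in this sketch.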

Problem

Research questions and friction points this paper is trying to address.

Evaluating language agents in multi-day, multi-POI travel planning

Addressing diverse and open human needs in travel planning

Developing a scalable benchmark for real-world Chinese travel requirements

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-ended benchmark for Chinese travel planning

Domain-specific language for scalable evaluation

Neuro-symbolic agents improve constraint satisfaction
