🤖 AI Summary
Large language models (LLMs) applied to PDDL planning have relied heavily on manually crafted natural-language prompts, hindering the scalability and reproducibility of evaluation. Method: We propose the first fully automated PDDL→NL prompt-generation framework, integrating PDDL syntactic parsing, semantic alignment, and LLM-based meta-prompt optimization to enable end-to-end translation from formal domain descriptions into high-quality instructional prompts. Contribution/Results: Experiments demonstrate that our automatically generated prompts achieve action-selection performance on par with human-written prompts across multiple canonical PDDL domains. Moreover, they enable the largest and most comprehensive LLM-PDDL planning benchmark to date, systematically evaluating generalization capabilities and exposing fundamental limitations of current LLMs in symbolic planning tasks. This framework eliminates manual prompt engineering, enhances experimental rigor, and facilitates large-scale, reproducible assessment of LLMs on formal planning problems.
📝 Abstract
Large language models (LLMs) have revolutionized a large variety of NLP tasks. An active debate concerns the extent to which they can reason and plan. Prior work has assessed the latter in the specific context of PDDL planning, based on manually converting three PDDL domains into natural-language (NL) prompts. Here we automate this conversion step, showing how to leverage an LLM to automatically generate NL prompts from PDDL input. Our automatically generated NL prompts yield LLM-planning performance similar to the previous manually generated ones. Beyond this, the automation enables us to run much larger experiments, providing for the first time a broad evaluation of LLM planning performance in PDDL.
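To make the PDDL→NL direction concrete, here is a minimal illustrative sketch: it parses one PDDL `:action` schema with regular expressions and renders it as an NL instruction. This is a toy stand-in for the paper's pipeline, which uses an LLM (not hand-written rules) for the translation; the function name and the blocksworld `pick-up` schema are illustrative assumptions, not the paper's code.

```python
import re


def pddl_action_to_nl(pddl: str) -> str:
    """Render a single PDDL :action schema as a natural-language instruction.

    Toy rule-based illustration of PDDL->NL conversion; the paper's actual
    framework delegates this translation step to an LLM.
    """
    name = re.search(r"\(:action\s+(\S+)", pddl).group(1)
    param_block = re.search(r":parameters\s*\(([^)]*)\)", pddl).group(1)
    params = re.findall(r"\?(\w+)", param_block)
    precondition = re.search(
        r":precondition\s*(\(.*?\))\s*:effect", pddl, re.S).group(1).strip()
    effect = re.search(
        r":effect\s*(\(.*\))\s*\)\s*$", pddl, re.S).group(1).strip()
    return (f"Action '{name}' takes parameters {', '.join(params)}. "
            f"It is applicable when {precondition} holds, "
            f"and it results in {effect}.")


# Example: a blocksworld action schema (illustrative).
PICK_UP = """(:action pick-up
  :parameters (?b - block)
  :precondition (and (clear ?b) (ontable ?b) (handempty))
  :effect (and (holding ?b) (not (ontable ?b))))"""

print(pddl_action_to_nl(PICK_UP))
```

An LLM-based translator would instead receive the raw schema plus a meta-prompt and produce fluent, domain-aware phrasing; the rule-based version above only shows what information must survive the conversion (action name, parameters, precondition, effect).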