🤖 AI Summary
Real-world asynchronous tasks involve non-uniform durations, concurrency, and execution-time constraints, which existing planning approaches and benchmarks handle poorly. This work unifies these challenges under a single formulation, uses large language models to translate tasks into constraint satisfaction programs for a CP-SAT solver, and adds a state-aware repair strategy that handles constraint updates during execution. The study introduces the first three benchmarks that address each of these challenges at scale, and finds that the choice of formal representation primarily determines whether planning scales. At 100 actions, the CP-SAT Formalizer reaches 83% plan accuracy, substantially outperforming direct planning (5%) and the PDDL2.1 Formalizer (0%). When constraints change at execution time, the repair strategy raises CP-SAT Formalizer accuracy from 46.1% to 84.5%.
📝 Abstract
LLMs can plan either by generating action sequences directly as a Planner or by translating tasks into a domain-specific language for an external solver as a Formalizer. Although most real-world tasks are asynchronous, with non-uniform durations, concurrency, and execution-time constraints, existing benchmarks rarely cover them. We unify these asynchronous planning challenges under a single formulation and introduce the first three benchmarks that address each at scale. We conclude that the choice of formal representation primarily determines whether planning scales: as dependency graphs grow from 5 to 100 actions, Planner collapses from 96% to 5% plan accuracy and PDDL2.1 Formalizer from 13% to 0%, while CP-SAT Formalizer averages 94% and still achieves 83% at 100 actions. Faithfulness diagnostics show that PDDL2.1's predicate-based planning representation is more brittle than general constraint satisfaction programs because LLMs must keep predicates, effects, and goals consistent. Execution-time updates of planning constraints further degrade performance sharply (Planner 23.9%, PDDL2.1 0.7%, CP-SAT 46.1%), but a state-aware repair strategy that updates only event-induced constraints recovers CP-SAT Formalizer to 84.5%.
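
To make the problem setting concrete, the following is a minimal sketch of an asynchronous plan with non-uniform durations, concurrent actions, and an execution-time update. It is not the paper's method: instead of a CP-SAT encoding it uses a plain earliest-start pass over the dependency graph, which satisfies the same precedence constraints (an action may start only after all of its prerequisites finish). The action names, durations, and the functions `schedule`, `makespan`, and `repair` are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def schedule(durations, deps, fixed_starts=None, release=0):
    """Earliest-start schedule over a dependency graph.

    durations:    action -> duration
    deps:         action -> list of prerequisite actions
    fixed_starts: actions whose start time is already decided (assumption:
                  used to model actions that have begun executing)
    release:      earliest time at which a not-yet-fixed action may start
    """
    fixed_starts = fixed_starts or {}
    graph = {a: deps.get(a, ()) for a in durations}
    start = {}
    # static_order() yields every prerequisite before the actions that need it.
    for a in TopologicalSorter(graph).static_order():
        if a in fixed_starts:
            start[a] = fixed_starts[a]
        else:
            ready = [start[p] + durations[p] for p in graph[a]]
            start[a] = max([release] + ready)
    return start


def makespan(durations, start):
    """Time at which the last action finishes."""
    return max(start[a] + durations[a] for a in durations)


def repair(durations, deps, old_start, now, changed):
    """State-aware repair sketch: keep actions that started before `now`,
    apply the changed durations, and reschedule only the remaining actions."""
    updated = {**durations, **changed}
    started = {a: s for a, s in old_start.items() if s < now}
    return updated, schedule(updated, deps, fixed_starts=started, release=now)


# Hypothetical task: "boil" and "chop" run concurrently, "cook" needs both.
durations = {"boil": 10, "chop": 5, "cook": 15, "serve": 2}
deps = {"cook": ["boil", "chop"], "serve": ["cook"]}

plan = schedule(durations, deps)
print(plan, makespan(durations, plan))  # makespan is 27

# Execution-time event at t=4: boiling will take 14 instead of 10.
new_durations, repaired = repair(durations, deps, plan, now=4, changed={"boil": 14})
print(repaired, makespan(new_durations, repaired))  # makespan is 31
```

In this sketch the repair step leaves the two actions already in progress untouched and shifts only the actions affected by the event, which mirrors the idea of updating only event-induced constraints rather than replanning from scratch. A constraint solver would additionally be able to handle resource limits and optimization objectives that this simple pass does not model.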