🤖 AI Summary
This work addresses temporal planning in fixed domains, where learned heuristic guidance suffers from cold-start bias and MDP episode truncation when only a limited set of training problems is available. To overcome these bottlenecks, the authors propose a symbolic-heuristic-guided reinforcement learning paradigm: (i) RL reward functions grounded in symbolic heuristics (e.g., h_max, LM-cut); (ii) a residual heuristic learning framework that mitigates initial policy bias; and (iii) a hybrid search architecture with multiple priority queues that sharpens search guidance. The method integrates DQN/PPO variants with symbolic domain knowledge, enabling joint planning-and-learning modeling. Evaluated across multiple IPC temporal domains, it achieves a 37% improvement in both solution success rate and runtime over baselines, establishing a new state of the art. The approach offers an interpretable and transferable pathway for heuristic synthesis in automated planning.
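The multi-priority-queue idea above can be illustrated with a minimal sketch: a greedy best-first search that maintains one queue per heuristic (say, a learned one and a symbolic one) and expands from them in round-robin fashion, so an imperfect learned heuristic cannot monopolize the search. All names and the exact alternation policy here are illustrative assumptions, not the paper's implementation.

```python
import heapq
import itertools

def multi_queue_search(initial, goal_test, successors, heuristics):
    """Greedy best-first search with one priority queue per heuristic.

    Expansion alternates round-robin among the queues, balancing
    systematic (symbolic) guidance with imperfect learned guidance.
    Sketch only: the paper's actual queue policy may differ.
    """
    counter = itertools.count()  # unique tie-breaker so heaps never compare states
    queues = [[] for _ in heuristics]
    for q, h in zip(queues, heuristics):
        heapq.heappush(q, (h(initial), next(counter), initial, [initial]))
    closed = set()
    while any(queues):
        for q in queues:  # round-robin: one expansion per queue per sweep
            while q:
                _, _, state, path = heapq.heappop(q)
                if state not in closed:
                    break
            else:
                continue  # this queue held only already-expanded states
            if goal_test(state):
                return path
            closed.add(state)
            for nxt in successors(state):
                if nxt in closed:
                    continue
                # Each successor is enqueued in every queue, ranked by that
                # queue's own heuristic.
                for q2, h in zip(queues, heuristics):
                    heapq.heappush(q2, (h(nxt), next(counter), nxt, path + [nxt]))
    return None

# Toy usage: walk the integer line from 0 to 5 with a well-informed
# heuristic alongside a blind (zero) one.
path = multi_queue_search(
    0,
    lambda s: s == 5,
    lambda s: [s + 1, s - 1],
    [lambda s: abs(5 - s), lambda s: 0],
)
```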
📝 Abstract
Recent work has investigated the use of Reinforcement Learning (RL) to synthesize heuristic guidance that improves the performance of temporal planners when a domain is fixed and a set of training problems (not plans) is given. The idea is to extract a heuristic from the value function of a particular (possibly infinite-state) MDP constructed over the training problems. In this paper, we propose an evolution of this learning-and-planning framework that exploits the information provided by symbolic heuristics during both the RL and planning phases. First, we formalize different reward schemata for the synthesis and use symbolic heuristics to mitigate the problems caused by the truncation of episodes needed to handle the potentially infinite MDP. Second, we propose learning a residual of an existing symbolic heuristic, i.e., a "correction" of the heuristic value, instead of eagerly learning the whole heuristic from scratch. Finally, we use the learned heuristic in combination with a symbolic heuristic in a multiple-queue planning approach that balances systematic search against imperfect learned information. We experimentally compare all the approaches, highlighting their strengths and weaknesses, and significantly advance the state of the art for this planning and learning schema.
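The two learning ideas in the abstract, bootstrapping truncated episodes with a symbolic heuristic and learning a residual on top of it, can be sketched as target construction for a value learner. Everything below is a hypothetical illustration (the placeholder `h_sym`, the Monte-Carlo target, and the function names are assumptions, not the paper's formulation):

```python
def h_sym(state):
    """Placeholder for a symbolic heuristic such as h_max:
    an estimate of cost-to-go from `state` (toy stand-in)."""
    return float(state)

def episode_target(costs, final_state, truncated):
    """Cost-to-go target for the initial state of one training episode.

    If the episode was cut off before reaching a goal, bootstrap the
    unfinished tail with the symbolic heuristic instead of implicitly
    counting it as zero, which would bias the learned heuristic low.
    """
    target = sum(costs)
    if truncated:
        target += h_sym(final_state)
    return target

def residual_target(state, costs, final_state, truncated):
    """Residual scheme: the learner regresses target - h_sym(state),
    a correction on top of the symbolic heuristic, so an untrained
    (all-zero) model already behaves like the symbolic heuristic."""
    return episode_target(costs, final_state, truncated) - h_sym(state)
```

With this framing, the final heuristic used at planning time would be `h_sym(state) + learned_residual(state)`, and the residual starts near zero rather than the learner having to rediscover the symbolic estimate from scratch.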