🤖 AI Summary
To address the computational bottleneck in A* search, where node generation and heuristic evaluation scale linearly with the size of the action space, this paper proposes Q* search, which integrates a deep Q-network (DQN) into an optimal-pathfinding framework. Q* search avoids explicit child-node expansion by predicting, with a single forward pass, the sum of the transition cost and heuristic estimate for every candidate action, drastically reducing computation time and memory overhead and requiring only one node to be generated per iteration. The authors prove that Q* search retains solution optimality given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost. Evaluated on the Rubik's Cube formulated with 1872 meta-actions, this 157-fold increase in action-space size costs Q* search less than a 4× increase in runtime and less than a 3× increase in nodes generated, while Q* search itself is up to 129× faster than A* search and generates up to 1288× fewer nodes. The core contribution lies in repurposing the DQN from policy learning to serving as the evaluation kernel of heuristic search, enabling efficient optimal planning in massive action spaces.
📝 Abstract
Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search in order to take advantage of the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in number of nodes generated when performing Q* search. Furthermore, Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.
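The central mechanism described above, ranking all children of a node with one forward pass and generating a node only when it is popped from the frontier, can be sketched in a few lines. The snippet below is a minimal illustration on a toy domain (an integer line with moves −1/+1 and goal 0), not the paper's implementation: `q_values` is a hypothetical stand-in for the deep Q-network, returning for each action the sum of the transition cost and the child's estimated cost-to-go.

```python
import heapq

# Toy stand-in for a deep Q-network on the integer line: actions move -1/+1,
# the goal is 0, and q_values(s)[a] plays the role of one network output --
# the (transition cost + estimated cost-to-go) of the child reached by action a.
MOVES = [-1, +1]

def q_values(s):
    # For illustration these are the exact values, so the search is optimal.
    return [1 + abs(s + m) for m in MOVES]

def apply_action(s, a):
    return s + MOVES[a]

def q_star_search(start, is_goal):
    """Each iteration generates one node, then scores all of its children
    with a single q_values call instead of instantiating each child."""
    # Frontier entries: (f, tie_breaker, g, parent_state, action); a child
    # exists only as a (parent, action) pair until its entry is popped.
    frontier = [(0.0, 0, 0.0, start, None)]
    g_best = {}
    tie = 0
    while frontier:
        f, _, g, parent, action = heapq.heappop(frontier)
        state = parent if action is None else apply_action(parent, action)
        if is_goal(state):
            return g
        if state in g_best and g_best[state] <= g:
            continue  # already reached this state at least as cheaply
        g_best[state] = g
        q = q_values(state)  # one "forward pass" ranks every child
        for a in range(len(MOVES)):
            tie += 1
            heapq.heappush(frontier, (g + q[a], tie, g + 1.0, state, a))
    return None
```

In an A* implementation, the loop body would instead generate every child state and evaluate the heuristic network once per child; here the per-iteration cost is one node generation and one network call, which is the source of the speedups reported in the abstract.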