Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Actor-critic model-free reinforcement learning algorithms are usually deployed with simple inference strategies, typically direct rollouts of the actor policy, which can limit decision quality in complex robotic tasks such as collision-free motion planning and contact-rich, non-prehensile pushing. To address this, the paper proposes PACHS (Parallel Actor-Critic Heuristic Search), a parallel best-first search algorithm for inference that uses both components of the actor-critic architecture: the actor generates candidate actions, and the critic supplies cost-to-go estimates that guide the search. PACHS employs two levels of parallelism: (i) batched action generation and cost-to-go evaluation by the actor and critic networks, and (ii) graph expansion distributed across multiple threads. The authors demonstrate the approach on robotic manipulation tasks of both kinds.

📝 Abstract

Actor-Critic models are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across various robot learning tasks. While considerable research has focused on improving training stability and data sampling efficiency, most deployment strategies have remained relatively simplistic, typically relying on direct actor policy rollouts. In contrast, we propose PACHS (Parallel Actor-Critic Heuristic Search), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. Two levels of parallelism are employed within the search: actions and cost-to-go estimates are generated in batches by the actor and critic networks respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach in robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
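The core idea in the abstract, an actor proposing actions and a critic scoring the resulting states inside a best-first search, can be illustrated with a minimal single-threaded sketch. This is not the authors' implementation. The names `toy_actor`, `toy_critic`, `step`, and `best_first_search` are hypothetical, the "networks" are hand-written stand-ins on a toy grid, and the priority `g + h` (cost so far plus the critic's cost-to-go estimate) is an assumed A*-style choice that the abstract does not specify.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]
Action = Tuple[int, int]

GOAL: State = (4, 3)  # toy goal cell (assumption for illustration)


def toy_actor(states: List[State]) -> List[List[Action]]:
    """Stand-in for a batched actor network: candidate actions per state."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [list(moves) for _ in states]


def toy_critic(states: List[State]) -> List[float]:
    """Stand-in for a batched critic network: cost-to-go estimate per state."""
    return [float(abs(GOAL[0] - x) + abs(GOAL[1] - y)) for x, y in states]


def step(state: State, action: Action) -> State:
    """Toy transition model: move on an unbounded grid."""
    return (state[0] + action[0], state[1] + action[1])


def best_first_search(
    start: State,
    is_goal: Callable[[State], bool],
    actor: Callable[[List[State]], List[List[Action]]],
    critic: Callable[[List[State]], List[float]],
    max_expansions: int = 1000,
) -> Optional[List[State]]:
    """Expand the open node with the lowest g + h, where h comes from the critic."""
    open_list = [(critic([start])[0], 0.0, start)]
    best_g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    expansions = 0
    while open_list and expansions < max_expansions:
        _, g, state = heapq.heappop(open_list)
        if g > best_g[state]:
            continue  # stale queue entry; a cheaper path was found later
        if is_goal(state):
            path: List[State] = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        expansions += 1
        actions = actor([state])[0]           # actor proposes actions
        children = [step(state, a) for a in actions]
        h_values = critic(children)           # one batched critic call
        for child, h in zip(children, h_values):
            new_g = g + 1.0                   # unit step cost (assumption)
            if new_g < best_g.get(child, math.inf):
                best_g[child] = new_g
                parent[child] = state
                heapq.heappush(open_list, (new_g + h, new_g, child))
    return None


path = best_first_search((0, 0), lambda s: s == GOAL, toy_actor, toy_critic)
```

In a real deployment the two stand-in functions would presumably be replaced by forward passes of the trained actor and critic, so that each expansion costs one batched call per network rather than one call per successor.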

Problem

Research questions and friction points this paper is trying to address.

Improving actor-critic inference through parallel heuristic search

Enhancing robotic manipulation via efficient best-first planning

Leveraging critic estimates for guided action selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel best-first search algorithm for inference

Leverages actor network actions and critic cost estimates

Employs two levels of parallelism in search
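The two levels of parallelism listed above can be sketched together in a simplified, round-based form: several open nodes are popped at once, their successors are generated on worker threads, and all successors are scored in a single batched critic call. This is an illustrative scheme under stated assumptions, not the scheduling used by PACHS. The names `actor_batch`, `critic_batch`, `expand`, and `parallel_search` are hypothetical, and the toy grid stands in for a robot simulator.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

GOAL = (4, 3)  # toy goal cell (assumption for illustration)
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def actor_batch(states):
    """Stand-in for a batched actor: candidate actions for every state."""
    return [list(MOVES) for _ in states]


def critic_batch(states):
    """Stand-in for a batched critic: cost-to-go estimate for every state."""
    return [float(abs(GOAL[0] - x) + abs(GOAL[1] - y)) for x, y in states]


def expand(state, actions):
    """Per-node work run on a worker thread (stand-in for simulation or
    collision checking)."""
    return [(state[0] + dx, state[1] + dy) for dx, dy in actions]


def parallel_search(start, num_threads=4, max_rounds=1000):
    """Return the cost of the first goal node popped, or None if none is found."""
    open_list = [(critic_batch([start])[0], 0.0, start)]
    best_g = {start: 0.0}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for _ in range(max_rounds):
            if not open_list:
                return None
            # Pop up to num_threads of the most promising nodes.
            batch = []
            while open_list and len(batch) < num_threads:
                _, g, state = heapq.heappop(open_list)
                if g > best_g[state]:
                    continue  # stale queue entry
                if state == GOAL:
                    return g
                batch.append((g, state))
            if not batch:
                continue
            states = [s for _, s in batch]
            action_sets = actor_batch(states)  # level 1: batched actor call
            # Level 2: successor generation distributed across threads.
            child_sets = list(pool.map(expand, states, action_sets))
            flat = [
                (g + 1.0, child)               # unit step cost (assumption)
                for (g, _), children in zip(batch, child_sets)
                for child in children
            ]
            h_values = critic_batch([c for _, c in flat])  # batched critic call
            for (new_g, child), h in zip(flat, h_values):
                if new_g < best_g.get(child, float("inf")):
                    best_g[child] = new_g
                    heapq.heappush(open_list, (new_g + h, new_g, child))
    return None


cost_parallel = parallel_search((0, 0), num_threads=4)
cost_sequential = parallel_search((0, 0), num_threads=1)
```

With `num_threads=1` the sketch reduces to ordinary sequential best-first search. With more threads, nodes are expanded slightly out of strict priority order, so the first goal found is not guaranteed to be the cheapest one. In CPython, threads only give a real speedup when the per-node work releases the global interpreter lock, as native simulation or collision-checking code often does.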
