Stochastic Prize-Collecting Games: Strategic Planning in Multi-Robot Systems

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses non-cooperative pathfinding for multi-robot systems under sparse rewards, energy constraints, and competition. We propose the Stochastic Prize-Collecting Game (SPCG) framework, the first to formalize team-based pathfinding as a non-cooperative game, and prove that, under mild assumptions, pure Nash equilibria are equivalent to globally optimal path solutions. To enable scalable, distributed policy learning, we design Ordinal Rank Search (ORS) and Fictitious Ordinal Response Learning (FORL), which combine state aliasing with best-response training. Experiments demonstrate that FORL generalizes well to large-scale systems, is robust to imbalanced prize distributions, and achieves 87%–95% of the TOP-optimal solution across diverse graph topologies. Our approach significantly improves the practicality and scalability of multi-agent game-theoretic planning in sparse-reward settings.

📝 Abstract
The Team Orienteering Problem (TOP) generalizes many real-world multi-robot scheduling and routing tasks that arise in autonomous mobility, aerial logistics, and surveillance applications. While many variants of the TOP exist for planning in multi-robot systems, they assume that all robots cooperate toward a single objective; thus, they do not extend to settings where robots compete in reward-scarce environments. We propose Stochastic Prize-Collecting Games (SPCG) as an extension of the TOP for planning among self-interested robots operating on a graph under energy constraints and stochastic transitions. A theoretical study on complete and star graphs establishes that, given a rank-based conflict resolution rule, SPCGs admit a unique pure Nash equilibrium that coincides with the optimal routing solution of an equivalent TOP. This work proposes two algorithms: Ordinal Rank Search (ORS), which obtains the "ordinal rank" (one's effective rank in temporarily formed local neighborhoods during the game's stages), and Fictitious Ordinal Response Learning (FORL), which obtains best-response policies against one's senior-rank opponents. Empirical evaluations on road networks and synthetic graphs under both dynamic and stationary prize distributions show that 1) the state aliasing induced by OR-conditioning enables learning policies that scale more efficiently to large team sizes than those trained with a global index, and 2) policies trained with FORL generalize better to imbalanced prize distributions than those trained with other multi-agent methods. Finally, the learned SPCG policies achieve between 87% and 95% optimality relative to an equivalent TOP solution obtained by mixed-integer linear programming.
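The rank-based conflict-resolution rule mentioned in the abstract can be illustrated with a minimal sketch: when several self-interested robots reach the same prize node in a stage, the senior-most (lowest-rank) robot collects the prize. The function and data layout below are hypothetical illustrations, not the paper's implementation.

```python
# Hedged sketch of rank-based conflict resolution: at a contested prize
# node, the robot with the lowest (most senior) rank takes the prize.
# `arrivals` and `prizes` are illustrative data structures.

def resolve_conflicts(arrivals, prizes):
    """arrivals: dict node -> list of (rank, robot_id) that reached it.
    prizes:   dict node -> prize value available at that node.
    Returns per-robot stage payoffs; a contested prize goes entirely
    to the senior-most robot (ties broken by robot id)."""
    payoffs = {}
    for node, robots in arrivals.items():
        if node not in prizes:
            continue
        _, winner = min(robots)  # lowest rank value = most senior
        payoffs[winner] = payoffs.get(winner, 0.0) + prizes[node]
    return payoffs

# Two robots contest node "a"; the senior robot (rank 0) wins its prize.
payoffs = resolve_conflicts(
    {"a": [(1, "r2"), (0, "r1")], "b": [(2, "r3")]},
    {"a": 5.0, "b": 2.0},
)
```

Under this rule, a junior robot anticipating a senior opponent at a node has an incentive to reroute, which is what makes best responses to senior-rank opponents (as in FORL) a natural solution concept.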
Problem

Research questions and friction points this paper is trying to address.

Extends Team Orienteering Problem to competitive multi-robot systems with self-interested agents
Addresses strategic planning under energy constraints and stochastic transitions on graphs
Targets reward-scarce environments where robots compete rather than cooperate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends Team Orienteering Problem to competitive multi-robot systems
Introduces Ordinal Rank Search for effective local ranking
Uses Fictitious Ordinal Response Learning for best-response policies
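The best-response idea behind FORL follows the fictitious-play template: each player repeatedly best-responds to the empirical mixture of its opponents' past play. The toy below runs classical fictitious play on a 2-action matrix game; it is only a minimal analogue, since the paper's method learns graph policies conditioned on ordinal rank rather than matrix-game strategies.

```python
# Minimal fictitious play on a two-player matrix game, sketching the
# best-response-to-empirical-opponent loop that underlies FORL.
import numpy as np

def fictitious_play(payoff_a, payoff_b, steps=2000):
    """payoff_a[i, j]: A's payoff when A plays i and B plays j;
    payoff_b[i, j]: B's payoff for the same joint action.
    Returns the empirical action frequencies of both players."""
    counts_a = np.ones(payoff_a.shape[0])  # empirical action counts
    counts_b = np.ones(payoff_b.shape[1])
    for _ in range(steps):
        # Each player best-responds to the opponent's empirical mixture.
        a = int(np.argmax(payoff_a @ (counts_b / counts_b.sum())))
        b = int(np.argmax((counts_a / counts_a.sum()) @ payoff_b))
        counts_a[a] += 1
        counts_b[b] += 1
    return counts_a / counts_a.sum(), counts_b / counts_b.sum()

# Matching pennies: empirical frequencies approach the (0.5, 0.5) mix.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
freq_a, freq_b = fictitious_play(A, -A)
```

In FORL the opponents are not arbitrary: each robot trains a best response against its senior-rank opponents only, so the fixed ordinal ranking gives the learning loop a well-defined hierarchy to respond to.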