On Dynamic Programming Theory for Leader-Follower Stochastic Games

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the sequential Stackelberg decision problem in leader-follower general-sum stochastic games (LF-GSSGs), aiming to compute strong Stackelberg equilibria (SSEs) efficiently. Existing methods struggle to ensure theoretical rigor and computational scalability simultaneously; to overcome this, we establish the first formal result showing that LF-GSSGs can be losslessly reduced to state-abstracted Markov decision processes grounded in a “credible set”, a state-dependent collection of rational follower responses. Leveraging this reduction, we propose a novel dynamic programming framework and design a Bellman recursion algorithm with ε-optimality guarantees. Our key contribution lies in explicitly modeling follower rationality as a state-dependent credible policy set, enabling a compact characterization of asymmetric commitment structures. Experiments on security games and mixed-motive resource allocation benchmarks demonstrate substantial improvements in both leader utility and computational efficiency over state-of-the-art algorithms.

📝 Abstract
Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets (state abstractions formally representing all rational follower best responses under partial leader commitments) to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks (including security games, resource allocation, and adversarial planning) demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Computes strong Stackelberg equilibrium in leader-follower stochastic games
Reduces leader-follower games to Markov decision processes via credible sets
Develops efficient algorithms for optimal leader policies with performance guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic programming over credible sets for Stackelberg equilibrium
Reduction to Markov decision process enabling Bellman recursion
Epsilon-optimal algorithms with provable exploitability guarantees
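The DP-over-an-MDP idea above can be sketched in code. This is a minimal, hypothetical illustration of Bellman value iteration over an abstracted (credible-set) state space, not the paper's actual algorithm or API: the names `value_iteration`, `transition`, and `reward` are illustrative, and the stopping rule on the Bellman residual is only a stand-in for the paper's ε-optimality guarantee.

```python
# Hypothetical sketch: value iteration over an abstracted (credible-set) MDP.
# All names and signatures here are illustrative, not the paper's interface.

def value_iteration(states, actions, transition, reward, gamma=0.95, eps=1e-6):
    """Bellman recursion on an abstracted state space.

    actions(s)       -> iterable of leader actions available in state s
    transition(s, a) -> list of (next_state, probability) pairs
    reward(s, a)     -> leader payoff, assuming the follower responds rationally
    Iterates until the Bellman residual drops below eps.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best leader action given current value estimates.
            best = max(
                reward(s, a) + gamma * sum(p * V[s2] for s2, p in transition(s, a))
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V
```

On a toy two-state instance (an absorbing goal state reached by a single "go" action), the recursion converges in a couple of sweeps; the credible-set construction in the paper is what makes such a backup sound for the Stackelberg setting.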
Jilles Steeve Dibangoye
University of Groningen

Thibaut Le Marre
ENS de Lyon, CNRS, University Claude Bernard Lyon 1, Inria, LIP, UMR 5668, France

Ocan Sankur
CNRS, Université de Rennes
Formal methods

François Schwarzentruber
École Normale Supérieure de Lyon
logic, modal logic, artificial intelligence, multi-agent systems, formal methods