AI Summary
This paper studies online learning for the leader in sequential Stackelberg games under limited feedback: in each round, the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds according to Bayesian rationality; the leader's goal is to minimize cumulative regret. Methodologically, we reduce Stackelberg learning to linear contextual bandits by modeling the problem in utility space and combining inverse optimization with mixed-strategy construction. This yields a near-optimal regret bound of $O(\sqrt{T})$, breaking the previous $O(T^{2/3})$ barrier. Our approach operates without prior knowledge of the leader's utility function and significantly outperforms existing algorithms in second-price auctions and Bayesian persuasion tasks. Moreover, it supports online persuasion under both public and private state settings, demonstrating strong generalizability and practical applicability.
Abstract
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement over the previously best-known rate of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply them to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe that our algorithms empirically outperform previous results on numerical simulations.
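To make the utility-space reduction concrete, here is a minimal, hypothetical sketch of one round: a bandit subroutine proposes a follower-utility vector, which is inverted (here by least squares, projected onto the simplex) into a mixed strategy for the leader. The payoff matrix `B`, the sizes, and the random recommendation are all illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 3 leader actions, 2 follower actions.
# Follower's expected utility under leader mixed strategy x is B.T @ x.
B = rng.random((3, 2))

def invert_to_mixed_strategy(v):
    """Least-squares inversion of a target follower-utility vector v,
    with a crude projection back onto the probability simplex."""
    x, *_ = np.linalg.lstsq(B.T, v, rcond=None)
    x = np.clip(x, 0.0, None)
    s = x.sum()
    return x / s if s > 0 else np.full(len(x), 1.0 / len(x))

# One illustrative round: a linear contextual bandit would recommend v
# from the observed context; here we draw a placeholder at random.
v = rng.random(2)
x = invert_to_mixed_strategy(v)      # leader's mixed strategy
u_follower = B.T @ x                  # utilities induced on the follower
best_response = int(np.argmax(u_follower))
```

The bandit thus learns entirely in utility space, and only the inversion step touches the strategy simplex, which is what drives the improved regret rate in the paper's analysis.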