🤖 AI Summary
This work studies the repeated interaction between a learning agent and a strategic optimizer in second-price auctions under budget constraints in a Bayesian setting. It introduces the "Budgeted Stackelberg Equilibrium" and combines tools from game theory, control theory (specifically proportional controllers), and dynamical systems analysis to show how cross-round budget constraints reshape the structure of the game. For a \(k\)-dimensional budget constraint, the optimizer's optimal strategy is characterized as switching among at most \(k+1\) mixed strategies, each used in its own phase of rounds. The work further proves that a learner pacing its bids with a standard proportional controller is strategically robust: the optimizer's utility is upper bounded by its objective value in the Budgeted Stackelberg Equilibrium.
📝 Abstract
The study of repeated interactions between a learner and a utility-maximizing optimizer has yielded deep insights into the manipulability of learning algorithms. However, existing literature primarily focuses on independent, unlinked rounds, largely ignoring the ubiquitous practical reality of budget constraints. In this paper, we study this interaction in repeated second-price auctions in a Bayesian setting between a learning agent and a strategic agent, both subject to strict budget constraints, showing that such cross-round constraints fundamentally alter the strategic landscape.
First, we generalize the classic Stackelberg equilibrium to the Budgeted Stackelberg Equilibrium. We prove that an optimizer's optimal strategy in a budgeted setting requires time-multiplexing: for a $k$-dimensional budget constraint, the optimal strategy decomposes into up to $k+1$ distinct phases, with each phase employing a possibly different mixed strategy (the case $k=0$ recovers the classic Stackelberg equilibrium, where the optimizer repeatedly uses a single mixed strategy). Second, we address the question of non-manipulability. We prove that when the learner employs a standard proportional controller (the "P" of the PID controller) to pace its bids, the optimizer's utility is upper bounded by its objective value in the Budgeted Stackelberg Equilibrium baseline. By bounding the dynamics of this controller via a novel analysis, our results establish that this widely used control-theoretic heuristic is strategically robust.
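To make the idea of pacing bids with a proportional controller concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a particular form of the controller: the pacing multiplier `mu` is set proportionally to how far cumulative spend is ahead of a uniform spending schedule, and the bid is the value shaded by `1 / (1 + mu)`. The function name `pace_with_p_controller`, the parameter `gain`, and the uniform target schedule are all hypothetical choices made for illustration.

```python
import random


def pace_with_p_controller(values, rival_bids, budget, gain=1.0):
    """Simulate budget pacing in repeated second-price auctions.

    Assumed (illustrative) controller: the multiplier mu is proportional
    to the current error, where the error is cumulative spend minus the
    spend a uniform schedule would allow so far.
    """
    rounds = len(values)
    target_per_round = budget / rounds  # uniform schedule (assumption)
    spent = 0.0
    payments = []
    for t, (value, rival) in enumerate(zip(values, rival_bids)):
        # Proportional term only: no integral or derivative component.
        error = spent - target_per_round * t
        mu = max(0.0, gain * error)
        # Shade the bid, and never bid more than the remaining budget.
        bid = min(value / (1.0 + mu), budget - spent)
        # Second-price rule: the winner pays the competing bid.
        payment = rival if bid > rival else 0.0
        spent += payment
        payments.append(payment)
    return payments, budget - spent


if __name__ == "__main__":
    rng = random.Random(0)
    vals = [rng.random() for _ in range(1000)]
    rivals = [rng.random() for _ in range(1000)]
    pays, left = pace_with_p_controller(vals, rivals, budget=50.0, gain=2.0)
    print(f"total spend: {sum(pays):.3f}, remaining budget: {left:.3f}")
```

Because the bid is capped by the remaining budget and the winner pays a competing bid strictly below its own, this sketch never overspends. How an optimizer controlling `rival_bids` could or could not exploit such a learner is the question the abstract's second result addresses.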