AI Summary
This study examines the long-run behavior of buyers who learn to bid by online gradient ascent in discretized first-price auctions with complete information. It develops a potential-function argument showing that certain strategies are not played in time average, together with sufficient conditions under which the argument can be applied iteratively, much like iterated elimination of dominated strategies. It also introduces a class of cubic candidate potential functions, which classify a family of quadratic strategy modifications on the probability simplex against which online gradient ascent incurs no regret. Together these show that the time-averaged outcome is (almost) the efficient outcome of the second-price auction, so a simple learning rule reaches a near-efficient allocation without explicit coordination.
Abstract
We show that in discretised first-price auctions with complete information, if the buyers learn to bid with online gradient ascent, in time-average the outcome is (almost) the efficient outcome of the second-price auction. Our proof rests on two novel innovations in the analysis of online gradient ascent in normal-form games, which may be useful in a wider range of applications. First, we develop a potential-function-based argument for the analysis of gradient ascent in normal-form games, allowing us to deduce that certain strategies will not be played in time-average. We provide sufficient conditions which ensure this argument can be applied iteratively, resulting in a procedure reminiscent of iterative elimination of dominated strategies. Second, we develop a novel class of cubic "candidate potential functions", classifying a family of quadratic strategy modifications on the probability simplex against which online gradient ascent incurs no regret.
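To make the learning dynamic concrete, the following is a minimal sketch of projected online gradient ascent in a two-buyer discretised first-price auction. It is an illustration under stated assumptions, not the paper's construction: the two-buyer restriction, the valuations, the bid grid, the uniform tie-breaking rule, the uniform initial strategies, the constant step size, and all function names are hypothetical choices made here. Each buyer's gradient is the vector of expected payoffs of her pure bids against the opponent's current mixed strategy, and each update is projected back onto the probability simplex.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of k; the last hit gives theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def expected_payoffs(value, bids, opponent_mix):
    """Expected first-price payoff of each pure bid against opponent_mix.

    Assumes bids are sorted ascending and ties are split uniformly
    (a tie-breaking rule chosen for this sketch).
    """
    payoffs = []
    for index, bid in enumerate(bids):
        win_prob = sum(opponent_mix[:index]) + 0.5 * opponent_mix[index]
        payoffs.append((value - bid) * win_prob)
    return payoffs


def run_gradient_ascent(values, bids, steps, step_size):
    """Run projected gradient ascent for two buyers.

    Returns the final mixed strategies and the time-averaged strategies.
    """
    n = len(bids)
    mixes = [[1.0 / n] * n for _ in range(2)]      # assumed uniform start
    averages = [[0.0] * n for _ in range(2)]
    for t in range(1, steps + 1):
        # Record the strategies played in round t in the running average.
        for i in range(2):
            averages[i] = [a + (x - a) / t for a, x in zip(averages[i], mixes[i])]
        # In a normal-form game, the utility gradient w.r.t. a buyer's own
        # mixed strategy is the vector of her pure-bid expected payoffs.
        grads = [expected_payoffs(values[i], bids, mixes[1 - i]) for i in range(2)]
        mixes = [
            project_to_simplex([x + step_size * g for x, g in zip(mixes[i], grads[i])])
            for i in range(2)
        ]
    return mixes, averages


if __name__ == "__main__":
    bids = [k / 10 for k in range(11)]             # hypothetical bid grid
    _, averages = run_gradient_ascent([1.0, 0.5], bids, steps=5000, step_size=0.05)
    for buyer, avg in enumerate(averages):
        print(f"buyer {buyer} time-averaged strategy:",
              [round(p, 3) for p in avg])
```

Inspecting the time-averaged strategies, rather than the last iterate, matches the sense in which the abstract states its result; the sketch makes no claim about what a particular run will print.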