🤖 AI Summary
This paper studies online pricing in incomplete-information markets, where an operator must learn equilibrium prices over $T$ periods to match supply and demand without knowing suppliers' private cost functions. We propose the first online pricing framework that jointly optimizes three regret objectives: unmet demand, cost regret, and payment regret. Our method models strategic interactions via game-theoretic equilibrium and designs an adaptive price-update learning algorithm grounded in this equilibrium structure. Theoretically, when supplier cost functions are fixed, our algorithm achieves an asymptotically optimal regret bound of $O(\log \log T)$ for both constant and slowly varying demand, a significant improvement over the $\Omega(\sqrt{T})$ lower bound inherent to existing online pricing approaches. This result is the first to reduce regret from polynomial to double-logarithmic order, establishing a new paradigm for high-precision dynamic market mechanism design.
📝 Abstract
The study of market equilibria is central to economic theory, particularly in efficiently allocating scarce resources. However, computing equilibrium prices, at which the supply of goods matches their demand, typically relies on complete information about agents' private attributes, e.g., suppliers' cost functions, which are often unavailable in practice. Motivated by this practical consideration, we consider the problem of setting equilibrium prices in the incomplete-information setting, wherein a market operator seeks to satisfy the customer demand for a commodity by purchasing the required amount from competing suppliers whose cost functions are private and unknown to the market operator. In this setting, we consider the online learning problem of learning equilibrium prices over a horizon of $T$ periods while jointly optimizing three performance metrics pertinent to equilibrium pricing: unmet demand, cost regret, and payment regret. In the general setting where suppliers' cost functions are time-varying, we show that no online algorithm can achieve sublinear regret on all three metrics. Thus, we consider the setting where suppliers' cost functions are fixed and develop algorithms that achieve a regret of (i) $O(\log \log T)$ when the customer demand is constant over time and (ii) $O(\log \log T)$ when the demand is variable over time.