🤖 AI Summary
This paper studies the repeated two-sided matching problem under bidirectionally unknown preferences: how to achieve stable matchings via sequential interactions without explicit communication or prior preference knowledge. To address the challenge that both proposers and proposees must learn their preferences online, the authors propose the first two-stage optimistic belief-updating mechanism, integrating a multi-armed bandit framework, Bayesian preference estimation, and conditional acceptance-probability inference. They prove that the algorithm converges almost surely to a stable matching. This result removes the classical Gale–Shapley algorithm's reliance on complete, static preference information and marks the first extension of stable matching theory to fully online learning settings where preferences are entirely unknown to both sides, significantly improving applicability and robustness in dynamic, information-limited markets.
📝 Abstract
We study the problem of repeated two-sided matching with uncertain preferences (two-sided bandits) and no explicit communication between agents. Recent work has developed algorithms that converge to stable matchings when one side (the proposers, or agents) must learn their preferences while the preferences of the other side (the proposees, or arms) are common knowledge, and the matching mechanism uses simultaneous proposals at each round. We develop new algorithms that provably converge to stable matchings in two more challenging settings: one where the arm preferences are no longer common knowledge, and a second, more general one where the arms are also uncertain about their own preferences. In our algorithms, agents start with optimistic beliefs about arms' preferences and update these beliefs over time. The key insight is how, when choosing whom to propose to, to combine these beliefs about arm preferences with beliefs about the value of matching with an arm conditional on one's proposal being accepted.
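To make the key insight concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm) of a single proposer that keeps optimistic estimates of each arm's acceptance probability and of the match value conditional on acceptance, and scores arms by the product of the two. The class name, the simple optimism bonus, and the update rule are all illustrative assumptions.

```python
class OptimisticProposer:
    """Toy proposer: scores each arm by
    (believed match value | accepted) * (believed acceptance probability),
    starting from optimistic beliefs that shrink as evidence accumulates.
    This is an illustrative sketch, not the paper's algorithm."""

    def __init__(self, n_arms, optimism=1.0):
        self.n_arms = n_arms
        self.optimism = optimism           # size of the optimism bonus (assumed form)
        self.proposals = [0] * n_arms      # times we proposed to each arm
        self.accepts = [0] * n_arms        # times each arm accepted
        self.value_sum = [0.0] * n_arms    # total reward observed on acceptances

    def accept_prob(self, a):
        # Optimistic prior: an arm we have never proposed to is assumed to accept.
        if self.proposals[a] == 0:
            return 1.0
        empirical = self.accepts[a] / self.proposals[a]
        return min(1.0, empirical + self.optimism / self.proposals[a])

    def cond_value(self, a):
        # Optimistic value for arms that have never accepted us.
        if self.accepts[a] == 0:
            return 1.0
        return self.value_sum[a] / self.accepts[a]

    def choose(self):
        # Key step: weight the conditional match value by the acceptance belief.
        scores = [self.cond_value(a) * self.accept_prob(a)
                  for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: scores[a])

    def update(self, arm, accepted, reward=0.0):
        self.proposals[arm] += 1
        if accepted:
            self.accepts[arm] += 1
            self.value_sum[arm] += reward
```

In a toy market where arm 0 always accepts with reward 0.9 and arm 1 always rejects, this proposer initially explores both arms (optimism keeps the rejecting arm's score high for a few rounds) and then settles on arm 0.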