🤖 AI Summary
This paper addresses the problem of achieving constant regret in dynamic two-way matching when real-time queue-length information is unavailable and only static (state-independent) policies are permitted. Focusing on models with finitely many agent types and stochastic arrivals, such as kidney exchange, we propose the first randomized state-independent greedy policy that attains the optimal $O(\varepsilon^{-1})$ constant regret on arbitrary two-way matching networks. Methodologically, we work with the general position gap (GPG) parameter $\varepsilon$, which measures how far the fluid relaxation is from degeneracy, and derive the first explicit regret bound stated in terms of it. By combining fluid-relaxation analysis with priority-based policy design, we prove that a deterministic priority policy achieves constant regret on acyclic networks and that our randomized policy matches the known lower bound on general networks. The results provide a deployable, robust, and theoretically optimal decision rule for safety-critical real-time matching systems.
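To make the state-independent idea concrete, here is a minimal Python sketch of a greedy matching step driven by a static priority order. The agent types, the priority order, and the function name are hypothetical placeholders rather than the paper's construction; the point is only that decisions depend on which queues are non-empty, never on their lengths.

```python
from collections import deque

# Hypothetical network: agent types {"A", "B", "C"} and feasible two-way match
# types listed in a fixed priority order (highest priority first). These names
# and this ordering are illustrative, not taken from the paper.
PRIORITY_ORDER = [("A", "B"), ("B", "C"), ("A", "C")]

def greedy_priority_step(queues):
    """Run one greedy pass after an arrival under a static priority order.

    `queues` maps each agent type to a deque of waiting agent ids. A match type
    is executed whenever both of its queues are non-empty; queue lengths are
    never compared, so the rule is state-independent.
    """
    matched = []
    progress = True
    while progress:
        progress = False
        for i, j in PRIORITY_ORDER:
            if queues[i] and queues[j]:          # availability check only
                matched.append((queues[i].popleft(), queues[j].popleft()))
                progress = True
                break                            # restart from the top priority
    return matched
```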
📝 Abstract
We study a centralized discrete-time dynamic two-way matching model with finitely many agent types. Agents arrive stochastically over time and join their type-dedicated queues waiting to be matched. We focus on state-independent greedy policies that achieve constant regret at all times by making matching decisions based solely on agent availability across types, rather than requiring complete queue-length information. Such policies are particularly appealing for life-saving applications such as kidney exchange, as they require less information and provide more transparency than state-dependent policies. First, for acyclic matching networks, we analyze a deterministic priority policy proposed by Kerimov et al. [2023] that follows a static priority order over matches. We derive the first explicit regret bound in terms of the general position gap (GPG) parameter $\epsilon$, which measures the distance of the fluid relaxation from degeneracy. Second, for general two-way matching networks, we design a randomized state-independent greedy policy that achieves constant regret with optimal scaling $O(\epsilon^{-1})$, matching the existing lower bound established by Kerimov et al. [2024].
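As a rough illustration of the randomized variant, the sketch below samples a match type with fixed probabilities on each arrival and executes it only if a partner is available. The types, probabilities, and function name are assumptions chosen for the example (e.g., probabilities one might compute offline from a fluid-relaxation solution); the paper's actual randomization scheme may differ.

```python
import random
from collections import deque

# Hypothetical sampling distributions per arriving type over feasible match
# types. The numbers are illustrative only, not taken from the paper.
MATCH_PROBS = {
    "A": [(("A", "B"), 0.7), (("A", "C"), 0.3)],
    "B": [(("A", "B"), 0.5), (("B", "C"), 0.5)],
    "C": [(("B", "C"), 0.6), (("A", "C"), 0.4)],
}

def randomized_greedy_step(arriving_type, queues, rng=random):
    """On each arrival, sample one feasible match type with fixed probabilities.

    The distribution is fixed in advance (state-independent); the sampled match
    is executed only if the partner type's queue is non-empty, otherwise the
    arriving agent simply waits in its own queue.
    """
    options, weights = zip(*MATCH_PROBS[arriving_type])
    (i, j), = rng.choices(options, weights=weights, k=1)
    partner = j if i == arriving_type else i
    if queues[partner]:
        return (queues[arriving_type].popleft(), queues[partner].popleft())
    return None
```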