Heavy-traffic Optimality of Skip-the-Longest-Queues in Heterogeneous Parallel Service Systems

📅 2025-03-04

🤖 AI Summary

In heterogeneous parallel service systems under high load, frequent state feedback from the servers incurs substantial communication overhead, which conflicts with low-latency objectives. Method: We propose a lightweight dispatching policy, *k*-SLQ-*d* (*k*-Skip-the-*d*-Longest-Queues), which observes the *n* queue lengths only once every *k*(*n* − *d*) time slots and, between observations, sends each incoming job to a queue that was not among the *d* longest at the last observation. Contribution/Results: We establish rigorous theoretical guarantees that hold for arbitrarily large *k*: (i) throughput optimality under explicit conditions on *d*; (ii) asymptotic delay optimality in the heavy-traffic regime under those same conditions; and (iii) a communication overhead that can be made arbitrarily low by increasing *k*. This removes the conventional reliance on real-time, full-state feedback in load balancing, providing both theoretical foundations and practical design principles for distributed scheduling in high-concurrency, bandwidth-constrained environments.
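
The communication saving can be made concrete with a back-of-the-envelope count. The sketch below assumes a hypothetical cost model that neither the summary nor the abstract specifies: each observation polls all *n* servers at a cost of 2*n* messages (one query and one reply per server), observations occur once every *k*(*n* − *d*) slots as stated in the abstract, and jobs arrive at an average rate of λ*n* per slot. Under these assumptions the per-job overhead is 2/(*k*λ(*n* − *d*)). The function name `messages_per_job` is illustrative only.

```python
def messages_per_job(n: int, d: int, k: int, lam: float) -> float:
    """Average dispatcher-server messages per job under k-SLQ-d.

    Hypothetical cost model (an assumption, not taken from the paper):
    - one observation polls all n servers: n queries + n replies = 2n messages;
    - observations occur once every k * (n - d) time slots;
    - jobs arrive at an average rate of lam * n per time slot.
    """
    if not 0 <= d < n:
        raise ValueError("need 0 <= d < n")
    if k < 1 or lam <= 0:
        raise ValueError("need k >= 1 and lam > 0")
    period = k * (n - d)             # time slots between two observations
    jobs_per_period = lam * n * period
    return 2 * n / jobs_per_period   # simplifies to 2 / (k * lam * (n - d))


if __name__ == "__main__":
    print(messages_per_job(n=100, d=10, k=5, lam=0.9))
```

For example, with *n* = 100, *d* = 10, *k* = 5 and λ = 0.9 this gives 2/405, roughly 0.005 messages per job, and doubling *k* halves it. Under this model, if *n* − *d* grows proportionally to *n*, the per-job cost is of order 1/(*kn*).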

📝 Abstract

We consider a discrete-time parallel service system consisting of $n$ heterogeneous single-server queues with infinite capacity. Jobs arrive at the system as an i.i.d. process with rate proportional to $n$ and must be dispatched immediately, in the time slot in which they arrive. The dispatcher is assumed to be able to exchange messages with the servers to obtain their queue lengths and make dispatching decisions, which introduces an undesirable communication overhead. In this setting, we propose an ultra-low communication overhead load balancing policy dubbed $k$-Skip-the-$d$-Longest-Queues ($k$-SLQ-$d$), where queue lengths are observed only every $k(n-d)$ time slots and, between observations, incoming jobs are sent to a queue that was not one of the $d$ longest when the queues were last observed. For this policy, we establish conditions on $d$ for it to be throughput optimal, and we show that, under these conditions, it is asymptotically delay-optimal in heavy traffic for arbitrarily low communication overheads (i.e., for arbitrarily large $k$).
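
The abstract describes the policy only at a high level, so a small simulation can help make the observe-then-skip mechanics concrete. This is a minimal sketch under several assumptions that are not the paper's exact model: Bernoulli arrivals from *n* sources (so the arrival rate is proportional to *n*), Bernoulli service with heterogeneous per-slot completion probabilities `mu`, dispatch before service within each slot, and a uniformly random choice among the *n* − *d* non-skipped queues (the abstract only requires the chosen queue to be outside the *d* longest). The names `simulate_k_slq_d`, `mu` and `p` are hypothetical.

```python
import random
from typing import List


def simulate_k_slq_d(mu: List[float], p: float, d: int, k: int,
                     horizon: int, seed: int = 0) -> float:
    """Time-averaged total queue length under a k-SLQ-d sketch.

    Modelling assumptions (hypothetical, not the paper's exact model):
    - each of n sources emits one job w.p. p per slot, so arrivals per
      slot are Binomial(n, p) with mean p * n (rate proportional to n);
    - a nonempty server i completes one job per slot w.p. mu[i];
    - in each slot, jobs are dispatched first and service happens after;
    - between observations, each job joins a queue drawn uniformly at
      random from the n - d queues that were NOT among the d longest.
    """
    n = len(mu)
    if not 0 <= d < n or k < 1 or horizon < 1:
        raise ValueError("need 0 <= d < n, k >= 1 and horizon >= 1")
    rng = random.Random(seed)
    q = [0] * n                    # current queue lengths
    period = k * (n - d)           # slots between queue-length observations
    allowed: List[int] = []        # queues eligible since last observation
    area = 0                       # running sum of total queue length
    for t in range(horizon):
        if t % period == 0:
            # Observe every queue and skip the d longest (ties -> lower index,
            # since Python's sort is stable even with reverse=True).
            longest_first = sorted(range(n), key=lambda i: q[i], reverse=True)
            skipped = set(longest_first[:d])
            allowed = [i for i in range(n) if i not in skipped]
        arrivals = sum(rng.random() < p for _ in range(n))
        for _ in range(arrivals):
            q[rng.choice(allowed)] += 1
        for i in range(n):
            if q[i] > 0 and rng.random() < mu[i]:
                q[i] -= 1
        area += sum(q)
    return area / horizon


if __name__ == "__main__":
    mu = [0.9] * 5 + [0.5] * 5     # heterogeneous per-slot service probabilities
    print(simulate_k_slq_d(mu, p=0.6, d=2, k=4, horizon=5_000, seed=1))
```

In the usage example the total service capacity is 7 jobs per slot against a mean arrival rate of 6, so the toy system runs at load 6/7. Varying `k` and `d` in such a sketch is one way to explore how infrequent observations trade off against queue lengths; it does not reproduce the paper's theorems.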

Problem

Research questions and friction points this paper is trying to address.

Optimize load balancing in heterogeneous parallel service systems

Minimize communication overhead in job dispatching decisions

Achieve throughput and delay optimality under heavy traffic

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ultra-low communication overhead policy

Skip-the-Longest-Queues load balancing

Asymptotically delay-optimal in heavy traffic
