Optimal and Stable Distributed Bipartite Load Balancing

📅 2024-11-26

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies bipartite load balancing in systems with heterogeneous backends and distributed frontends: frontends do not communicate with one another and route tasks autonomously, task assignment is constrained by an arbitrary bipartite compatibility graph, and the objective is to minimize average task delay. It proposes a routing mechanism based on marginal service rates that requires no prior knowledge of arrival rates. The summary describes it as the first such scheme to converge to the global optimum under distributed control, with an explicit convergence-time bound of $O(\delta + \log(1/\varepsilon))$. Under Poisson arrivals, in the limit as job sizes tend to zero, the discrete stochastic system converges almost surely to a fluid model. In that fluid model, the mechanism converges globally and strongly asymptotically to the centrally coordinated optimum: starting from $\delta$-suboptimal workloads, it reaches an $\varepsilon$-suboptimal solution in $O(\delta + \log(1/\varepsilon))$ time.

📝 Abstract

We study distributed load balancing in bipartite queueing systems. Specifically, a set of frontends route jobs to a set of heterogeneous backends with workload-dependent service rates, with an arbitrary bipartite graph representing the connectivity between the frontends and backends. Each frontend operates independently without any communication with the other frontends, and the goal is to minimize the expectation of the sum of the latencies of all jobs. Routing based on expected latency can lead to arbitrarily poor performance compared to the centrally coordinated optimal routing. To address this, we propose a natural alternative approach that routes jobs based on marginal service rates, which does not need to know the arrival rates. Despite the distributed nature of this algorithm, it achieves effective coordination among the frontends. In a model with independent Poisson arrivals of discrete jobs at each frontend, we show that the behavior of our routing policy converges (almost surely) to the behavior of a fluid model, in the limit as job sizes tend to zero and Poisson arrival rates are scaled at each frontend so that the expected total volume of jobs arriving per unit time remains fixed. Then, in the fluid model, where job arrivals are represented by infinitely divisible continuous flows and service times are deterministic, we demonstrate that the system converges globally and strongly asymptotically to the centrally coordinated optimal routing. Moreover, we prove the following guarantee on the convergence rate: if initial workloads are $\delta$-suboptimal, it takes $O(\delta + \log(1/\epsilon))$ time to obtain an $\epsilon$-suboptimal solution.
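
To make the routing rule in the abstract concrete, here is a minimal Python sketch of a single frontend's decision under marginal-service-rate routing. It is an illustration under stated assumptions, not the paper's implementation: the service-rate form `capacity * x / (1 + x)`, the finite-difference estimate, and the names `make_service_rate`, `marginal_rate`, `route`, `GRAPH`, and `RATES` are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List

ServiceRate = Callable[[float], float]


def make_service_rate(capacity: float) -> ServiceRate:
    """Assumed service-rate model: increasing and concave in the workload x."""
    return lambda x: capacity * x / (1.0 + x)


def marginal_rate(mu: ServiceRate, x: float, h: float = 1e-6) -> float:
    """Forward-difference estimate of d(mu)/dx at workload x."""
    return (mu(x + h) - mu(x)) / h


def route(
    frontend: str,
    workloads: Dict[str, float],
    graph: Dict[str, List[str]],
    rates: Dict[str, ServiceRate],
) -> str:
    """Return the compatible backend with the greatest marginal service rate.

    The frontend uses only the workloads of its own neighbours in the
    bipartite graph; it needs no arrival rates and no messages from
    other frontends.
    """
    return max(
        graph[frontend],
        key=lambda b: marginal_rate(rates[b], workloads[b]),
    )


# Hypothetical two-frontend, two-backend instance.
GRAPH = {"f1": ["b1", "b2"], "f2": ["b2"]}
RATES = {"b1": make_service_rate(1.0), "b2": make_service_rate(2.0)}

if __name__ == "__main__":
    # Marginal rates are roughly 0.44 for b1 and 0.22 for b2, so f1 picks b1.
    print(route("f1", {"b1": 0.5, "b2": 2.0}, GRAPH, RATES))
```

In this sketch the faster backend `b2` loses the job once it is heavily loaded, because an extra unit of work raises its service rate less than it would at `b1`. That contrast with routing on expected latency is the point the abstract makes.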

Problem

Research questions and friction points this paper is trying to address.

Distributed load balancing in heterogeneous bipartite queueing systems

Minimizing expected average latency without arrival rate knowledge

Achieving optimal routing with workload-dependent service rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed load balancing with workload-dependent service rates

Greatest Marginal Service Rate policy for coordination

Lexicographically maximizes throughput and minimizes workload
