Optimal and Stable Distributed Bipartite Load Balancing

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies bipartite load balancing in systems with heterogeneous backends and distributed frontends. Frontends do not communicate and route tasks autonomously based solely on local service-rate observations; task assignment is constrained by an arbitrary bipartite compatibility graph; the objective is to minimize average task delay. The authors propose a routing mechanism based on marginal service rates that requires no prior knowledge of arrival rates. It is the first such scheme to converge to the global optimum under distributed control, with an explicit convergence-time bound of $O(\delta + \log(1/\varepsilon))$ from a $\delta$-suboptimal start to an $\varepsilon$-suboptimal solution. Under independent Poisson arrivals, the discrete stochastic system converges almost surely to a fluid model as job sizes tend to zero with the expected arriving volume per unit time held fixed; in the fluid model, the mechanism converges globally and strongly asymptotically to the centrally coordinated optimum.

📝 Abstract
We study distributed load balancing in bipartite queueing systems. Specifically, a set of frontends route jobs to a set of heterogeneous backends with workload-dependent service rates, with an arbitrary bipartite graph representing the connectivity between the frontends and backends. Each frontend operates independently without any communication with the other frontends, and the goal is to minimize the expectation of the sum of the latencies of all jobs. Routing based on expected latency can lead to arbitrarily poor performance compared to the centrally coordinated optimal routing. To address this, we propose a natural alternative approach that routes jobs based on marginal service rates, which does not need to know the arrival rates. Despite the distributed nature of this algorithm, it achieves effective coordination among the frontends. In a model with independent Poisson arrivals of discrete jobs at each frontend, we show that the behavior of our routing policy converges (almost surely) to the behavior of a fluid model, in the limit as job sizes tend to zero and Poisson arrival rates are scaled at each frontend so that the expected total volume of jobs arriving per unit time remains fixed. Then, in the fluid model, where job arrivals are represented by infinitely divisible continuous flows and service times are deterministic, we demonstrate that the system converges globally and strongly asymptotically to the centrally coordinated optimal routing. Moreover, we prove the following guarantee on the convergence rate: if initial workloads are $\delta$-suboptimal, it takes $O(\delta + \log(1/\epsilon))$ time to obtain an $\epsilon$-suboptimal solution.
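As a rough illustration of routing on marginal service rates, the sketch below has a single frontend pick, among its compatible backends, the one whose service rate would increase the most per unit of added workload. Everything concrete here is an assumption made for illustration and does not come from the paper: the concave service-rate curve `mu(x) = cap * x / (1 + x)`, the names `marginal_rate` and `route`, and the example graph, capacities, and workloads.

```python
from typing import Dict, List


def marginal_rate(cap: float, workload: float) -> float:
    """Derivative of the assumed curve mu(x) = cap * x / (1 + x).

    The curve itself is a hypothetical choice; any differentiable,
    workload-dependent service rate could be substituted.
    """
    return cap / (1.0 + workload) ** 2


def route(frontend: str,
          graph: Dict[str, List[str]],
          caps: Dict[str, float],
          workloads: Dict[str, float]) -> str:
    """Return the compatible backend with the greatest marginal service rate.

    The frontend only looks at backends it is connected to in the
    bipartite graph and never consults other frontends or arrival rates.
    """
    return max(graph[frontend],
               key=lambda b: marginal_rate(caps[b], workloads[b]))


# Hypothetical instance: two frontends, three backends.
graph = {"f1": ["b1", "b2"], "f2": ["b2", "b3"]}
caps = {"b1": 4.0, "b2": 9.0, "b3": 0.5}
workloads = {"b1": 0.0, "b2": 2.0, "b3": 0.0}

choice_f1 = route("f1", graph, caps, workloads)
choice_f2 = route("f2", graph, caps, workloads)
```

Note that the decision uses only the current workloads of the frontend's own neighbors, which is what lets each frontend act without coordination in this sketch.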
Problem

Research questions and friction points this paper is trying to address.

Distributed load balancing in heterogeneous bipartite queueing systems
Minimizing expected average latency without arrival rate knowledge
Achieving optimal routing with workload-dependent service rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed load balancing with workload-dependent service rates
Greatest Marginal Service Rate policy for coordination
Lexicographically maximizes throughput and minimizes workload
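To make the fluid-model view more concrete, here is a toy forward-Euler simulation in which arrivals are continuous flows and each backend drains workload deterministically. It is a sketch under stated assumptions, not the paper's model: the service-rate curve `cap * x / (1 + x)`, the winner-take-all routing of each frontend's entire flow to its best neighbor, the step size, and all numbers are hypothetical.

```python
from typing import Dict, List


def service_rate(cap: float, x: float) -> float:
    # Assumed workload-dependent service rate (hypothetical curve).
    return cap * x / (1.0 + x)


def marginal_rate(cap: float, x: float) -> float:
    # Derivative of the assumed curve with respect to workload.
    return cap / (1.0 + x) ** 2


def simulate(graph: Dict[str, List[str]],
             caps: Dict[str, float],
             arrivals: Dict[str, float],
             x0: Dict[str, float],
             dt: float = 0.001,
             steps: int = 20000) -> Dict[str, float]:
    """Forward-Euler integration of dx_b/dt = inflow_b - service_rate_b."""
    x = dict(x0)
    for _ in range(steps):
        inflow = {b: 0.0 for b in caps}
        # Each frontend independently sends its whole flow to the
        # neighbor with the greatest marginal service rate.
        for f, rate in arrivals.items():
            best = max(graph[f], key=lambda b: marginal_rate(caps[b], x[b]))
            inflow[best] += rate
        # Workloads evolve deterministically and cannot go negative.
        for b in caps:
            drift = inflow[b] - service_rate(caps[b], x[b])
            x[b] = max(0.0, x[b] + dt * drift)
    return x


graph = {"f1": ["b1", "b2"], "f2": ["b2", "b3"]}
caps = {"b1": 4.0, "b2": 9.0, "b3": 0.5}
arrivals = {"f1": 1.0, "f2": 1.0}
x0 = {"b1": 0.0, "b2": 2.0, "b3": 0.0}

final = simulate(graph, caps, arrivals, x0)
```

In this toy instance the initially overloaded backend `b2` drains, after which both frontends settle on it and the other workloads decay toward zero. Whether such a discretized, winner-take-all rule tracks the paper's fluid dynamics in general is not something this sketch establishes.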
Wenxin Zhang
Graduate School of Business, Columbia University
S. Balseiro
Graduate School of Business, Columbia University
Robert Kleinberg
Department of Computer Science, Cornell University
Theory of Computing, Algorithms, Algorithmic Game Theory and Economics, Learning Theory
V. Mirrokni
Google Research
Balasubramanian Sivan
Google Research, New York
Algorithmic Game Theory and Approximation algorithms
B. Wydrowski
Google Research