Optimal moments on redundancies in job cloning

📅 2024-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In distributed task allocation, worker node failures make the number of results collected at the master node low and highly volatile. Method: This paper studies the moments of the number of received results under redundant scheduling, aiming to guarantee, with minimal redundancy, that the master node receives a sufficient number of distinct task results with high probability. We first prove that all balanced allocation schemes yield the same expected number of distinct results received; then, using generalized balanced incomplete block designs (GBIBDs), we construct an allocation strategy that minimizes the variance of that number, whereas conventional repetition coding attains the largest variance. The approach combines combinatorial design theory with probabilistic analysis. Contribution/Results: Our method significantly reduces the variance in the number of successfully received results (by an order of magnitude compared to repetition coding), thereby substantially enhancing the robustness and stability of task completion in failure-prone distributed environments.
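
The equal-expectation claim can be illustrated with a short calculation. The notation and the exact failure model here are assumptions for illustration and may differ from the paper's: $n$ tasks, $c$ servers, every task assigned to exactly $r$ servers, and exactly $k$ of the $c$ servers responding, chosen uniformly at random. A task is missed only when all $k$ responders come from the $c-r$ servers that do not hold it, so by linearity of expectation the expected number $D$ of distinct tasks received is

\[
\mathbb{E}[D] \;=\; \sum_{i=1}^{n} \Pr[\text{task } i \text{ is received}] \;=\; n\left(1 - \frac{\binom{c-r}{k}}{\binom{c}{k}}\right).
\]

Under these assumptions the value depends only on $n$, $c$, $r$, and $k$, not on which servers hold which tasks. The variance, in contrast, depends on how the task sets of different servers overlap.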

📝 Abstract

We consider the problem of job assignment in which a master server aims to compute a set of tasks and is given a few child servers to compute them, under a uniform straggling pattern where each server is equally likely to straggle. We distribute tasks to the servers so that the master receives most of the tasks even if a significant number of child servers fail to communicate. We first show that all \textit{balanced} assignment schemes have the same expectation of the number of distinct tasks received, and then study the variance. We show that constructions using a generalization of ``Balanced Incomplete Block Design'' \cite{doi:10.1111/j.1469-1809.1939.tb02219.x,sprott1955} minimize the variance, and that constructions based on repetition coding schemes attain the largest variance.
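
As a small illustration of the two claims in the abstract (same expectation, different variance), the sketch below enumerates every set of responding servers for two balanced assignments. The toy instance is an assumption made for illustration and is not taken from the paper: 4 tasks, 6 servers, 2 tasks per server, 3 servers per task, and exactly 2 responding servers chosen uniformly at random. The helper name `moments` is also hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def moments(assignment, k):
    """Exact mean and variance of the number of distinct tasks received.

    assignment: list where assignment[s] is the set of tasks held by server s.
    k: number of responding servers; every k-subset is assumed equally likely.
    """
    # Number of distinct tasks covered by each possible set of k responders.
    counts = [
        len(set().union(*(assignment[s] for s in subset)))
        for subset in combinations(range(len(assignment)), k)
    ]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    variance = Fraction(sum(x * x for x in counts), total) - mean ** 2
    return mean, variance


# Block-design-style assignment: the 6 servers hold the 6 distinct pairs of tasks.
design = [set(pair) for pair in combinations(range(4), 2)]

# Repetition-style assignment: two groups of 3 servers, each group holding the same pair.
repetition = [{0, 1}] * 3 + [{2, 3}] * 3

design_mean, design_var = moments(design, 2)
rep_mean, rep_var = moments(repetition, 2)

print("design:    ", design_mean, design_var)
print("repetition:", rep_mean, rep_var)
```

On this toy instance both assignments have mean 16/5, while the variance is 4/25 for the pair-based design and 24/25 for the repetition-style assignment. This is a single hand-picked example consistent with the stated results, not a reproduction of the paper's constructions or proofs.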

Problem

Research questions and friction points this paper is trying to address.

Distributed Computing

Fault Tolerance

Workload Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized Workload Allocation

Fault Tolerance

Efficiency Improvement

🔎 Similar Papers

Balanced assignments of periodic tasks