Fully Distributed Fog Load Balancing with Multi-Agent Reinforcement Learning

📅 2024-05-15

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In fog computing, dynamic and unpredictable IoT traffic, coupled with heterogeneous fog nodes, leads to severe load imbalance and high task waiting latency. Method: This paper proposes a fully decentralized multi-agent reinforcement learning (MARL) framework in which agents use transfer learning for lifelong self-adaptation and observe the environment through an interval-based, Gossip-based multicasting protocol. Observation delays are modeled explicitly to expose the trade-off between realism and performance. Agents make decisions locally within small collaboration regions, which removes the reliance on centralized coordination. Contribution/Results: Experiments demonstrate that the proposed approach reduces end-to-end latency by 32% and task waiting time by 41% compared to a centralized single-agent solution and state-of-the-art baseline methods. Because agents work independently at the region level, the approach can also scale to very large fog networks without central supervision.

📝 Abstract

Real-time Internet of Things (IoT) applications need timely access to computing resources to process ever-growing workloads. Fog Computing makes such resources highly available in a distributed manner. However, these resources must be managed efficiently so that unpredictable traffic demands are distributed across heterogeneous Fog resources. This paper proposes a fully distributed load-balancing solution based on Multi-Agent Reinforcement Learning (MARL) that intelligently distributes IoT workloads to minimize waiting time while providing fair resource utilization in the Fog network. The agents use transfer learning for lifelong self-adaptation to dynamic changes in the environment. By leveraging distributed decision-making, MARL agents reduce waiting time more effectively than a single centralized agent and other baselines, which improves end-to-end execution delay. Beyond the performance gain, a fully distributed solution allows for a global-scale implementation in which agents work independently in small collaboration regions, leveraging nearby local resources. Furthermore, we analyze the impact of observing the state of the environment at a realistic frequency, in contrast to the common but unrealistic assumption in the literature that observations are available in real time for every required action. The findings highlight the trade-off between realism and performance when an interval-based, Gossip-based multicasting protocol is used instead of assuming real-time observation availability for every generated workload.
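To make the idea of a learning agent that assigns workloads to heterogeneous Fog nodes more concrete, the following is a minimal sketch only. It swaps in plain tabular Q-learning and a toy queue model, neither of which is confirmed by the abstract as the paper's method. The names `FogAgent` and `simulate`, the service rates, the drain factor, and the reward (negative waiting time) are all illustrative assumptions.

```python
import random
from collections import defaultdict


class FogAgent:
    """Hypothetical independent learner (tabular Q-learning, an assumption)."""

    def __init__(self, n_nodes, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_nodes = n_nodes
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-table: state -> one value per candidate Fog node.
        self.q = defaultdict(lambda: [0.0] * n_nodes)
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy choice of the node that receives the next task.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_nodes)
        values = self.q[state]
        return max(range(self.n_nodes), key=values.__getitem__)

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def observe(queues, cap=5):
    # Coarse state: queue lengths truncated to integers and capped.
    return tuple(min(int(q), cap) for q in queues)


def simulate(steps=2000, rates=(1.0, 2.0, 4.0), seed=0):
    """Toy environment: one task arrives per step; returns the mean wait."""
    agent = FogAgent(len(rates), seed=seed)
    queues = [0.0] * len(rates)
    total_wait = 0.0
    for _ in range(steps):
        state = observe(queues)
        action = agent.act(state)
        wait = queues[action] / rates[action]  # time before the task starts
        queues[action] += 1.0
        # Each node drains work in proportion to its (assumed) service rate.
        queues = [max(0.0, q - 0.3 * r) for q, r in zip(queues, rates)]
        agent.learn(state, action, -wait, observe(queues))
        total_wait += wait
    return total_wait / steps
```

Calling `simulate()` returns the average waiting time per task in this toy model. In a multi-agent setting, several such agents would each run their own copy of this loop against shared queues.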

Problem

Research questions and friction points this paper is trying to address.

Efficiently manage Fog resources for unpredictable IoT demands

Distribute IoT workloads intelligently to optimize waiting time

Analyze impact of realistic observation frequency on performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning for load balancing

Transfer learning enables life-long self-adaptation

Gossip-based protocol for realistic state observation
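The interval-based observation idea can be illustrated with a small sketch: each node shares its load only once per interval, and receivers keep the newest report per node, so decisions are made on possibly stale state. This is an assumption-laden illustration, not the paper's protocol. `LoadReport`, `GossipView`, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadReport:
    """Hypothetical gossip message carrying one node's load."""
    node: str
    queue_len: int
    sent_at: float


class GossipView:
    """An agent's local, possibly stale view of its neighbours' load."""

    def __init__(self, interval):
        self.interval = interval  # seconds between two broadcasts
        self.reports = {}         # node name -> newest LoadReport seen
        self.last_sent = None

    def should_broadcast(self, now):
        # Broadcast on first call, then at most once per interval.
        return self.last_sent is None or now - self.last_sent >= self.interval

    def make_report(self, node, queue_len, now):
        self.last_sent = now
        return LoadReport(node, queue_len, now)

    def merge(self, report):
        # Keep only the most recent report for each node.
        current = self.reports.get(report.node)
        if current is None or report.sent_at > current.sent_at:
            self.reports[report.node] = report

    def staleness(self, node, now):
        # Age of the information the agent would act on.
        return now - self.reports[node].sent_at
```

A longer `interval` lowers messaging overhead but increases `staleness`, which is the realism versus performance trade-off the abstract describes.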
