Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects

📅 2026-03-24

🤖 AI Summary

This work addresses load imbalance in communication-intensive parallel applications with irregular, time-varying workloads, focusing on the cross-node communication overhead among persistently interacting objects. The authors propose a new distributed, diffusion-based load balancing method that uses the application's communication graph during diffusion, so that it balances computational load while also reducing inter-node communication. A variant of the algorithm handles cases where the communication pattern is not readily available. The approach is evaluated in simulation and on a Particle-in-Cell benchmark, on up to 8 nodes of the NERSC Perlmutter system, and is reported to improve on existing load balancing strategies.

📝 Abstract

Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new type of distributed diffusion-based load balancing targeted at communication-intensive applications with persistently communicating objects. Leveraging the application's communication graph, our strategy reduces cross-node communication while simultaneously distributing load effectively. We also propose an algorithmic variant for cases where the communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark on up to 8 nodes of Perlmutter at NERSC.
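
To make the idea concrete, here is a minimal Python sketch of one communication-aware diffusion step. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the update rule, so the sketch assumes textbook first-order diffusion (each node plans to send a fraction `alpha` of its load gap to each lighter neighbor) combined with a greedy choice of which objects to migrate. All names (`diffusion_quotas`, `comm_gain`, `balance_step`, `cut_volume`, `alpha`) are hypothetical.

```python
from collections import defaultdict


def diffusion_quotas(node_load, neighbors, alpha):
    """Assumed first-order diffusion: send alpha * gap to each lighter neighbor."""
    quotas = {}
    for src, nbrs in neighbors.items():
        for dst in nbrs:
            gap = node_load[src] - node_load[dst]
            if gap > 0:
                quotas[(src, dst)] = alpha * gap
    return quotas


def comm_gain(obj, src, dst, placement, comm):
    """Volume that becomes node-local minus volume that becomes remote
    if obj migrates from src to dst."""
    gain = 0.0
    for peer, volume in comm.get(obj, {}).items():
        if placement[peer] == dst:
            gain += volume
        elif placement[peer] == src:
            gain -= volume
    return gain


def balance_step(obj_load, placement, neighbors, comm, alpha=0.5):
    """One synchronous step: compute quotas, then greedily migrate the
    objects whose move hurts cross-node communication the least."""
    node_load = defaultdict(float)
    for obj, node in placement.items():
        node_load[node] += obj_load[obj]

    new_placement = dict(placement)
    quotas = diffusion_quotas(node_load, neighbors, alpha)
    for (src, dst), quota in sorted(quotas.items()):
        moved = 0.0
        while True:
            # Objects still on src that fit into the remaining quota.
            candidates = [o for o, n in new_placement.items()
                          if n == src and moved + obj_load[o] <= quota]
            if not candidates:
                break
            # Gains are recomputed after every move, so objects that talk
            # to an already-migrated peer become attractive to move next.
            best = max(candidates,
                       key=lambda o: comm_gain(o, src, dst, new_placement, comm))
            new_placement[best] = dst
            moved += obj_load[best]
    return new_placement


def cut_volume(placement, comm):
    """Total communication volume crossing node boundaries (each pair once)."""
    return sum(v for a, peers in comm.items() for b, v in peers.items()
               if a < b and placement[a] != placement[b])


# Toy example: four unit-load objects in a chain a-b-c-d, all on node 0.
obj_load = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
placement = {"a": 0, "b": 0, "c": 0, "d": 0}
neighbors = {0: [1], 1: [0]}
comm = {"a": {"b": 10}, "b": {"a": 10, "c": 1},
        "c": {"b": 1, "d": 10}, "d": {"c": 10}}

result = balance_step(obj_load, placement, neighbors, comm)
print(result, cut_volume(result, comm))
```

In this toy case the step moves the tightly coupled pair `a` and `b` together, so each node ends up with two objects and only the light `b`-`c` edge crosses nodes. A load-only selection could just as easily split the heavy `a`-`b` or `c`-`d` pair, which is the cost a communication-aware strategy tries to avoid.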

Problem

Research questions and friction points this paper is trying to address.

load balancing

communication-aware

diffusion

persistently interacting objects

irregular workloads

Innovation

Methods, ideas, or system contributions that make the work stand out.

communication-aware

diffusion-based load balancing

persistently interacting objects