🤖 AI Summary
This work addresses energy inefficiency in distributed graph neural network (GNN) training caused by fine-grained remote procedure calls (RPCs), which arise when multi-hop sampling crosses partition boundaries. The problem is made worse because static caching cannot adapt to time-varying network congestion. The paper introduces GreenDyGNN, a runtime-adaptive caching mechanism that formulates cache window management as a sequential decision problem. A Double-DQN agent, trained in a calibrated simulator with domain-randomized congestion, adapts the rebuild window size and per-owner cache allocation at each boundary. An asynchronous double-buffered pipeline makes this adaptation effectively free. Under congestion, the method reduces total energy consumption by up to 43% compared to Default DGL and by 4–24% compared to the best static policy, while closely matching the optimum under clean (uncongested) conditions.
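To make the Double-DQN idea mentioned above concrete, the following is a minimal sketch of the standard Double-DQN bootstrap target: the online network selects the next action and the target network evaluates it. It is not the paper's implementation. The function name, the interpretation of actions as cache configurations, and the reward as negative energy are all assumptions made for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard Double-DQN target for a single transition.

    q_online_next[a] and q_target_next[a] are the online and target
    networks' value estimates for action a in the next state. Here an
    action is assumed (hypothetically) to index a cache configuration,
    e.g. a rebuild window size, and the reward is assumed to be the
    negative energy spent in the window.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Example: the online network prefers action 1, so the target network's
# estimate for action 1 (0.3) is used, not its own maximum (0.8).
target = double_dqn_target(1.0, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.5)
```

Decoupling selection from evaluation is what distinguishes Double-DQN from plain DQN, which would bootstrap from the target network's own maximum and tend to overestimate values.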
📝 Abstract
Distributed GNN training is dominated by the cost of remote feature fetching. Multi-hop neighborhood sampling crosses partition boundaries and triggers fine-grained RPCs whose fixed initiation cost and GPU-stall latency waste energy. Prior systems try to reduce this overhead with presampling and static caching, but static cache policies cannot react to runtime network variation. We show that under time-varying congestion, static caching can increase energy by up to 45% because a fixed rebuild schedule is insufficient. We present GreenDyGNN, which formulates cache window management as a sequential decision problem. GreenDyGNN performs intra-epoch cache rebuilds and uses a Double-DQN agent, trained in a calibrated simulator with domain-randomized congestion, to adapt rebuild window size and per-owner cache allocation at each boundary. An asynchronous double-buffered pipeline makes adaptation effectively free. Under congestion, GreenDyGNN cuts total energy by up to 43% over Default DGL and 4–24% over the best static policy, while closely matching the optimum under clean conditions.
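The asynchronous double-buffered pipeline described in the abstract can be illustrated with a small, generic sketch: while training consumes the active cache, the next window's cache is built in the background and then swapped in. This shows the general double-buffering pattern only, not GreenDyGNN's actual code. `build_cache`, `train_on`, and `run_pipeline` are hypothetical stand-ins, and the integer "features" are placeholder data.

```python
from concurrent.futures import ThreadPoolExecutor


def build_cache(window_id: int) -> dict:
    # Hypothetical stand-in for fetching remote features for one window.
    return {"window": window_id,
            "features": [window_id * 10 + i for i in range(3)]}


def train_on(cache: dict) -> int:
    # Hypothetical stand-in for the training steps served by this cache.
    return sum(cache["features"])


def run_pipeline(num_windows: int) -> list:
    results = []
    # One background worker plays the role of the "back buffer" builder.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(build_cache, 0)
        for w in range(num_windows):
            # Swap: the finished back buffer becomes the active cache.
            active = pending.result()
            if w + 1 < num_windows:
                # Start building the next window's cache right away, so
                # the build overlaps with training on the active cache.
                pending = pool.submit(build_cache, w + 1)
            results.append(train_on(active))
    return results


results = run_pipeline(3)
```

Because the next build is submitted before training on the current window begins, the rebuild cost is hidden whenever it finishes within one window's training time. That is the sense in which such a pipeline can make adaptation close to free.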