AI Summary
This work addresses the high communication overhead of training distributed neural modules by proposing a communication-efficient training framework. It constructs a capacity-aware shortest-path tree rooted at the fusion node and prunes all non-tree edges, yielding a sparse communication topology. Local routing is modeled as a finite-rate stochastic gate, and rate-distortion theory guides the joint optimization of sparsification and information compression. The method reduces training communication volume by 70.4% while keeping accuracy within the standard deviation of the dense baseline, and information bottleneck regularization further lowers the estimated transmission rate of the latent variables by 45.7% relative to the unregularized pruned variant.
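The tree-based pruning described above can be illustrated with a minimal sketch that runs Dijkstra's algorithm from the fusion node and keeps only the resulting tree edges. The edge cost `1 / capacity`, the function names `shortest_path_parents` and `prune_to_tree`, and the toy graph are assumptions made for illustration; the summary does not specify how capacity enters the path cost.

```python
import heapq


def shortest_path_parents(capacity, root):
    """Run Dijkstra from `root` and return each reached node's parent.

    `capacity` maps an undirected edge (u, v) to a positive link capacity.
    Assumption: the cost of a link is 1 / capacity, so high-capacity links
    are preferred. The actual capacity-aware metric may differ.
    """
    adjacency = {}
    for (u, v), cap in capacity.items():
        cost = 1.0 / cap
        adjacency.setdefault(u, []).append((v, cost))
        adjacency.setdefault(v, []).append((u, cost))

    dist = {root: 0.0}
    parent = {}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adjacency.get(u, []):
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return parent


def prune_to_tree(capacity, root):
    """Keep only the edges that lie on the shortest-path tree."""
    parent = shortest_path_parents(capacity, root)
    tree_edges = {frozenset((child, par)) for child, par in parent.items()}
    return {e: c for e, c in capacity.items() if frozenset(e) in tree_edges}


# Hypothetical 4-node graph; node 0 plays the role of the fusion node.
links = {(0, 1): 10.0, (0, 2): 1.0, (1, 2): 10.0, (2, 3): 5.0, (1, 3): 1.0}
kept = prune_to_tree(links, root=0)
print(sorted(kept))
```

In this toy graph the two low-capacity links `(0, 2)` and `(1, 3)` are pruned, leaving three of the five original links for exchanging activations and errors.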
Abstract
In-network learning (INL) trains distributed neural modules by exchanging latent activations and backpropagated errors over a communication graph. This letter proposes Dijkstra-pruned INL (D-INL), which removes non-tree links by retaining a capacity-aware shortest-path tree rooted at the fusion node. To balance sparsity and predictive information, local routing (or aggregation) is modeled as a finite-rate stochastic gate with rate $R_g=I(Z; T)$. We derive a rate-distortion-generalization bound and validate the method on a reproducible distributed-classification experiment, where D-INL reduces training exchange by $70.4\%$ while preserving accuracy within the standard deviation of dense INL. Adding finite-rate regularization further reduces the estimated latent rate by $45.7\%$ relative to unregularized Dijkstra INL.
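To make the gate rate $R_g=I(Z; T)$ concrete, the following sketch computes the mutual information of a discrete stochastic gate from an input distribution $p(z)$ and a gate $p(t \mid z)$. Discrete alphabets and the function name `gate_rate_bits` are assumptions for illustration only; the abstract does not state which estimator is used for the latent rate.

```python
import math


def gate_rate_bits(p_z, p_t_given_z):
    """Mutual information I(Z; T) in bits for a discrete stochastic gate.

    p_z[i]            : probability of input symbol i
    p_t_given_z[i][j] : probability that the gate emits j given input i
    """
    n_t = len(p_t_given_z[0])
    # Marginal over gate outputs: p(t) = sum_z p(z) p(t | z).
    p_t = [
        sum(p_z[i] * p_t_given_z[i][j] for i in range(len(p_z)))
        for j in range(n_t)
    ]
    rate = 0.0
    for i, pz in enumerate(p_z):
        for j in range(n_t):
            joint = pz * p_t_given_z[i][j]
            if joint > 0.0:  # terms with zero probability contribute nothing
                rate += joint * math.log2(p_t_given_z[i][j] / p_t[j])
    return rate


uniform = [0.5, 0.5]
deterministic_gate = [[1.0, 0.0], [0.0, 1.0]]  # T copies Z exactly
blind_gate = [[0.5, 0.5], [0.5, 0.5]]          # T ignores Z

print(gate_rate_bits(uniform, deterministic_gate))
print(gate_rate_bits(uniform, blind_gate))
```

A gate that copies a uniform binary input costs one bit, while a gate that ignores its input costs zero bits; finite-rate regularization pushes the gate toward the low-rate end of this range to the extent that predictive information allows.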