🤖 AI Summary
This work addresses the efficient execution of all-to-all communication over circuit-switched optical interconnects by jointly optimizing topology reconfiguration and flow scheduling. It introduces a unified abstraction, based on sums of adjacency matrices and their powers, that characterizes the complete solution space and yields a lower bound on all-to-all completion time. To avoid enumerating the combinatorial design space, the authors identify a family of topology sequences with strong symmetry and high expansion, and design an algorithm that constructs near-optimal schedules with low computational overhead. In the evaluation, the approach reduces all-to-all completion time by up to 44% on average across a wide range of network parameters, message sizes, and workload types.
📝 Abstract
All-to-all collective communication is a core primitive in distributed machine learning and high-performance computing. At the server scale, the communication demands of these workloads are increasingly outstripping the bandwidth and energy limits of electrical interconnects, driving growing interest in photonic interconnects. However, leveraging these interconnects for all-to-all communication is nontrivial. The core challenge lies in jointly optimizing a sequence of topologies and flow schedules, reconfiguring only when the transmission savings from traversing shorter paths outweigh the reconfiguration cost. Yet the search space of this joint optimization is enormous. Existing work sidesteps this challenge by making unrealistic assumptions about reconfiguration costs, under which reconfiguring is either never or always worthwhile. In this paper, we show that any candidate sequence of topologies and flow schedules can be expressed as a sum of adjacency matrices and their powers. This abstraction captures the entire solution space and yields a lower bound on all-to-all completion time. Building on this formulation, we identify a family of topology sequences with strong symmetry and high expansion that admits bandwidth-efficient schedules, which our algorithm constructs with low computational overhead. Together, these insights allow us to efficiently construct near-optimal solutions while avoiding enumeration of the combinatorial design space. Evaluation shows that our approach reduces all-to-all completion time by up to 44% on average across a wide range of network parameters, message sizes, and workload types.
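To give some intuition for why adjacency-matrix powers are a natural tool here, the following minimal sketch relies only on a standard graph fact: entry (i, j) of the k-th power of an adjacency matrix counts the walks of length k from node i to node j. It is not the paper's formulation or algorithm. The circulant topology, the helper names (`circulant_adjacency`, `hops_to_cover_all_pairs`), and the choice of offsets are all assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def circulant_adjacency(n, offsets):
    """Hypothetical topology: node i has a directed circuit to (i + d) % n
    for every offset d (offsets are assumed nonzero and distinct mod n)."""
    allowed = {d % n for d in offsets}
    return [[1 if (j - i) % n in allowed else 0 for j in range(n)]
            for i in range(n)]


def hops_to_cover_all_pairs(adj):
    """Return the smallest h such that adj + adj^2 + ... + adj^h has no zero
    off-diagonal entry, i.e. every ordered pair of distinct nodes is joined
    by a walk of at most h hops. Returns None if n - 1 hops do not suffice."""
    n = len(adj)
    power = adj                          # current power adj^h
    total = [row[:] for row in adj]      # running sum adj + ... + adj^h
    for h in range(1, n):
        if all(total[i][j] > 0
               for i in range(n) for j in range(n) if i != j):
            return h
        power = matmul(power, adj)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return None


ring = circulant_adjacency(8, [1])       # one outgoing circuit per node
denser = circulant_adjacency(8, [1, 3])  # two outgoing circuits per node
print(hops_to_cover_all_pairs(ring), hops_to_cover_all_pairs(denser))
```

With 8 nodes, the directed ring needs 7 hops before the power sum covers every pair, while adding a second circuit at offset 3 brings this down to 3. This is a toy view of the trade-off the abstract describes between path length and the topology in use. It ignores bandwidth, flow schedules, and reconfiguration cost entirely.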