🤖 AI Summary
Chip-to-chip silicon photonic interconnects face a fundamental trade-off between reconfiguration latency and the performance gains of adaptive topologies when scaling collective communication primitives (e.g., AllReduce).
Method: This work establishes the first analytical framework for adaptive photonic networks, modeling topology reconfiguration as a Birkhoff–von Neumann (BvN) matrix decomposition problem. It combines maximum concurrent flow analysis with the classical α–β communication cost model to quantitatively characterize the latency–throughput Pareto frontier.
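To make the BvN view concrete, here is a minimal sketch, assuming a doubly stochastic traffic matrix: the demand is peeled into weighted permutation matrices, each permutation corresponding to one photonic circuit configuration held for a time proportional to its coefficient. The function names and the use of SciPy's assignment solver are our illustration, not the paper's implementation.

```python
# Minimal BvN decomposition sketch: a doubly stochastic traffic matrix T is
# peeled into weighted permutation matrices; each permutation is one photonic
# circuit configuration. Illustrative code, not the paper's artifact.
import numpy as np
from scipy.optimize import linear_sum_assignment

def bvn_decompose(T, tol=1e-9):
    """Return [(c_k, P_k), ...] with T ~ sum_k c_k * P_k and c_k > 0."""
    T = T.astype(float).copy()
    terms = []
    while T.max() > tol:
        # Find a permutation supported on the positive entries of T:
        # minimize -T, with a large penalty forbidding (near-)zero entries.
        cost = np.where(T > tol, -T, 1e6)
        rows, cols = linear_sum_assignment(cost)
        coeff = T[rows, cols].min()   # longest we can hold this configuration
        P = np.zeros_like(T)
        P[rows, cols] = 1.0
        terms.append((coeff, P))
        T -= coeff * P                # subtract the demand just served
    return terms

# Each coefficient c_k is the fraction of time the fabric holds matching P_k;
# more terms mean more reconfigurations (the R in the criterion below).
demand = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]])   # doubly stochastic by construction
for c, P in bvn_decompose(demand):
    print(c, np.argmax(P, axis=1))    # weight and each source's destination
```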
Contribution/Results: We derive an explicit criterion for when reconfiguration is beneficial, which identifies optimal reconfiguration opportunities, clarifies design directions for photonic networks, and charts a systematic path for algorithm-hardware co-design and software-hardware integration. Our framework delivers the first theoretically rigorous and practically actionable foundation for programmable optical interconnects serving collective communication in intelligent computing.
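To convey the flavor of such a criterion, here is a hedged illustration in the α–β model; the symbols and the specific form are ours, not necessarily the paper's exact result. Suppose a static topology routes an $m$-byte collective over paths of average length $h$ hops, while the adaptive fabric pays $R$ reconfigurations (e.g., the number of BvN terms above) of delay $\delta$ each to obtain direct one-hop paths:

$$
T_{\mathrm{static}} \approx \alpha + h\,\beta m,
\qquad
T_{\mathrm{adaptive}} \approx R\,\delta + \alpha + \beta m .
$$

Reconfiguration is then beneficial exactly when $R\,\delta < (h-1)\,\beta m$, i.e., for messages larger than the break-even size $m^{*} = R\delta / \big((h-1)\beta\big)$.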
📝 Abstract
As chip-to-chip silicon photonics gains traction for its bandwidth and energy efficiency, collective communication has emerged as a critical bottleneck in scale-up systems. Programmable photonic interconnects offer a promising path forward: by dynamically reconfiguring the fabric, they can establish direct, high-bandwidth optical paths between communicating endpoints, *synchronously and guided by the structure of collective operations* (e.g., AllReduce). However, realizing this vision, *when light bends to the collective will*, requires navigating a fundamental trade-off between reconfiguration delay and the performance gains of adaptive topologies.
In this paper, we present a simple theoretical framework for adaptive photonic scale-up domains that makes this trade-off explicit and clarifies when reconfiguration is worthwhile. Along the way, we highlight a connection, not surprising but still powerful, between the Birkhoff–von Neumann (BvN) decomposition, maximum concurrent flow (a classic measure of network throughput), and the well-known α–β cost model for collectives. Finally, we outline a research agenda in algorithm design and systems integration that can build on this foundation.
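For readers who want to see the trade-off numerically, the toy sketch below plugs illustrative constants into the two cost expressions from the criterion above; every number (latency α, inverse bandwidth β, reconfiguration delay δ, configuration count R, hop count h) is an assumption for illustration, not a measured photonic parameter.

```python
# Toy numerical sketch of the reconfiguration trade-off; all constants are
# illustrative assumptions, not measurements from the paper.

def static_time(m, alpha, beta, h):
    """Collective of m bytes over a fixed topology with h-hop paths."""
    return alpha + h * beta * m

def adaptive_time(m, alpha, beta, delta, R):
    """R photonic reconfigurations of delay delta buy direct one-hop paths."""
    return R * delta + alpha + beta * m

alpha, beta = 1e-6, 1e-11   # 1 us startup; 100 GB/s -> 0.01 ns per byte
delta, R, h = 20e-6, 4, 3   # 20 us per reconfiguration, 4 configs, 3-hop paths

m_star = R * delta / ((h - 1) * beta)  # break-even message size, in bytes
print(f"reconfiguration pays off above ~{m_star / 1e6:.1f} MB")
for m in (1e4, 1e6, 1e8):              # 10 KB, 1 MB, 100 MB
    print(f"m={m:.0e}: static={static_time(m, alpha, beta, h):.2e} s, "
          f"adaptive={adaptive_time(m, alpha, beta, delta, R):.2e} s")
```

With these assumed numbers the crossover sits around 4 MB: small messages are dominated by the reconfiguration delay, while large ones are dominated by the bandwidth term that adaptivity shrinks.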