AI Summary
To address the high energy consumption and computational bottlenecks of electronic hardware in deep learning training, this work presents the first experimental implementation of Direct Feedback Alignment (DFA) on a hybrid photonic-electronic platform, overcoming the longstanding limitation of optical training to shallow models. We propose a photonic-electronic co-processing DFA paradigm: photonic processing units efficiently perform the random matrix multiplications, while electronic circuits handle error feedback and parameter updates, thereby circumventing the fundamental challenge of propagating backpropagation gradients in the optical domain. The system consumes under 30 W of power and achieves a peak computational throughput of 1500 TeraOPS. We demonstrate end-to-end optical training of a Transformer model with over one billion parameters, attaining competitive performance across multimodal tasks, including language understanding, image classification, and diffusion-based generation, and showing a potential speed advantage over comparably scaled all-electronic training for ultra-deep and wide networks.
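For reference, the standard DFA update rule (as introduced by Nøkland, 2016; the notation below is ours, not taken from this summary) replaces the backpropagated error at each hidden layer with a fixed random projection of the global output error:

```latex
% Backpropagation propagates the error through transposed weights:
%   \delta z_l = (W_{l+1}^\top \, \delta z_{l+1}) \odot f'(z_l)
% DFA instead projects the output error e through a fixed random matrix B_l:
\delta z_l = (B_l \, e) \odot f'(z_l), \qquad
\Delta W_l = -\eta \, \delta z_l \, a_{l-1}^\top
```

Because each $B_l$ is fixed and random, every layer's update depends only on the output error $e$, so the projections $B_l e$ can be computed in parallel; these are precisely the random matrix multiplications handed to the photonic processing units.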
Abstract
Modern deep learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low power consumption and high operation speeds, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Meanwhile, training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a major compute and energy bottleneck and a significant limitation on the size, and consequently the performance, of current architectures. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, the central operation of this algorithm, at speeds of up to 1500 TeraOPS under 30 W of power. We perform optical training of modern deep learning architectures, including Transformers with more than 1B parameters, and obtain good performance on language, vision, and diffusion-based generative tasks. We study the scaling of the training time and demonstrate a potential advantage of our hybrid opto-electronic approach for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.
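To make the algorithm concrete, below is a minimal NumPy sketch of one DFA training step for a toy MLP. This is our own illustration under assumed layer sizes, activations, and learning rate, not the paper's code; the random projections `e @ Bl.T` stand in for the large-scale random matrix multiplications that the hybrid platform offloads to the optical processing unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP: 784 -> 512 -> 512 -> 10 (sizes are illustrative assumptions).
sizes = [784, 512, 512, 10]
W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
     for m, n in zip(sizes[:-1], sizes[1:])]

# Fixed random feedback matrices B_l, one per hidden layer. Computing
# e @ B_l.T is the operation delegated to the optical processor (here: NumPy).
B = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), size=(n, sizes[-1]))
     for n in sizes[1:-1]]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

def dfa_step(x, y_onehot, lr=1e-2):
    """One DFA update on a batch: x (batch, 784), y_onehot (batch, 10)."""
    # Forward pass: tanh hidden layers, softmax output.
    acts, zs, a = [x], [], x
    for i, Wl in enumerate(W):
        z = a @ Wl.T
        zs.append(z)
        a = np.tanh(z) if i < len(W) - 1 else softmax(z)
        acts.append(a)
    e = a - y_onehot  # global output error (softmax + cross-entropy)
    # Hidden layers: a random projection of e replaces the backpropagated error.
    for l, Bl in enumerate(B):
        delta = (e @ Bl.T) * (1.0 - np.tanh(zs[l]) ** 2)
        W[l] -= lr * (delta.T @ acts[l]) / len(x)
    # The output layer is trained with the true error, as in standard DFA.
    W[-1] -= lr * (e.T @ acts[-2]) / len(x)
    return float((e ** 2).mean())

# Example usage on random data.
x = rng.normal(size=(32, 784))
y = np.eye(10)[rng.integers(0, 10, size=32)]
print(dfa_step(x, y))
```

Note that every hidden-layer update can be computed as soon as `e` is available, in parallel, rather than sequentially through a chain of transposed weight matrices; this layer-parallelism is what makes a fixed random-projection accelerator a natural fit for the hidden-layer updates.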