AI Summary
To address the high energy consumption and computational bottlenecks of electronic hardware in deep learning training, this work presents the first experimental implementation of Direct Feedback Alignment (DFA) on a hybrid photonic-electronic platform, overcoming the longstanding limitation of optical training to shallow models. We propose a photonic-electronic co-processing DFA paradigm: photonic processing units efficiently perform the random matrix multiplications, while electronic circuits handle error feedback and parameter updates, thereby circumventing the fundamental challenge of propagating backpropagation gradients in the optical domain. The system consumes under 30 W of power and achieves a peak computational throughput of 1500 TeraOPS. We demonstrate end-to-end optical training of a Transformer model with over one billion parameters, attaining competitive performance across multimodal tasks, including language understanding, image classification, and diffusion-based generation, and showing a potential speed advantage over comparably scaled all-electronic training for ultra-deep and wide networks.
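For reference, the standard DFA update rule (as introduced by Nøkland, 2016; the notation below is ours, not taken from this summary) replaces the backpropagated error at each hidden layer with a fixed random projection of the global output error:

```latex
% Backpropagation propagates the error through transposed weights:
%   \delta z_l = (W_{l+1}^\top \, \delta z_{l+1}) \odot f'(z_l)
% DFA instead projects the output error e through a fixed random matrix B_l:
\delta z_l = (B_l \, e) \odot f'(z_l), \qquad
\Delta W_l = -\eta \, \delta z_l \, a_{l-1}^\top
```

Because each $B_l$ is fixed and random, every layer's update depends only on the output error $e$, so the projections $B_l e$ can be computed in parallel; these are precisely the random matrix multiplications handed to the photonic processing units.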
Abstract
Modern deep learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low power consumption and high operation speeds, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Meanwhile, training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a major compute and energy bottleneck and a significant limitation on the size, and consequently the performance, of current architectures. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, the central operation of this algorithm, at speeds of up to 1500 TeraOPS under 30 W of power. We perform optical training of modern deep learning architectures, including Transformers with more than 1B parameters, and obtain good performance on language, vision, and diffusion-based generative tasks. We study the scaling of the training time and demonstrate a potential advantage of our hybrid opto-electronic approach for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.
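To make the algorithm concrete, below is a minimal NumPy sketch of one DFA training step for a toy MLP. This is our own illustration under assumed layer sizes, activations, and learning rate, not the paper's code; the random projections `e @ Bl.T` stand in for the large-scale random matrix multiplications that the hybrid platform offloads to the optical processing unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP: 784 -> 512 -> 512 -> 10 (sizes are illustrative assumptions).
sizes = [784, 512, 512, 10]
W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
     for m, n in zip(sizes[:-1], sizes[1:])]

# Fixed random feedback matrices B_l, one per hidden layer. Computing
# e @ B_l.T is the operation delegated to the optical processor (here: NumPy).
B = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), size=(n, sizes[-1]))
     for n in sizes[1:-1]]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

def dfa_step(x, y_onehot, lr=1e-2):
    """One DFA update on a batch: x (batch, 784), y_onehot (batch, 10)."""
    # Forward pass: tanh hidden layers, softmax output.
    acts, zs, a = [x], [], x
    for i, Wl in enumerate(W):
        z = a @ Wl.T
        zs.append(z)
        a = np.tanh(z) if i < len(W) - 1 else softmax(z)
        acts.append(a)
    e = a - y_onehot  # global output error (softmax + cross-entropy)
    # Hidden layers: a random projection of e replaces the backpropagated error.
    for l, Bl in enumerate(B):
        delta = (e @ Bl.T) * (1.0 - np.tanh(zs[l]) ** 2)
        W[l] -= lr * (delta.T @ acts[l]) / len(x)
    # The output layer is trained with the true error, as in standard DFA.
    W[-1] -= lr * (e.T @ acts[-2]) / len(x)
    return float((e ** 2).mean())

# Example usage on random data.
x = rng.normal(size=(32, 784))
y = np.eye(10)[rng.integers(0, 10, size=32)]
print(dfa_step(x, y))
```

Note that every hidden-layer update can be computed as soon as `e` is available, in parallel, rather than sequentially through a chain of transposed weight matrices; this layer-parallelism is what makes a fixed random-projection accelerator a natural fit for the hidden-layer updates.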