Integrated electro-optic attention nonlinearities for transformers

📅 2026-04-10
🤖 AI Summary
This work addresses the inference latency bottleneck in Transformers caused by nonlinear operations such as Softmax, despite their low computational cost. The authors propose, for the first time, an analog optoelectronic nonlinear computing unit based on thin-film lithium niobate (TFLN) Mach–Zehnder modulators to efficiently implement Softmax and Sigmoid functions. Operating with only 4-bit quantization, the approach maintains high accuracy, supports high-speed 10 GBaud encoding, and demonstrates strong robustness under noisy conditions. Experimental results show that the method achieves accuracy comparable to purely digital implementations in both vision Transformers and large language models while significantly reducing the latency of nonlinear computations, thereby validating the feasibility of TFLN-based devices for accelerating attention mechanisms in hybrid integrated hardware systems.
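The 4-bit input-output quantization mentioned above can be illustrated with a minimal numerical sketch (hypothetical values and ranges, not the paper's actual pipeline): quantize attention scores to 4 bits before a softmax, quantize the output again, and compare against a full-precision reference.

```python
# Illustrative sketch only: 4-bit quantization around an attention softmax,
# mimicking the limited resolution of an analog nonlinear unit.
# All ranges and shapes here are assumptions, not the paper's parameters.
import numpy as np

def quantize(x, bits=4):
    """Uniform quantization of x to 2**bits levels over its own range."""
    lo, hi = x.min(), x.max()
    levels = 2**bits - 1
    q = np.round((x - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 16))             # mock attention scores (heads x keys)
ref = softmax(scores)                         # full-precision reference
approx = quantize(softmax(quantize(scores)))  # 4-bit in, 4-bit out
err = np.abs(ref - approx).max()
print(f"max abs deviation from full precision: {err:.3f}")
```

Even at 4 bits, the deviation from the full-precision softmax stays small for well-scaled inputs, which is consistent with the accuracy retention the summary reports.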

📝 Abstract
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog units. We further characterize system noise at encoding speeds up to 10 GBaud and assess model robustness under various noise conditions. Our findings suggest that TFLN modulators can serve as nonlinear function units within hybrid co-packaged hardware, enabling high-speed and energy-efficient nonlinear computation.
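To see why an MZM can act as a sigmoid-like nonlinearity, consider its standard intensity transfer function. The sketch below (an assumed textbook model, not the paper's device parameters; `VPI` and the input mapping are hypothetical) compares a quadrature-biased MZM transfer against the logistic sigmoid over a bounded input range.

```python
# Illustrative sketch (assumed model, not the paper's TFLN device):
# a quadrature-biased MZM has the normalized intensity transfer
#   T(V) = 0.5 * (1 + sin(pi * V / Vpi)),
# which is monotonic and sigmoid-like for |V| <= Vpi / 2.
import numpy as np

VPI = 2.0  # hypothetical half-wave voltage (volts)

def mzm_transfer(v):
    """Quadrature-biased MZM intensity transfer, normalized to [0, 1]."""
    return 0.5 * (1.0 + np.sin(np.pi * v / VPI))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Map a logical input x in [-4, 4] onto the monotonic branch v in [-Vpi/2, Vpi/2].
x = np.linspace(-4, 4, 101)
v = (VPI / 2) * (x / 4)
optical = mzm_transfer(v)
digital = sigmoid(x)
print(f"max |T - sigmoid| on [-4, 4]: {np.abs(optical - digital).max():.3f}")
```

The two curves track closely on this range, which is the basic reason a modulator's transfer function can stand in for a digital sigmoid (and, with exponent-like biasing and normalization, for softmax-style mappings).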
Problem

Research questions and friction points this paper is trying to address.

attention mechanism
Softmax bottleneck
nonlinear computation
inference latency
transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

electro-optic nonlinearity
thin-film lithium niobate
Mach-Zehnder modulator
analog Softmax
hybrid co-packaged hardware
👥 Authors
Luis Mickeler
Optical Nanomaterial Group, Department of Physics, ETH Zurich, Zurich, Switzerland
Kai Lion
Optimization & Decision Intelligence Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
Alfonso Nardi
Department of Physics, Politecnico di Milano, Milan, Italy
Jost Kellner
Optical Nanomaterial Group, Department of Physics, ETH Zurich, Zurich, Switzerland
Pierre Didier
Optical Nanomaterial Group, Department of Physics, ETH Zurich, Zurich, Switzerland
Bhavin J. Shastri
Canada Research Chair and Associate Professor, Queen's University
Nanophotonics, Neuromorphic photonics, Photonic Computing, Photonic Integrated Circuits
Niao He
Associate Professor, ETH Zürich
Optimization, Machine Learning, Reinforcement Learning
Rachel Grange
Optical Nanomaterial Group, Department of Physics, ETH Zurich, Zurich, Switzerland