🤖 AI Summary
This work addresses the critical question of how fault-tolerant quantum processors (QPUs) can accelerate deep learning inference. Methodologically, it presents the first full hardware implementation of a multi-layer neural network with nonlinear activation on quantum hardware, realized via a fully coherent quantum multilayer architecture emulating ResNet. The design integrates quantum 2D convolution, quantum Sigmoid activation, skip connections, layer normalization, and random projection, alongside an efficient quantum-access mechanism for weights and inputs. Theoretically, it achieves quadratic speedup over classical inference without additional assumptions; quartic speedup when quantum weight access is available; and—when both quantum input and weight access are supported—reduces inference complexity to $O(mathrm{polylog}(N/varepsilon)^k)$, yielding exponential efficiency gains. This work establishes the first scalable, structurally complete quantum neural network paradigm for quantum-classical hybrid inference.
📝 Abstract
Fault-tolerant Quantum Processing Units (QPUs) promise to deliver exponential speed-ups in select computational tasks, yet their integration into modern deep learning pipelines remains unclear. In this work, we take a step towards bridging this gap by presenting the first fully-coherent quantum implementation of a multilayer neural network with non-linear activation functions. Our constructions mirror widely used deep learning architectures based on ResNet, and consist of residual blocks with multi-filter 2D convolutions, sigmoid activations, skip-connections, and layer normalizations. We analyse the complexity of inference for networks under three quantum data access regimes. Without any assumptions, we establish a quadratic speedup over classical methods for shallow bilinear-style networks. With efficient quantum access to the weights, we obtain a quartic speedup over classical methods. With efficient quantum access to both the inputs and the network weights, we prove that a network with an $N$-dimensional vectorized input, $k$ residual block layers, and a final residual-linear-pooling layer can be implemented with an error of $ε$ with $O( ext{polylog}(N/ε)^k)$ inference cost.