AI Summary
To address the prohibitive computational cost and scalability limitations of large-parameter classical Transformer models, this paper proposes SASQuaTCh, a variational quantum Transformer architecture that leverages kernel methods and multidimensional quantum Fourier transforms. It implements the self-attention mechanism via parameterized quantum circuits, introducing for the first time learnable quantum self-attention gates embedded within a quantum kernel framework. Theoretically, SASQuaTCh achieves exponential compression of parameter complexity relative to classical Transformers. Experimentally, it attains high-accuracy embedding and classification of grayscale handwritten-digit images using only nine qubits. The approach is validated both on classical simulators of quantum circuits and on real quantum hardware, demonstrating significant reductions in parameter count and runtime complexity. This work establishes a novel paradigm for lightweight, scalable, quantum-enhanced sequence modeling.
Abstract
The recent explosive growth in the size of state-of-the-art machine learning models highlights a well-known issue: exponential parameter growth, which has reached the trillions in the case of the Generative Pre-trained Transformer (GPT), leads to training-time and memory requirements that limit advancement in the near term. The predominant models use the so-called transformer network and have a wide field of applicability, including predicting text and images, classification, and even predicting solutions to the dynamics of physical systems. Here we present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), which builds networks of qubits that perform operations analogous to those of the transformer network, namely the keystone self-attention operation, and leads to an exponential improvement in parameter complexity and run-time complexity over its classical counterpart. Our approach leverages recent insights from kernel-based operator learning in the context of predicting spatiotemporal systems to represent deep layers of a vision transformer network using simple gate operations and a set of multidimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks in simulation and on hardware, where with only 9 qubits and a handful of parameters we are able to simultaneously embed and classify grayscale images of handwritten digits with high accuracy.
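The kernel-based operator-learning idea referenced above has a simple classical analogue: instead of computing pairwise attention scores, mix the token sequence with a learnable pointwise filter applied in Fourier space (as in Fourier neural operators and FNet-style token mixing). A minimal NumPy sketch of that classical analogue follows; the function name and shapes are illustrative, not from the paper, which realizes the Fourier transforms and the learnable filter with quantum gates.

```python
import numpy as np

def fourier_kernel_layer(x, weights):
    """Classical sketch of Fourier-space token mixing: IFFT(W * FFT(x)).

    x       : (seq_len, d) array of token features.
    weights : (seq_len, d) learnable diagonal kernel, one weight per mode.
    """
    x_hat = np.fft.fft(x, axis=0)   # transform along the sequence axis
    y_hat = weights * x_hat         # learnable pointwise (diagonal) kernel
    y = np.fft.ifft(y_hat, axis=0)  # return to the sequence domain
    return y.real                   # keep the real part of the mixed tokens

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))         # 8 tokens, 4 features each
w = rng.normal(size=(8, 4))         # one filter weight per Fourier mode
y = fourier_kernel_layer(x, w)
print(y.shape)                      # (8, 4)
```

Because the diagonal multiply in Fourier space replaces the quadratic query-key interaction, the parameter count scales with the number of retained modes rather than with pairwise token interactions, which is the compression the quantum circuit pushes further.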