🤖 AI Summary
To address the exponential memory bottleneck of state-vector simulation for thousand-qubit quantum circuits, this work employs the matrix-product-state (MPS) tensor-network formalism and presents a systematic evaluation of CUDA-Q's MPS simulator on the NVIDIA Grace Hopper heterogeneous platform. Using GPU-accelerated tensor contractions, it achieves approximate simulation of circuits exceeding 1,000 qubits, with accuracy controlled by the MPS bond dimension. The contributions are threefold: (i) it demonstrates the feasibility and scalability of MPS-based simulation for very large quantum circuits; (ii) it identifies suboptimal GPU parallelization, which shows up as insufficient GPU utilization, as a critical performance bottleneck in the current CUDA-Q MPS implementation; and (iii) it provides empirical evidence and directions for optimizing tensor-network simulation on heterogeneous architectures, particularly regarding memory access patterns, load balancing, and kernel efficiency.
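To make the memory argument concrete, the sketch below compares the storage needed by a dense state vector (2^n complex amplitudes) with an upper bound for an MPS whose site tensors are at most χ × 2 × χ. It assumes double-precision complex amplitudes (16 bytes each), a uniform maximum bond dimension `max_bond`, and ignores any contraction workspace; the function names are illustrative and are not part of CUDA-Q.

```python
# Back-of-the-envelope memory estimate (illustrative only, not CUDA-Q code).
# Assumption: complex128 amplitudes, i.e. 16 bytes per complex number.
BYTES_PER_AMPLITUDE = 16


def state_vector_bytes(n_qubits: int) -> int:
    """Dense state vector: 2**n complex amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


def mps_bytes(n_qubits: int, max_bond: int) -> int:
    """Upper bound for an MPS: n site tensors of shape (chi, 2, chi)."""
    return n_qubits * 2 * max_bond * max_bond * BYTES_PER_AMPLITUDE


if __name__ == "__main__":
    for n in (30, 50, 1000):
        sv_gib = state_vector_bytes(n) / 2**30
        mps_mib = mps_bytes(n, max_bond=64) / 2**20
        print(f"n={n:5d}  state vector: {sv_gib:.3g} GiB  MPS (chi=64): {mps_mib:.3g} MiB")
```

Under these assumptions, 30 qubits already need 16 GiB as a state vector, while a 1,000-qubit MPS capped at χ = 64 fits in 125 MiB. The catch is that a fixed χ is only exact when the entanglement across every cut is low enough; otherwise the simulator must truncate, which is where the approximation comes from.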
📝 Abstract
Quantum computer simulators are an indispensable tool for prototyping quantum algorithms and verifying the functioning of existing quantum computer hardware. The largest current quantum computers feature more than one thousand qubits, which makes their classical simulation challenging. State-vector simulators must store 2^n complex amplitudes for an n-qubit register, so their memory grows exponentially with the number of qubits and simulating more than about fifty qubits becomes practically infeasible. A more appealing approach is to adopt tensor networks, whose memory requirements depend fundamentally on the amount of entanglement in the quantum circuit rather than on the qubit count alone; this makes it possible to simulate circuits at the scale of the current largest quantum computers. This work investigates and evaluates the CUDA-Q tensor-network simulators on an NVIDIA Grace Hopper system, focusing on the Matrix Product State (MPS) formulation. We compare their performance with that of the CUDA-Q state-vector implementation and validate the correctness of the MPS simulations. Our results highlight that tensor-network methods offer a significant opportunity to simulate circuits with many qubits, albeit approximately. We also show that the current GPU-accelerated implementation does not fully utilize the GPU for MPS simulations.
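The claim that MPS memory tracks entanglement rather than qubit count can be illustrated with a hand-built example. In an MPS, each qubit k carries one matrix per basis value s, and the amplitude of a bitstring is the product of the selected matrices. The sketch below writes down the n-qubit GHZ state (|0…0⟩ + |1…1⟩)/√2 exactly with bond dimension 2, so 1,000 qubits need only a few thousand numbers. This is a pure-Python illustration under our own naming (`ghz_mps`, `amplitude`, `num_parameters` are hypothetical helpers); it does not reproduce CUDA-Q's GPU-accelerated MPS backend and performs no gate application or truncation.

```python
import math
from typing import List

Matrix = List[List[float]]
Site = List[Matrix]  # Site[s] is the (left x right) matrix for basis value s


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ghz_mps(n_qubits: int) -> List[Site]:
    """Exact MPS for the n-qubit GHZ state with bond dimension 2 (needs n >= 2)."""
    first: Site = [[[1.0, 0.0]], [[0.0, 1.0]]]                        # 1 x 2
    middle: Site = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]  # 2 x 2
    last: Site = [[[1.0], [0.0]], [[0.0], [1.0]]]                     # 2 x 1
    # The middle tensor is shared by reference; it is never mutated.
    return [first] + [middle] * (n_qubits - 2) + [last]


def amplitude(mps: List[Site], bits: List[int]) -> float:
    """Contract the chain left to right for one bitstring."""
    row = mps[0][bits[0]]
    for site, s in zip(mps[1:], bits[1:]):
        row = matmul(row, site[s])
    return row[0][0] / math.sqrt(2.0)  # GHZ normalization


def num_parameters(mps: List[Site]) -> int:
    """Number of stored entries if every site tensor were stored separately."""
    return sum(len(m) * len(m[0]) for site in mps for m in site)


if __name__ == "__main__":
    mps = ghz_mps(1000)
    print(amplitude(mps, [0] * 1000), amplitude(mps, [1] * 1000))
    print(num_parameters(mps), "entries instead of 2**1000 amplitudes")
```

For circuits that build up more entanglement, the required bond dimension grows, and practical MPS simulators cap it by discarding small singular values; that truncation is what makes large-scale MPS simulation approximate.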