🤖 AI Summary
Targeting the classical simulation of large random quantum circuits such as the 60-qubit, 24-cycle "Zuchongzhi" circuit (Zuchongzhi-60-24), this work presents a high-performance tensor network contraction simulator tailored to the Sunway many-core architecture and scaled to more than 1,024 nodes. It combines four groups of techniques: data reuse schemes that reduce floating-point operations; memory organization techniques that eliminate slicing overhead while preserving parallelism; a step fusion strategy extended by multi-core cooperation to improve data locality and computational intensity; and fine-grained kernels, namely in-kernel vectorized permutation and split-K operators, that address the new hotspot distribution and topological structure. Together these cover complexity reduction, computational paradigms, and fine-grained optimization. Using more than 1,024 Sunway nodes (399,360 cores), the simulator accelerates the Zuchongzhi-60-24 simulation by more than 10×.
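The split-K idea mentioned above can be illustrated with a minimal NumPy sketch. It assumes "split-K" carries its usual meaning from the GEMM literature: the shared (contracted) dimension K is cut into chunks, each chunk yields an independent partial product, and the partial products are summed. The function name `split_k_matmul` and the parameter `num_splits` are hypothetical; this is not the paper's Sunway kernel.

```python
import numpy as np

def split_k_matmul(a, b, num_splits):
    """Compute a @ b by splitting the contracted dimension K into chunks.

    Hypothetical helper for illustration only. Each chunk produces an
    independent partial product, so chunks could be assigned to different
    cores; the final sum plays the role of the reduction step.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions must match")
    chunk = -(-k // num_splits)  # ceiling division: chunk length along K
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        # Partial product over one slice of the K dimension.
        out += a[:, start:stop] @ b[start:stop, :]
    return out
```

Splitting along K is most useful when the output is small and K is large, since splitting only along the output dimensions would then leave most cores idle.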
📝 Abstract
Classical simulation is essential in quantum algorithm development and quantum device verification. With the increasing complexity and diversity of quantum circuit structures, existing classical simulation algorithms need to be improved and extended. In this work, we propose novel strategies for a tensor-network-contraction-based simulator on the Sunway architecture. Our approach addresses three main aspects: complexity, computational paradigms, and fine-grained optimization. Data reuse schemes are designed to reduce floating-point operations, and memory organization techniques are employed to eliminate slicing overhead while maintaining parallelism. The step fusion strategy is extended through multi-core cooperation to improve data locality and computational intensity. Fine-grained optimizations, such as in-kernel vectorized permutations and split-K operators, are also developed to address the challenges posed by the new hotspot distribution and topological structure. These innovations accelerate the simulation of Zuchongzhi-60-24 by more than 10 times, using more than 1024 Sunway nodes (399,360 cores). Our work demonstrates the potential for efficient classical simulation of increasingly complex quantum circuits.
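For readers unfamiliar with slicing, the following is a minimal NumPy sketch of the general technique, not the paper's implementation. Fixing the value of one shared index splits a contraction into independent sub-contractions whose results are summed, which is what makes the work parallel across nodes. The three-tensor network and the function names `contract_full` and `contract_sliced` are assumptions chosen for illustration.

```python
import numpy as np

def contract_full(a, b, c):
    """Contract the closed network A_ij B_jk C_ki to a scalar in one call."""
    return np.einsum("ij,jk,ki->", a, b, c)

def contract_sliced(a, b, c):
    """Contract the same network by slicing the shared index j.

    Each value of j gives an independent, smaller contraction that could
    run on a separate node; the partial results are then summed.
    """
    total = 0
    for j in range(a.shape[1]):
        # With j fixed, A and B lose one dimension each.
        total += np.einsum("i,k,ki->", a[:, j], b[j, :], c)
    return total
```

In larger networks, every slice repeats the contraction of tensors that do not carry the sliced index. That repeated work is the slicing overhead that the abstract says the data reuse and memory organization techniques are designed to reduce or eliminate.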