Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the exponential scaling and redundant computation of slicing, the dominant strategy for parallelizing tensor network contraction, which limit its efficiency on multi-GPU systems. Instead of relying on slicing, the authors propose a communication-aware multi-GPU framework that distributes intermediate tensors across devices with explicit communication. Using GEMM-oriented mode reordering and communication-aware planning of how intermediate tensor modes are distributed, the framework converts a fixed contraction path into a communication-efficient execution schedule. On a single DGX H100 node (8 GPUs connected by NVLink), distribution delivers 7–173× extra speedup beyond embarrassingly parallel slicing and captures 87–101% of the available compute reduction. On 1,024 H100 GPUs connected by InfiniBand, the extra speedup beyond slicing ranges from 42× to 67,869×, substantially overcoming slicing's scaling limits.
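GEMM-oriented mode reordering builds on a standard idea: a pairwise tensor contraction becomes a single matrix multiplication once each operand's free modes and contracted modes are grouped contiguously (the generic "transpose-transpose-GEMM-transpose" approach). The NumPy sketch below shows only that generic step, under stated assumptions: the function name `contract_via_gemm` and the single-character mode labels are hypothetical, and the paper's planner presumably chooses mode orders across the whole contraction path to avoid explicit transposes, which this sketch does not attempt.

```python
import numpy as np

def contract_via_gemm(a, a_modes, b, b_modes, contracted):
    """Contract tensors a and b over the shared modes in `contracted` with one GEMM.

    Modes are labelled by single characters (a hypothetical scheme), e.g.
    a_modes="ijk" means axis 0 is i, axis 1 is j, axis 2 is k.
    """
    contracted = list(contracted)
    a_free = [m for m in a_modes if m not in contracted]
    b_free = [m for m in b_modes if m not in contracted]

    # Mode reordering: A -> (free..., contracted...), B -> (contracted..., free...)
    a_t = np.transpose(a, [a_modes.index(m) for m in a_free + contracted])
    b_t = np.transpose(b, [b_modes.index(m) for m in contracted + b_free])

    # Fuse each mode group into one matrix dimension: (M x K) @ (K x N)
    a_dim = dict(zip(a_modes, a.shape))
    b_dim = dict(zip(b_modes, b.shape))
    m = int(np.prod([a_dim[x] for x in a_free]))
    k = int(np.prod([a_dim[x] for x in contracted]))
    n = int(np.prod([b_dim[x] for x in b_free]))
    c = a_t.reshape(m, k) @ b_t.reshape(k, n)   # the single GEMM

    # Unfuse the result back into its free modes (A's free modes first)
    out_modes = "".join(a_free + b_free)
    out_shape = [a_dim[x] for x in a_free] + [b_dim[x] for x in b_free]
    return c.reshape(out_shape), out_modes


rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4))   # modes i, j, k
b = rng.standard_normal((4, 3, 5))   # modes k, j, l
c, c_modes = contract_via_gemm(a, "ijk", b, "kjl", "jk")
print(c_modes, c.shape)              # il (2, 5)
assert np.allclose(c, np.einsum("ijk,kjl->il", a, b))
```

The explicit transpose is where the cost of a poor mode order shows up: if consecutive contractions along a path leave modes already in GEMM-friendly order, the transposes become no-ops, which is the kind of saving a path-wide reordering can aim for.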
📝 Abstract
Exact tensor network contraction underpins quantum circuit simulation, quantum error correction, combinatorial optimization, and many-body dynamics. The dominant parallelization strategy, slicing, scales exponentially and incurs redundant computation. We present a multi-GPU framework that instead distributes intermediate tensors across devices with explicit communication, converting a fixed contraction path into a communication-efficient schedule via GEMM-oriented mode reordering and communication-aware mode distribution planning. Within a single DGX H100 node (8 GPUs, NVLink), distribution delivers 7–173× extra speedup beyond embarrassingly parallel slicing, capturing nearly all of the available compute reduction (87–101%) because NVLink's high bandwidth keeps communication small relative to compute. Scaling the same four workloads to 1,024 H100 GPUs over InfiniBand, the extra speedup beyond slicing ranges from 42× to 67,869×, demonstrating that communication-aware distributed contraction far surpasses slicing-based scaling limits for frontier tensor networks.
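The contrast between slicing and distributing intermediate tensors can be made concrete on a toy four-tensor chain. This is a minimal NumPy sketch under our own assumptions, not the paper's framework: the network, the fixed path (A @ B) @ (C @ D), the helper names `gemm_flops`, `unsliced`, `sliced`, and `distributed`, and the cost model (multiply-adds per GEMM, elements moved as communication) are all hypothetical, and "devices" are simply loop iterations. The sliced mode m is chosen deliberately so that A @ B does not depend on it, which exposes the redundant work.

```python
import numpy as np

def gemm_flops(p, q, r):
    """Multiply-add count of a (p x q) @ (q x r) matrix product."""
    return p * q * r

# Toy network (hypothetical): out[i,l] = sum_{k,j,m} A[i,k] B[k,j] C[j,m] D[m,l]
# Fixed contraction path: (A @ B) @ (C @ D)

def unsliced(A, B, C, D):
    """Reference: run the path once on a single device."""
    d, M = A.shape[0], C.shape[1]
    AB = A @ B                 # modes (i, j); does not involve m
    CD = C @ D                 # modes (j, l); sums over m
    flops = 2 * gemm_flops(d, d, d) + gemm_flops(d, M, d)
    return AB @ CD, flops

def sliced(A, B, C, D):
    """Slice mode m: one independent sub-contraction per value of m, summed at the end."""
    d, M = A.shape[0], C.shape[1]
    out, flops = np.zeros((d, d)), 0
    for m in range(M):                       # each slice could run on its own GPU
        AB = A @ B                           # identical in every slice: redundant work
        CD_m = np.outer(C[:, m], D[m, :])    # rank-1 term of C @ D
        out += AB @ CD_m
        flops += 2 * gemm_flops(d, d, d) + gemm_flops(d, 1, d)
    return out, flops

def distributed(A, B, C, D, P):
    """Split free mode i across P simulated devices; nothing is recomputed."""
    d, M = A.shape[0], C.shape[1]
    CD = C @ D                               # computed once, then broadcast
    flops = gemm_flops(d, M, d)
    out = np.empty((d, d))
    for rows in np.array_split(np.arange(d), P):   # one iteration per device
        AB_p = A[rows] @ B                   # this device's rows of A @ B
        out[rows] = AB_p @ CD                # needs all of CD locally
        flops += 2 * gemm_flops(len(rows), d, d)
    comm = P * d * d                         # CD elements sent, counted for every device
    return out, flops, comm

rng = np.random.default_rng(0)
d, s = 16, 8
M = 2 ** s                                   # s binary modes fused into one sliced mode
A, B = rng.standard_normal((d, d)), rng.standard_normal((d, d))
C, D = rng.standard_normal((d, M)), rng.standard_normal((M, d))
ref, f_ref = unsliced(A, B, C, D)
out_s, f_slice = sliced(A, B, C, D)
out_d, f_dist, comm = distributed(A, B, C, D, P=8)
assert np.allclose(out_s, ref) and np.allclose(out_d, ref)
print(f_ref, f_slice, f_dist, comm)          # 73728 2162688 73728 2048
```

With s = 8 sliced binary modes, the sliced run does about 29× the multiply-adds of the unsliced path because A @ B is repeated in all 256 slices, and the sliced count doubles with each additional sliced binary mode. The distributed run keeps the unsliced FLOP count but must move CD between devices, which is why the choice of which modes to distribute has to account for communication.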
Problem

Research questions and friction points this paper is trying to address.

tensor network contraction
parallelization
multi-GPU
slicing
communication overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

tensor network contraction
multi-GPU parallelization
communication-aware scheduling
mode reordering
distributed tensor computation