🤖 AI Summary
Large-scale matrix product state (MPS) sampling faces bottlenecks: data parallelism is constrained by memory and I/O limitations, while model parallelism suffers from rigid process binding and poor scalability.
Method: This paper proposes a multi-level cooperative framework that integrates sample-level data parallelism with bond-dimension tensor parallelism. It revives efficient data parallelism for MPS sampling by combining low-rank compression, computation-communication overlap, distributed bond-dimension partitioning, and high-precision floating-point optimization.
Contribution/Results: The framework enables elastic scaling to thousands of processes, breaking the hard process-count constraints of conventional model parallelism. Experiments demonstrate >10× speedup over state-of-the-art simulators on Gaussian boson sampling; stable simulation of MPS with 8,176 sites and bond dimension χ = 10⁴; and significantly superior strong scaling compared to current best methods.
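The paper's distributed implementation is not reproduced here, but the core idea of bond-dimension tensor parallelism can be illustrated with a single-process NumPy sketch (the worker count, variable names, and toy dimensions are all illustrative): each worker holds a slice of the bond index, contracts its slice locally, and the partial results are summed, which is the reduction an MPI allreduce would perform in a distributed setting.

```python
import numpy as np

rng = np.random.default_rng(0)
chi, d = 8, 2                       # bond dimension, physical dimension
v = rng.normal(size=chi)            # boundary vector carried during sampling
A = rng.normal(size=(chi, d, chi))  # one MPS site tensor (left, physical, right)

# Serial reference: contract the boundary vector with the site tensor.
w_ref = np.einsum('a,asb->sb', v, A)

# Bond-parallel version: partition the left bond index across workers;
# each worker contracts only its slice, then partials are reduced by summation.
n_workers = 4
partials = [
    np.einsum('a,asb->sb', v_k, A_k)
    for v_k, A_k in zip(np.array_split(v, n_workers),
                        np.array_split(A, n_workers, axis=0))
]
w_par = sum(partials)

assert np.allclose(w_ref, w_par)
```

Because the contraction is linear in the bond index, the partition introduces no approximation; the only added cost is the final reduction.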
📝 Abstract
Matrix Product States (MPS) are a versatile tensor network representation widely applied in quantum physics, quantum chemistry, and machine learning. MPS sampling is a critical fundamental operation in these fields. As problems grow more complex, the scale of MPS representations is increasing rapidly. Traditional data parallelism is limited by memory pressure and heavy I/O at large MPS scales, while model parallelism, which can handle large-scale MPS, imposes rigid process bindings and scales poorly. This work proposes Fast-MPS, a multi-level parallel framework for scalable MPS sampling. Our design combines data parallelism across samples with tensor parallelism along bond dimensions. We eliminate memory and I/O pressure through compression and computation-communication overlap, reviving data parallelism for large-scale MPS sampling. We evaluate our approach on Gaussian Boson Sampling, a representative and demanding application. Fast-MPS achieves over 10x speedup compared to existing simulators, scales to thousands of processes, and enables simulations with 8,176 sites and bond dimension χ = 10⁴, significantly outperforming the state of the art. Fast-MPS demonstrates great potential for high-performance tensor network applications.
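For readers unfamiliar with the operation being parallelized: the abstract does not spell out the sampling routine, but the standard sequential ("perfect") MPS sampling algorithm, which frameworks like this one parallelize across samples, can be sketched in NumPy as below. The function names and toy sizes are illustrative, not the paper's API; the sketch assumes real-valued tensors brought into right-canonical form so that conditional probabilities follow from local contractions.

```python
import numpy as np

def right_canonicalize(tensors):
    """Sweep right-to-left (LQ via QR of the transpose) so that every
    site except the first is right-canonical, then normalize the state."""
    A = [T.copy() for T in tensors]
    for i in range(len(A) - 1, 0, -1):
        l, d, r = A[i].shape
        Qt, Rt = np.linalg.qr(A[i].reshape(l, d * r).T)
        A[i] = Qt.T.reshape(-1, d, r)                    # orthonormal rows
        A[i - 1] = np.einsum('ads,sk->adk', A[i - 1], Rt.T)
    A[0] /= np.sqrt(np.einsum('asb,asb->', A[0], A[0]))  # unit norm
    return A

def sample_mps(A, rng):
    """Draw one configuration site by site; each step conditions on the
    outcomes already drawn via the boundary vector phi."""
    phi, out = np.ones(1), []
    for T in A:
        amps = np.einsum('l,lsr->sr', phi, T)      # conditional amplitudes
        probs = np.einsum('sr,sr->s', amps, amps)  # Born-rule probabilities
        probs /= probs.sum()
        s = rng.choice(len(probs), p=probs)
        phi = amps[s] / np.linalg.norm(amps[s])    # condition on outcome s
        out.append(s)
    return out

rng = np.random.default_rng(1)
n, d, chi = 6, 2, 8  # sites, physical dimension, bond dimension (toy sizes)
A = right_canonicalize([rng.normal(size=(1 if i == 0 else chi, d,
                                         1 if i == n - 1 else chi))
                        for i in range(n)])
sample = sample_mps(A, rng)
print(sample)
```

Each draw costs O(n · χ² · d), which is why large χ (here up to 10⁴) and many independent samples make both bond-dimension partitioning and sample-level parallelism attractive.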