🤖 AI Summary
Large-scale matrix product state (MPS) sampling faces bottlenecks: data parallelism is constrained by memory and I/O limitations, while model parallelism suffers from rigid process binding and poor scalability.
Method: This paper proposes a multi-level cooperative framework that integrates sample-level data parallelism with bond-dimension tensor parallelism. It revives efficient data parallelism for MPS sampling by combining low-rank compression, computation-communication overlap, distributed bond-dimension partitioning, and high-precision floating-point optimization.
Contribution/Results: The framework enables elastic scaling to thousands of processes, breaking the hard process-count constraints of conventional model parallelism. Experiments demonstrate >10× speedup over state-of-the-art simulators on Gaussian boson sampling; stable simulation of MPS with 8,176 sites and bond dimension χ = 10⁴; and significantly superior strong scaling compared to current best methods.
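The paper's distributed implementation is not reproduced here, but the core idea of bond-dimension tensor parallelism can be illustrated with a single-process NumPy sketch (the worker count, variable names, and toy dimensions are all illustrative): each worker holds a slice of the bond index, contracts its slice locally, and the partial results are summed, which is the reduction an MPI allreduce would perform in a distributed setting.

```python
import numpy as np

rng = np.random.default_rng(0)
chi, d = 8, 2                       # bond dimension, physical dimension
v = rng.normal(size=chi)            # boundary vector carried during sampling
A = rng.normal(size=(chi, d, chi))  # one MPS site tensor (left, physical, right)

# Serial reference: contract the boundary vector with the site tensor.
w_ref = np.einsum('a,asb->sb', v, A)

# Bond-parallel version: partition the left bond index across workers;
# each worker contracts only its slice, then partials are reduced by summation.
n_workers = 4
partials = [
    np.einsum('a,asb->sb', v_k, A_k)
    for v_k, A_k in zip(np.array_split(v, n_workers),
                        np.array_split(A, n_workers, axis=0))
]
w_par = sum(partials)

assert np.allclose(w_ref, w_par)
```

Because the contraction is linear in the bond index, the partition introduces no approximation; the only added cost is the final reduction.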
📝 Abstract
Matrix Product States (MPS) are a versatile tensor network representation widely applied in quantum physics, quantum chemistry, and machine learning. MPS sampling is a critical fundamental operation in these fields. As problems grow more complex, the scale of MPS representations is increasing rapidly. Traditional data parallelism is limited by memory pressure and heavy I/O at large MPS scales, while model parallelism, which can handle large-scale MPS, imposes rigid process bindings and scales poorly. This work proposes Fast-MPS, a multi-level parallel framework for scalable MPS sampling. Our design combines data parallelism across samples with tensor parallelism along bond dimensions. We eliminate memory and I/O pressure through compression and computation-communication overlap, reviving data parallelism for large-scale MPS sampling. We evaluate our approach on Gaussian Boson Sampling, a representative and demanding application. Fast-MPS achieves over 10x speedup compared to existing simulators, scales to thousands of processes, and enables simulations with 8,176 sites and bond dimension χ = 10⁴, significantly outperforming the state of the art. Fast-MPS demonstrates great potential for high-performance tensor network applications.
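For readers unfamiliar with the operation being parallelized: the abstract does not spell out the sampling routine, but the standard sequential ("perfect") MPS sampling algorithm, which frameworks like this one parallelize across samples, can be sketched in NumPy as below. The function names and toy sizes are illustrative, not the paper's API; the sketch assumes real-valued tensors brought into right-canonical form so that conditional probabilities follow from local contractions.

```python
import numpy as np

def right_canonicalize(tensors):
    """Sweep right-to-left (LQ via QR of the transpose) so that every
    site except the first is right-canonical, then normalize the state."""
    A = [T.copy() for T in tensors]
    for i in range(len(A) - 1, 0, -1):
        l, d, r = A[i].shape
        Qt, Rt = np.linalg.qr(A[i].reshape(l, d * r).T)
        A[i] = Qt.T.reshape(-1, d, r)                    # orthonormal rows
        A[i - 1] = np.einsum('ads,sk->adk', A[i - 1], Rt.T)
    A[0] /= np.sqrt(np.einsum('asb,asb->', A[0], A[0]))  # unit norm
    return A

def sample_mps(A, rng):
    """Draw one configuration site by site; each step conditions on the
    outcomes already drawn via the boundary vector phi."""
    phi, out = np.ones(1), []
    for T in A:
        amps = np.einsum('l,lsr->sr', phi, T)      # conditional amplitudes
        probs = np.einsum('sr,sr->s', amps, amps)  # Born-rule probabilities
        probs /= probs.sum()
        s = rng.choice(len(probs), p=probs)
        phi = amps[s] / np.linalg.norm(amps[s])    # condition on outcome s
        out.append(s)
    return out

rng = np.random.default_rng(1)
n, d, chi = 6, 2, 8  # sites, physical dimension, bond dimension (toy sizes)
A = right_canonicalize([rng.normal(size=(1 if i == 0 else chi, d,
                                         1 if i == n - 1 else chi))
                        for i in range(n)])
sample = sample_mps(A, rng)
print(sample)
```

Each draw costs O(n · χ² · d), which is why large χ (here up to 10⁴) and many independent samples make both bond-dimension partitioning and sample-level parallelism attractive.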