FastMPS: Revisit Data Parallel in Large-scale Matrix Product State Sampling

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale matrix product state (MPS) sampling faces two bottlenecks: data parallelism is constrained by memory and I/O limits, while model parallelism suffers from rigid process binding and poor scalability. Method: the paper proposes a multi-level cooperative framework that integrates sample-level data parallelism with bond-dimension tensor parallelism, reviving efficient data parallelism for MPS sampling by combining low-rank compression, computation-communication overlap, distributed bond-dimension partitioning, and high-precision floating-point optimization. Contribution/Results: the framework scales elastically to thousands of processes, removing the hard process-count constraints of conventional model parallelism. Experiments demonstrate a more than 10× speedup over state-of-the-art simulators on Gaussian boson sampling, stable simulation of an MPS with 8,176 sites and bond dimension χ = 10⁴, and significantly better strong scaling than the best current methods.
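
For context, the operation being parallelized is a sequential sweep: the sampler contracts a left environment through the chain site by site and draws each physical index from the resulting conditional distribution. Below is a minimal NumPy sketch of this baseline, assuming a right-canonical MPS; all names are illustrative and this is not FastMPS's own kernel.

```python
import numpy as np

def sample_mps(tensors, rng):
    """Draw one sample from an MPS given as a list of site tensors of
    shape (chi_left, d, chi_right), assumed right-canonical so the
    conditional probabilities follow from the left partial contraction."""
    env = np.ones(1, dtype=complex)    # left environment over the first (trivial) bond
    outcome = []
    for A in tensors:
        T = np.tensordot(env, A, axes=(0, 0))            # contract env into site: (d, chi_right)
        probs = np.einsum('ir,ir->i', T, T.conj()).real  # ||T[i]||^2 per physical outcome i
        probs /= probs.sum()                             # guard against rounding drift
        s = rng.choice(len(probs), p=probs)              # sample this site's outcome
        outcome.append(int(s))
        env = T[s] / np.linalg.norm(T[s])                # condition on s and renormalize
    return outcome

# Usage: samples = [sample_mps(tensors, np.random.default_rng(i)) for i in range(100)]
```

The sweep is inherently sequential in the sites, which is why the paper parallelizes across samples and across the bond dimension instead.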

📝 Abstract
Matrix Product State (MPS) is a versatile tensor network representation widely applied in quantum physics, quantum chemistry, and machine learning, where MPS sampling serves as a critical fundamental operation. As problems become more complex, the scale of MPS is increasing rapidly. Traditional data parallelism is limited by memory and heavy I/O in large-scale MPS, while model parallelism that can handle large-scale MPS imposes rigid process bindings and lacks scalability. This work proposes Fast-MPS, a multi-level parallel framework for scalable MPS sampling. Our design combines data parallelism across samples with tensor parallelism along bond dimensions. We eliminate memory and I/O pressure through compression and overlapping, reviving data parallelism in large-scale MPS sampling. We evaluate our approach on Gaussian Boson Sampling, a representative and demanding application. Fast-MPS achieves over 10× speedup compared to existing simulators, scales to thousands of processes, and enables simulations with 8,176 sites and bond dimension χ = 10⁴, significantly outperforming the state of the art. Fast-MPS demonstrates great potential for high-performance tensor network applications.
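
As a sketch of the bond-dimension tensor parallelism the abstract describes, assuming mpi4py and a layout in which each rank holds only a slice of every site tensor along its left bond: each rank computes a partial contraction, and an Allreduce assembles the full conditional distribution. The slicing scheme, the rank-0 draw-and-broadcast, and all names are illustrative assumptions, not the Fast-MPS implementation.

```python
from mpi4py import MPI
import numpy as np

def sample_mps_bond_parallel(local_tensors, bond_slices, rng, comm=MPI.COMM_WORLD):
    """Each rank stores local_tensors[k], the slice bond_slices[k] = (lo, hi)
    of site k's left bond, so a chi x d x chi tensor never has to fit in one
    process's memory. The cheap chi-length environment vector is kept in
    full on every rank."""
    env = np.ones(1, dtype=complex)            # full environment over the first (trivial) bond
    outcome = []
    for k, A in enumerate(local_tensors):      # A: (hi - lo, d, chi_right)
        lo, hi = bond_slices[k]
        part = np.tensordot(env[lo:hi], A, axes=(0, 0))  # partial contraction on local slice
        T = np.empty_like(part)
        comm.Allreduce(part, T, op=MPI.SUM)    # sum partial results: full (d, chi_right)
        probs = np.einsum('ir,ir->i', T, T.conj()).real
        probs /= probs.sum()
        s = rng.choice(len(probs), p=probs) if comm.Get_rank() == 0 else None
        s = comm.bcast(s, root=0)              # every rank follows the same draw
        outcome.append(int(s))
        env = T[s] / np.linalg.norm(T[s])
    return outcome
```

Sample-level data parallelism then layers on top: disjoint groups of ranks (for example, created with comm.Split) each run this loop for their own batch of samples, giving the two-level structure the abstract describes.
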
Problem

Research questions and friction points this paper is trying to address.

Addresses memory and I/O limitations in large-scale MPS sampling
Overcomes rigid process bindings and scalability issues in model parallelism
Enables efficient data parallelism for high-performance tensor network applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level parallel framework combining data and tensor parallelism
Compression and overlapping to eliminate memory and I/O pressure (a compression sketch follows this list)
Scalable to thousands of processes with over 10x speedup
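
A minimal sketch of the low-rank compression ingredient, using a truncated SVD to shrink a site tensor before it is stored or shipped to data-parallel workers. The cutoff policy and all names are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def compress_site(A, max_rank, tol=1e-12):
    """Truncated-SVD compression of one site tensor (chi_l, d, chi_r):
    keep at most max_rank singular values above tol * S[0], and return
    the compressed site plus the factor to absorb into the next site."""
    chi_l, d, chi_r = A.shape
    U, S, Vh = np.linalg.svd(A.reshape(chi_l * d, chi_r), full_matrices=False)
    keep = max(1, min(max_rank, int(np.sum(S > tol * S[0]))))
    left = U[:, :keep].reshape(chi_l, d, keep)  # compressed site tensor
    carry = S[:keep, None] * Vh[:keep]          # to be absorbed downstream
    return left, carry
```

The returned carry factor would be absorbed into the next site, e.g. np.tensordot(carry, B, axes=(1, 0)) for a neighbor B of shape (chi_r, d, chi_r2); loading such compressed tensors is also what gives the prefetch step something small enough to overlap with computation.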
Yaojian Chen
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Si-Qiu Gong
Hefei National Research Center for Physical Sciences at the Microscale and School of Physical Sciences, University of Science and Technology of China, Hefei, China
Lin Gan
Tsinghua University
Yanfei Liu
National Supercomputing Center in Wuxi, Wuxi, China
An Yang
Qwen Team, Peking University
Yinuo Wang
Tsinghua University
Chao-yang Lu
Hefei National Research Center for Physical Sciences at the Microscale and School of Physical Sciences, University of Science and Technology of China, Hefei, China
Guangwen Yang
Professor of Computer Science and Technology, Tsinghua University