Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

📅 2025-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Manual design of parallelization strategies for large-model distributed training is labor-intensive and poorly supports nonlinear architectures, such as Mixture-of-Experts (MoE) and multimodal models with multi-input multi-output (MIMO), branch-rich topologies. Method: This paper proposes a hardware-aware automatic parallelization planning framework. It is the first to formulate operator-level distributed deployment as a mixed-integer programming (MIP) scheduling problem, introducing a two-tier optimization architecture: the upper tier jointly optimizes device mapping and communication scheduling, while the lower tier co-optimizes computational graph partitioning and memory reuse. Contribution/Results: The method natively supports complex topologies and balances solution optimality with tractable runtime. Experiments show that, under identical memory constraints, it reduces computation bubbles by 50%, achieves throughput and hardware utilization on par with or exceeding expert-designed strategies (e.g., DeepSeek's DualPipe), and enables multi-objective joint optimization.
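To make the MIP framing concrete, below is a minimal sketch of operator-to-device placement as a mixed-integer program. It is in the spirit of the paper's formulation but does not reproduce it: binary variables assign each operator to a device, a makespan-style variable bounds per-device compute load, and edges that cross devices incur a communication penalty. The operator graph, costs, device names, and the use of the PuLP modeling library are illustrative assumptions.

```python
# Minimal sketch of operator-to-device placement as a MIP (toy values only).
# Requires the PuLP package: pip install pulp
import pulp

# Toy operator graph: compute cost per operator, tensor size per edge.
ops = {"embed": 4, "attn": 6, "expert_a": 5, "expert_b": 5, "head": 3}
edges = {("embed", "attn"): 2, ("attn", "expert_a"): 1,
         ("attn", "expert_b"): 1, ("expert_a", "head"): 1, ("expert_b", "head"): 1}
devices = ["gpu0", "gpu1"]
comm_cost_per_unit = 0.5  # assumed cost of moving one tensor unit across devices

prob = pulp.LpProblem("operator_placement", pulp.LpMinimize)

# x[o][d] = 1 if operator o runs on device d.
x = pulp.LpVariable.dicts("x", (list(ops), devices), cat=pulp.LpBinary)
# cut[e] = 1 if edge e crosses a device boundary and therefore pays communication.
cut = {e: pulp.LpVariable(f"cut_{i}", cat=pulp.LpBinary) for i, e in enumerate(edges)}
load_bound = pulp.LpVariable("load_bound", lowBound=0)

# Each operator is placed on exactly one device.
for o in ops:
    prob += pulp.lpSum(x[o][d] for d in devices) == 1

# load_bound upper-bounds every device's total compute (a makespan-style proxy).
for d in devices:
    prob += pulp.lpSum(ops[o] * x[o][d] for o in ops) <= load_bound

# An edge is cut whenever its endpoints disagree on some device.
for (u, v), var in cut.items():
    for d in devices:
        prob += var >= x[u][d] - x[v][d]

# Objective: balance compute while penalizing cross-device traffic.
prob += load_bound + comm_cost_per_unit * pulp.lpSum(
    size * cut[e] for e, size in edges.items())

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o in ops:
    print(o, "->", next(d for d in devices if pulp.value(x[o][d]) > 0.5))
print("objective:", pulp.value(prob.objective))
```

The real planner additionally models communication ordering, pipeline scheduling, and memory reuse; this sketch only shows how placement and communication trade off inside a single MIP.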

📝 Abstract
As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies (data, model, sequence, and pipeline) have been successfully implemented for popular neural networks on mainstream hardware, optimizing the distributed deployment schedule requires extensive expertise and manual effort. Furthermore, while existing frameworks handle mostly simple chain-like structures well, they struggle with complex non-linear architectures. Mixture-of-experts and multi-modal models feature intricate MIMO and branch-rich topologies that require fine-grained operator-level parallelization beyond the capabilities of existing frameworks. We propose formulating parallelism planning as a scheduling optimization problem using mixed-integer programming. We develop a bi-level solution framework balancing optimality with computational efficiency, automatically generating effective distributed plans that capture both the heterogeneous structure of modern neural networks and the underlying hardware constraints. In experiments comparing against expert-designed strategies such as DeepSeek's DualPipe, our framework achieves comparable or superior performance, reducing computational bubbles by half under the same memory constraints. The framework's versatility extends beyond throughput optimization to incorporate hardware utilization maximization, memory capacity constraints, and other objectives or candidate strategies. Such capabilities position our solution as both a valuable research tool for exploring optimal parallelization strategies and a practical industrial solution for large-scale AI deployment.
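As a rough illustration of how a bi-level planner can be organized, the hypothetical skeleton below separates an outer search over device mappings from an inner subproblem that scores each candidate under a per-device memory budget. This is not the authors' implementation: the function names, the brute-force outer loop, the imbalance-based cost, and the toy MoE-style graph are all assumptions standing in for the paper's upper-tier (device mapping and communication scheduling) and lower-tier (graph partitioning and memory reuse) optimizations.

```python
# Hypothetical bi-level planner skeleton (illustrative only, not the authors' code).
from itertools import product

def lower_tier_cost(mapping, graph, devices, device_memory):
    """Stand-in for the inner subproblem (partitioning / memory reuse):
    returns load imbalance, or infinity if the memory budget is violated."""
    loads = {d: 0 for d in devices}
    mem = {d: 0 for d in devices}
    for op, dev in mapping.items():
        loads[dev] += graph[op]["compute"]
        mem[dev] += graph[op]["memory"]
    if any(m > device_memory for m in mem.values()):
        return float("inf")
    return max(loads.values()) - min(loads.values())

def upper_tier_search(graph, devices, device_memory):
    """Brute-force stand-in for the outer search over device mappings."""
    best_mapping, best_cost = None, float("inf")
    for assignment in product(devices, repeat=len(graph)):
        mapping = dict(zip(graph, assignment))
        cost = lower_tier_cost(mapping, graph, devices, device_memory)
        if cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost

# Toy MoE-style graph: two parallel expert branches behind a shared attention block.
graph = {
    "attn":     {"compute": 6, "memory": 2},
    "expert_a": {"compute": 5, "memory": 3},
    "expert_b": {"compute": 5, "memory": 3},
    "head":     {"compute": 3, "memory": 1},
}
mapping, cost = upper_tier_search(graph, ["gpu0", "gpu1"], device_memory=6)
print(mapping, cost)
```

In the paper's setting, the exhaustive outer loop would be replaced by an optimization over device mapping and communication scheduling, and the inner heuristic by a MIP over graph partitioning and memory reuse; the skeleton only shows how the two tiers exchange candidates and costs.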
Problem

Research questions and friction points this paper is trying to address.

Optimizing distributed deployment for large AI models
Handling complex non-linear neural network architectures
Automating operator-level parallelism planning efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-integer programming for parallelism planning
Bi-level framework balancing optimality and efficiency
Automated distributed plans for complex neural networks
Ruifeng She
Noah’s Ark Lab, Huawei
Bowen Pang
Noah’s Ark Lab, Huawei
Kai Li
Noah’s Ark Lab, Huawei
Zehua Liu
Noah’s Ark Lab, Huawei
Tao Zhong
Noah’s Ark Lab, Huawei