m4: A Learned Flow-level Network Simulator

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing flow-level network simulators gain speed by omitting packet-level effects such as queuing, congestion control, and retransmissions, at the cost of accuracy. This paper proposes m4, a learned flow-level simulator whose neural architecture decomposes state transitions into separate spatial and temporal components and is trained with dense supervision over intermediate states such as remaining flow size and queue length. The design captures packet-level dynamics while preserving the computational efficiency of the flow-level abstraction. Experiments show: (i) up to 10⁴× speedup over packet-level simulation; (ii) relative to traditional flow-level simulation, a 45.3% reduction in mean per-flow estimation error and a 53.0% reduction in p90 error; and (iii) accurate throughput prediction for closed-loop applications under various congestion control schemes and workloads.

📝 Abstract
Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traffic as continuous flows with dynamically assigned transmission rates. While this abstraction enables orders-of-magnitude speedup, it is inaccurate by omitting critical packet-level effects such as queuing, congestion control, and retransmissions. We present m4, an accurate and scalable flow-level simulator that uses machine learning to learn the dynamics of the network of interest. At the core of m4 lies a novel ML architecture that decomposes state transition computations into distinct spatial and temporal components, each represented by a suitable neural network. To efficiently learn the underlying flow-level dynamics, m4 adds dense supervision signals by predicting intermediate network metrics such as remaining flow size and queue length during training. m4 achieves a speedup of up to 10⁴× over packet-level simulation. Relative to a traditional flow-level simulation, m4 reduces per-flow estimation errors by 45.3% (mean) and 53.0% (p90). For closed-loop applications, m4 accurately predicts network throughput under various congestion control schemes and workloads.
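
The mean and p90 figures above summarize a distribution of per-flow errors. As a rough illustration of how such a summary can be computed, the sketch below assumes the per-flow error is the relative error between a simulator's estimate and a packet-level ground truth; that definition, the nearest-rank percentile, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def relative_errors(predicted, actual):
    """Per-flow relative error |predicted - actual| / actual (assumed metric)."""
    if len(predicted) != len(actual):
        raise ValueError("need exactly one prediction per flow")
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(predicted, actual):
    """Mean and p90 of the per-flow relative errors."""
    errs = relative_errors(predicted, actual)
    return {"mean": sum(errs) / len(errs), "p90": percentile(errs, 90)}


def reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline
```

Under these assumptions, a "45.3% reduction in mean error" corresponds to `reduction(baseline_mean, m4_mean)` being about 0.453, where both means come from `summarize` applied to the same set of flows.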
Problem

Research questions and friction points this paper is trying to address.

Improves accuracy of flow-level network simulations using machine learning.
Addresses inaccuracies in traditional flow-level simulations by modeling packet-level effects.
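Achieves up to 10⁴× speedup over packet-level simulation while reducing per-flow estimation errors by 45.3% (mean) and 53.0% (p90) relative to traditional flow-level simulation.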
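
To make the baseline concrete, here is a minimal sketch of a traditional flow-level simulator: each flow is a continuous stream whose rate is recomputed whenever the set of active flows changes. The max-min fair allocation (progressive filling), the assumption that all flows start at time 0, and every name below are illustrative choices and do not describe m4 or any specific simulator.

```python
def max_min_rates(flows, capacity):
    """Max-min fair rates by progressive filling.

    flows: flow id -> list of link ids on its path (at least one link each).
    capacity: link id -> link capacity.
    """
    remaining = dict(capacity)
    unfrozen = set(flows)
    rates = {}
    while unfrozen:
        # Fair share each link could still give to its unfrozen flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for f in unfrozen if link in flows[f])
            if n:
                share[link] = cap / n
        # The most constrained link fixes the rate of the flows crossing it.
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        for f in [f for f in unfrozen if bottleneck in flows[f]]:
            rates[f] = rate
            unfrozen.discard(f)
            for link in flows[f]:
                remaining[link] -= rate
    return rates


def simulate(flows, sizes, capacity):
    """Return flow id -> completion time, with all flows starting at t = 0."""
    left = dict(sizes)
    now = 0.0
    fct = {}
    while left:
        active = {f: flows[f] for f in left}
        rates = max_min_rates(active, capacity)
        # Jump straight to the next flow completion; no packets are modeled.
        dt = min(left[f] / rates[f] for f in left)
        now += dt
        for f in list(left):
            left[f] -= rates[f] * dt
            if left[f] <= 1e-9:
                fct[f] = now
                del left[f]
    return fct


flows = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
sizes = {"f1": 16.0, "f2": 4.0, "f3": 8.0}
capacity = {"A": 10.0, "B": 4.0}
print(simulate(flows, sizes, capacity))
```

In this toy run, f1 and f2 finish at time 2 and f3 finishes at time 3, because f3 takes over all of link B once f2 completes. Rates change instantaneously and links never build queues, which is exactly the kind of packet-level effect the abstraction leaves out.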
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning for network dynamics
Decomposed spatial-temporal neural networks
Dense supervision with intermediate metrics
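
The dense-supervision idea can be pictured as a training loss that adds auxiliary terms for intermediate metrics to the primary prediction target. The sketch below is a hypothetical illustration only: the choice of mean squared error, the weights `w_size` and `w_queue`, the use of flow completion time as the primary target, and the dictionary layout are all assumptions, not details taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("prediction and target lengths differ")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dense_loss(pred, target, w_size=1.0, w_queue=1.0):
    """Primary loss plus auxiliary losses on intermediate metrics.

    pred and target are dicts with keys 'fct', 'remaining_size' and
    'queue_len', each holding a list of values (hypothetical layout).
    """
    primary = mse(pred["fct"], target["fct"])
    # Auxiliary terms supervise intermediate state, not just the final output.
    aux_size = mse(pred["remaining_size"], target["remaining_size"])
    aux_queue = mse(pred["queue_len"], target["queue_len"])
    return primary + w_size * aux_size + w_queue * aux_queue
```

The intuition behind this kind of loss is that intermediate metrics are available at many points during a simulated run, so they provide far more training signal per run than a single end-of-flow target would.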
Authors

Chenning Li, PhD student at MIT CSAIL (Network Simulations, ML Systems)
Anton A. Zabreyko, MIT CSAIL
Arash Nasr-Esfahany, PhD student at MIT (Computer Networks, Computer Systems, Machine Learning, Causal Inference)
Kevin Zhao, University of Washington
Prateesh Goyal, Microsoft Research
Mohammad Alizadeh, MIT CSAIL
Thomas E. Anderson, University of Washington