AI Summary
This work addresses the challenge of modeling dynamic performance phenomena in MPI parallel programs, such as synchronization mismatches, latency propagation, bottlenecks, and self-desynchronization. We propose a topology-aware coupled-oscillator dynamical framework. Methodologically, we introduce the first lightweight MPI dynamic model grounded in the Kuramoto oscillator paradigm; design customized nonlinear coupling potential functions for memory- and compute-bound workloads; and uncover a novel mechanism whereby moderate noise accelerates resynchronization in large-scale applications. Using phase order parameters, synchronization entropy, phase gradients, and phase-difference analysis, validated empirically against real MPI execution traces, our simulations achieve strong qualitative agreement and high fidelity across quantitative metrics (e.g., phase coherence and perturbation decay rate). This work establishes an interpretable, scalable paradigm for parallel performance modeling and hardware-software co-optimization.
Abstract
We propose a novel, lightweight, and physically inspired approach to modeling the dynamics of parallel distributed-memory programs. Inspired by the Kuramoto model, we represent MPI processes as coupled oscillators with topology-aware interactions, custom coupling potentials, and stochastic noise. The resulting system of nonlinear ordinary differential equations opens a path to modeling key performance phenomena of parallel programs, including synchronization, delay propagation and decay, bottlenecks, and self-desynchronization. This paper introduces interaction potentials to describe memory- and compute-bound workloads and employs multiple quantitative metrics -- such as an order parameter, synchronization entropy, phase gradients, and phase differences -- to evaluate phase coherence and disruption. We also investigate the role of local noise and show that moderate noise can accelerate resynchronization in large-scale applications. Our simulations align qualitatively with MPI trace data, demonstrating the potential of physics-informed abstractions to predict performance patterns and offering a new perspective on performance modeling and software-hardware co-design in parallel computing.
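To make the underlying dynamical system concrete, the following is a minimal sketch of a noisy Kuramoto model with a mean-field order parameter, of the kind the abstract describes. It is illustrative only: the all-to-all topology, sinusoidal coupling, and all parameter values are assumptions for demonstration, not the paper's topology-aware interactions or custom potentials.

```python
import numpy as np

def simulate_kuramoto(n=64, K=2.0, sigma=0.1, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama integration of a noisy, all-to-all Kuramoto model.

    dtheta_i = (omega_i + (K/n) * sum_j sin(theta_j - theta_i)) dt + sigma dW_i

    Parameters (n, K, sigma, dt, steps) are illustrative choices, not
    values taken from the paper. Returns the order-parameter history.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 0.5, n)           # heterogeneous natural frequencies
    theta = rng.uniform(0.0, 2 * np.pi, n)    # random initial phases
    r_hist = []
    for _ in range(steps):
        # mean-field coupling term: (K/n) * sum_j sin(theta_j - theta_i)
        coupling = (K / n) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        noise = sigma * np.sqrt(dt) * rng.normal(size=n)
        theta += (omega + coupling) * dt + noise
        # Kuramoto order parameter r = |mean(exp(i*theta))|: 1 = full sync
        r_hist.append(np.abs(np.exp(1j * theta).mean()))
    return np.array(r_hist)

r = simulate_kuramoto()
print(f"final order parameter r = {r[-1]:.3f}")
```

In an MPI-flavored reading of this sketch, each oscillator phase stands in for a process's progress through its iteration cycle, and the order parameter tracks how coherently the processes advance; a perturbation to one phase and the subsequent recovery of r would correspond to delay propagation and decay.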