PCLVis: Visual Analytics of Process Communication Latency in Large-Scale Simulation

2025-06-29
AI Summary
Large-scale parallel simulations on supercomputers are often bottlenecked by inter-process communication latency (PCL), yet existing diagnostic methods require administrator-level access to physical network topology, which makes them inaccessible to ordinary users. To address this, we propose PCLVis, the first user-oriented visual analytics framework for PCL diagnosis. It automatically constructs a process correlation tree and a communication-dependency DAG from MPI logs, and introduces two novel abstractions: a sliding-window algorithm for temporal segmentation and Communication-State Glyphs (CS-Glyphs) for encoding latency states. These enable spatiotemporal localization of PCL events, propagation-path tracing, and root-cause attribution. Furthermore, clustering and graph-based modeling support latency-pattern recognition and interactive exploration. Evaluated on multiple real-world simulation workloads on the TH-1A supercomputer, PCLVis significantly improves PCL diagnosis efficiency and delivers a deployable, user-accessible tool for performance optimization of large-scale simulations.
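
The sliding-window idea mentioned above can be illustrated with a minimal sketch. The summary does not specify the paper's actual algorithm, so everything here is an assumption for illustration only: the function name `sliding_window_events`, the use of a mean-latency threshold, and the sample values are hypothetical.

```python
from typing import List, Tuple


def sliding_window_events(
    latencies: List[float], window: int, step: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean) for every window whose mean latency
    exceeds the threshold. Hypothetical criterion, not the paper's."""
    events = []
    for start in range(0, len(latencies) - window + 1, step):
        chunk = latencies[start:start + window]
        mean = sum(chunk) / window
        if mean > threshold:
            events.append((start, start + window, mean))
    return events


# Made-up per-message latencies for one process (arbitrary units).
samples = [1.0, 1.2, 0.9, 5.0, 6.0, 5.5, 1.1, 1.0]
flagged = sliding_window_events(samples, window=3, step=1, threshold=3.0)
flagged_starts = [start for start, _, _ in flagged]
```

Overlapping flagged windows could then be merged into a single candidate PCL event; a real analysis would likely choose the window size and threshold per workload.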

๐Ÿ“ Abstract
Large-scale simulations on supercomputers have become important tools for users. However, their scalability remains limited by the high communication cost among parallel processes. Most existing communication latency analysis methods rely on physical link layer information, which is available only to administrators. In this paper, a framework called PCLVis is proposed to help general users analyze process communication latency (PCL) events. Instead of physical link layer information, PCLVis uses MPI process communication data for the analysis. First, a spatial PCL event locating method is developed: all highly correlated processes are grouped into a single cluster by constructing a process-correlation tree. Second, the propagation path of PCL events is analyzed by constructing a communication-dependency-based directed acyclic graph (DAG), which helps users interactively explore the temporal evolution of a located PCL event cluster. In this graph, a sliding window algorithm is designed to generate an abstraction of the PCL events. Meanwhile, a new glyph called the communication state glyph (CS-Glyph) is designed for each process to show its communication states, including its in/out messages and load balance. Each leaf node can be further unfolded to view additional information. Third, a PCL event attribution strategy is formulated to help users optimize their simulations. The effectiveness of the PCLVis framework is demonstrated by analyzing the PCL events of several simulations running on the TH-1A supercomputer. By using the proposed framework, users can greatly improve the efficiency of their simulations.
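
As a rough illustration of grouping highly correlated processes, the sketch below clusters MPI ranks whose latency time series have a Pearson correlation above a threshold. This is a swapped-in simplification (threshold-based grouping with union-find), not the process-correlation tree the abstract describes; the function names, the threshold, and the sample series are all assumptions.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_processes(series: Dict[int, List[float]], threshold: float) -> List[List[int]]:
    """Group ranks linked by correlation >= threshold (hypothetical criterion)."""
    parent = {rank: rank for rank in series}

    def find(rank: int) -> int:
        # Union-find lookup with path halving.
        while parent[rank] != rank:
            parent[rank] = parent[parent[rank]]
            rank = parent[rank]
        return rank

    ranks = sorted(series)
    for i, a in enumerate(ranks):
        for b in ranks[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for rank in ranks:
        groups.setdefault(find(rank), []).append(rank)
    return sorted(groups.values())


# Made-up latency series per MPI rank.
latency_series = {
    0: [1.0, 2.0, 3.0, 4.0],
    1: [2.0, 4.0, 6.0, 8.0],
    2: [4.0, 3.0, 2.0, 1.0],
    3: [1.1, 2.1, 2.9, 4.2],
}
clusters = cluster_processes(latency_series, threshold=0.9)
```

The pairwise loop is quadratic in the number of ranks, so this form only suits small examples; a hierarchical structure such as the tree in the abstract would presumably scale and explore better.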
Problem

Research questions and friction points this paper is trying to address.

Analyzes process communication latency in large-scale simulations
Identifies high-correlation processes using MPI data
Visualizes communication paths and states for optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MPI process communication data for analysis
Develops process-correlation tree for event clustering
Designs communication-dependency-based DAG for path analysis
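
To make the communication-dependency DAG idea concrete, here is a minimal sketch under stated assumptions. Each message is treated as a node, and a later message depends on an earlier one when it is sent by the earlier message's receiver at or after the receive time. The `Msg` record, its fields, and both function names are hypothetical, and the paper's actual construction may differ. The graph is acyclic as long as every message has positive latency (`recv > send`), because send times then strictly increase along edges.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Msg(NamedTuple):
    src: int      # sending rank
    dst: int      # receiving rank
    send: float   # send timestamp
    recv: float   # receive timestamp (assumed > send)


def build_dependency_dag(msgs: List[Msg]) -> Dict[int, List[int]]:
    """Map each message index to the indices of messages that depend on it."""
    by_src: Dict[int, List[int]] = defaultdict(list)
    for i, m in enumerate(msgs):
        by_src[m.src].append(i)
    dag: Dict[int, List[int]] = defaultdict(list)
    for i, a in enumerate(msgs):
        for j in by_src[a.dst]:
            if msgs[j].send >= a.recv:
                dag[i].append(j)
    return dag


def downstream(dag: Dict[int, List[int]], start: int) -> List[int]:
    """All message indices reachable from `start`, i.e. a candidate propagation path set."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in dag.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


# Made-up trace: message 0 is slow (latency 5.0) and delays messages 1 and 2.
trace = [
    Msg(0, 1, 0.0, 5.0),
    Msg(1, 2, 5.5, 6.0),
    Msg(2, 3, 6.5, 7.0),
    Msg(1, 3, 1.0, 1.5),
]
dag = build_dependency_dag(trace)
affected = downstream(dag, 0)
```

Here message 3 is sent by rank 1 before the slow message arrives, so it is not counted as affected.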
Authors

Chongke Bi
Professor of Tianjin University
Visualization, Big data

Xin Gao
College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China

Baofeng Fu
College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China

Yuheng Zhao
Fudan University
Data Visualization, Visual Analytics, Human-AI Collaboration

Siming Chen
School of Data Science, Fudan University, Shanghai, 200433, China

Ying Zhao
School of Computer Science and Engineering, Central South University, Changsha, 410083, China

Yunhai Wang
Renmin University of China
Data Visualization, Human Data Interaction