🤖 AI Summary
Discrete-ordinates $S_N$ transport solvers on unstructured meshes suffer from poor scalability on shared-memory systems due to complex data dependencies, irregular memory access patterns, and high-dimensional computational domains.
Method: This paper proposes an asynchronous multi-task parallel algorithm that constructs a non-blocking execution model based on a task dependency graph, eliminating traditional synchronization barriers to enable fine-grained task scheduling and dynamic load balancing. The approach integrates shared-memory programming with hardware-aware optimizations.
Contribution/Results: Evaluated on multiple many-core platforms, including Intel Xeon Phi and AMD EPYC, the algorithm achieves a 1.8–3.2× speedup over baseline methods on configurations with 64 or more cores, with strong-scaling efficiency exceeding 75%. It significantly improves resource utilization and throughput for high-dimensional transport problems.
📝 Abstract
Discrete ordinates $S_N$ transport solvers on unstructured meshes are challenging to scale because of complex data dependencies, irregular memory access patterns, and a high-dimensional domain. In this paper, we review the performance bottlenecks in the shared-memory parallelization scheme of an existing transport solver on modern many-core architectures with high core counts. Building on this analysis, we survey the performance of this solver across a variety of compute hardware. We then present a new Asynchronous Many-Task (AMT) algorithm for shared-memory parallelism, show that it improves computational performance over the existing method, and analyze why performance improves.