An Asynchronous Many-Task Algorithm for Unstructured $S_{N}$ Transport on Shared Memory Systems

📅 2025-10-13
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Problem: Discrete-ordinates $S_N$ transport solvers on unstructured meshes scale poorly on shared-memory systems because of complex data dependencies, irregular memory access patterns, and a high-dimensional computational domain. Method: This paper proposes an Asynchronous Many-Task (AMT) parallel algorithm that builds a non-blocking execution model on a task dependency graph, removing traditional synchronization barriers to enable fine-grained task scheduling and dynamic load balancing. The approach combines shared-memory programming with hardware-aware optimizations. Contribution/Results: Evaluated on multiple many-core platforms, including Intel Xeon Phi and AMD EPYC, the algorithm achieves 1.8–3.2× higher speedup than baseline methods on configurations with 64 or more cores, with strong scaling efficiency above 75%. It improves resource utilization and throughput for high-dimensional transport problems.
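
For reference, the speedup and strong scaling efficiency quoted in the summary are conventionally defined as below. The symbols are notation introduced here, not taken from the paper: $T(p)$ is the wall-clock time for a fixed problem size on $p$ cores. The summary does not say which reference core count the paper uses, so a single-core reference is assumed.

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} = \frac{T(1)}{p\,T(p)}
$$

Under that assumption, an efficiency of 75% on 64 cores would correspond to a speedup of $0.75 \times 64 = 48$ over one core.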

📝 Abstract
Discrete ordinates $S_N$ transport solvers on unstructured meshes are challenging to scale because of complex data dependencies, memory access patterns, and a high-dimensional domain. In this paper, we review the performance bottlenecks within the shared memory parallelization scheme of an existing transport solver on modern many-core architectures with high core counts. Using this analysis, we then survey the performance of this solver across a variety of compute hardware. Finally, we present a new Asynchronous Many-Task (AMT) algorithm for shared memory parallelism, present results showing an increase in computational performance over the existing method, and evaluate why performance is improved.

Problem

Research questions and friction points this paper is trying to address.

Addressing scalability challenges in unstructured $S_N$ transport solvers
Analyzing performance bottlenecks on modern many-core architectures
Developing an asynchronous many-task (AMT) algorithm for improved computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

A new Asynchronous Many-Task (AMT) algorithm for shared-memory parallelism
The asynchronous many-task approach overcomes performance bottlenecks identified in the existing solver
The new method increases computational performance over the existing method on many-core systems
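
The idea of dependency-driven, barrier-free scheduling can be pictured with a small sketch. This is not the paper's implementation: the function name `sweep_async`, the `upstream` dictionary layout, and the toy cell kernel are all hypothetical, and Python's thread pool only stands in for whatever task runtime the authors use. Each cell carries a counter of unmet upstream dependencies and is submitted as a task the moment that counter reaches zero, rather than waiting for a whole wavefront to finish.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sweep_async(upstream, source, workers=4):
    """Solve every cell as soon as its upstream cells are done (no per-level barrier).

    upstream[c] lists the cells that cell c depends on for one sweep direction.
    Assumes the dependencies form a DAG; a cycle would leave the sweep waiting.
    """
    if not upstream:
        return {}

    # Invert the graph so a finished cell knows which cells to notify.
    downstream = {c: [] for c in upstream}
    for c, ups in upstream.items():
        for u in ups:
            downstream[u].append(c)

    remaining = {c: len(ups) for c, ups in upstream.items()}  # unmet dependencies
    roots = [c for c, count in remaining.items() if count == 0]
    if not roots:
        raise ValueError("no cell without upstream dependencies (cyclic graph?)")

    psi = {}
    lock = threading.Lock()
    all_done = threading.Event()
    errors = []

    def solve(c):
        try:
            # Toy kernel standing in for the real cell solve (an assumption).
            # Upstream values are final: they were stored before c was submitted.
            value = source[c] + 0.5 * sum(psi[u] for u in upstream[c])
            ready = []
            with lock:
                psi[c] = value
                for d in downstream[c]:
                    remaining[d] -= 1
                    if remaining[d] == 0:
                        ready.append(d)
                finished = len(psi) == len(upstream)
            for d in ready:
                pool.submit(solve, d)  # newly unblocked cells start immediately
            if finished:
                all_done.set()
        except Exception as exc:  # surface worker errors instead of hanging
            errors.append(exc)
            all_done.set()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for c in roots:
            pool.submit(solve, c)
        all_done.wait()

    if errors:
        raise errors[0]
    return psi


if __name__ == "__main__":
    # Diamond-shaped dependency graph: 0 feeds 1 and 2, which both feed 3.
    graph = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    result = sweep_async(graph, {c: 1.0 for c in graph})
    print(sorted(result.items()))  # [(0, 1.0), (1, 1.5), (2, 1.5), (3, 2.5)]
```

Because of Python's global interpreter lock, this sketch shows only the scheduling pattern and would not reproduce any of the speedups reported for the actual solver.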

Authors

Alex Elwood
High Performance Computing Research Group, School of Computer Science, University of Bristol, Bristol, UK
Tom Deakin
University of Bristol
high performance computing, performance portability, GPU computing, computer science
Justin Lovegrove
Computational Physics Group, AWE, Aldermaston, UK
Chris Nelson
High Performance Computing Group, AWE, Aldermaston, UK