🤖 AI Summary
This work addresses a longstanding challenge in GPU programming: achieving high productivity, portability, and performance at the same time. Low-level models offer high performance but poor developer productivity, while high-level abstractions improve productivity at the cost of performance. To bridge this gap, the paper proposes nomp, a framework for building domain-specific compilers that combines a pragma-based programming model with a runtime that transforms and generates code based on user-provided metadata. By reusing domain-specific optimization patterns, nomp aims to improve programmer productivity without sacrificing performance or portability.
📝 Abstract
Low-level GPU programming models (CUDA, HIP, OpenCL, etc.) provide detailed control over a program's data flow and execution plan in order to extract close-to-metal performance. However, these models have a steep learning curve due to the intricacies of their syntax and semantics, which reduces programmer productivity. High-level models (OpenMP, OpenACC, etc.), which serve as abstractions over the low-level models, aim to improve programmer productivity, but achieving performance on par with the low-level models remains a challenge. Both approaches involve inherent trade-offs between productivity, portability, and performance, and no one-size-fits-all solution achieves all three simultaneously. However, we believe there is room to improve programmer productivity without sacrificing performance and portability by reusing optimization patterns specific to a given domain. To this end, we propose nomp: a framework for building domain-specific compilers. nomp consists of a pragma-based programming model and a runtime capable of code transformation and generation based on user-provided metadata.