🤖 AI Summary
In GPU programming, tension between fine-grained per-thread control and coarse-grained collective operations (e.g., Tensor Core instructions) undermines modularity and safety: collective primitives require coordinated execution across thread groups, yet the functions that encapsulate them are invoked by individual threads. Method: We introduce Prism, a new GPU language built around *typed perspectives*, a type-system mechanism that records the granularity at which the programmer is controlling threads (per-thread, group-wide, or global), enabling safe, modular abstraction of collective operations. Prism comes with a compiler, and its theoretical foundations are laid in a core calculus called Bundl. Contribution/Results: State-of-the-art GPU kernels implemented in Prism show that it provides the safety guarantees needed to write modular code confidently, without sacrificing performance.
📝 Abstract
To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of many threads acting in concert to perform collective operations like Tensor Core instructions. The tension between these two mindsets makes modular programming error-prone: functions that encapsulate collective operations are called per-thread, yet they must be executed cooperatively by groups of threads.
In this work, we introduce Prism, a new GPU language that restores modularity while still giving programmers the low-level control over collective operations necessary for high performance. Our core idea is typed perspectives, which materialize, at the type level, the granularity at which the programmer is controlling the behavior of threads. We describe the design of Prism, implement a compiler for it, and lay its theoretical foundations in a core calculus called Bundl. We implement state-of-the-art GPU kernels in Prism and find that it offers programmers the safety guarantees needed to confidently write modular code without sacrificing performance.
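The modularity hazard described in the abstract can be illustrated with a small CPU-side toy model. This is only a sketch under stated assumptions: it is not Prism syntax and not real GPU code, and the names `warp_sum`, `checked_warp_sum`, and `WARP_SIZE` are hypothetical. A bitmask stands in for the set of threads that actually reach a call. The runtime check stands in for what the abstract says Prism tracks at the type level.

```python
WARP_SIZE = 32  # assumed group size for this toy model


def warp_sum(values, active_mask):
    """Toy collective: sums the values of the lanes that execute the call."""
    # Lanes whose bit is clear in active_mask never reach the call,
    # so their values are silently missing from the result.
    return sum(v for lane, v in enumerate(values) if (active_mask >> lane) & 1)


def checked_warp_sum(values, active_mask):
    """Same collective, but rejects a call made by only part of the group."""
    full_mask = (1 << WARP_SIZE) - 1
    if active_mask != full_mask:
        raise ValueError("collective called by a partial warp")
    return warp_sum(values, active_mask)


values = list(range(WARP_SIZE))      # lane i holds the value i
full_mask = (1 << WARP_SIZE) - 1     # every lane is converged
# Hypothetical divergent branch: only even-numbered lanes make the call.
even_lanes = sum(1 << lane for lane in range(0, WARP_SIZE, 2))

converged = warp_sum(values, full_mask)   # all 32 lanes participate
diverged = warp_sum(values, even_lanes)   # odd lanes are dropped without warning
```

In this model the unchecked call under divergence returns a plausible but wrong sum, which is the kind of error that is easy to introduce when a collective is hidden inside a function called per-thread. The checked variant only detects the problem when it runs, whereas a type-level approach such as the one the abstract describes aims to rule it out before the code runs.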