🤖 AI Summary
Stencil computations in scientific computing exhibit irregular sparse access patterns that are inherently incompatible with GPU sparse tensor cores—such as 2:4 structured-sparse hardware—which require regular, hardware-aligned sparsity. This work pioneers the integration of sparse tensor cores into stencil computation. We propose a holistic optimization framework comprising adaptive layout deformation, structured-sparsity conversion, and automatic kernel generation. Our approach employs a *flatten-and-crush* pipeline, graph-matching–based modeling, layout search, and table-driven memory mapping to efficiently transform irregular stencil sparsity into hardware-friendly structured formats. Evaluated on 79 stencil kernels, our method achieves an average speedup of 3.1× (up to 7.1×) over baseline dense implementations, significantly reducing development complexity while matching or surpassing expert hand-tuned performance.
📝 Abstract
Sparse Tensor Cores offer exceptional performance gains for AI workloads by exploiting structured 2:4 sparsity. However, their potential remains untapped for core scientific workloads such as stencil computations, which exhibit irregular sparsity patterns. This paper presents SparStencil, the first system to retarget sparse TCUs for scientific stencil computations through structured sparsity transformation. SparStencil introduces three key techniques: (1) Adaptive Layout Morphing, which restructures stencil patterns into staircase-aligned sparse matrices via a flatten-and-crush pipeline; (2) Structured Sparsity Conversion, which formulates the transformation as a graph matching problem to ensure compatibility with 2:4 sparsity constraints; (3) Automatic Kernel Generation, which compiles transformed stencils into optimized sparse MMA kernels via layout search and table-driven memory mapping. Evaluated on 79 stencil kernels spanning diverse scientific domains, SparStencil achieves up to 7.1× speedup (3.1× on average) over a state-of-the-art framework while reducing code complexity and matching or exceeding expert-tuned performance in both compute throughput and memory efficiency.
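To make the 2:4 structured-sparsity constraint concrete: sparse tensor cores require that, in every contiguous group of four values along a row, at most two are nonzero. The sketch below (a generic NumPy illustration, not SparStencil's actual conversion algorithm; the function name `prune_2_4` is hypothetical) enforces this pattern by keeping the two largest-magnitude entries in each group of four.

```python
import numpy as np

def prune_2_4(mat):
    """Zero out all but the 2 largest-magnitude entries in each
    contiguous group of 4 along every row (2:4 structured sparsity).
    Assumes the number of columns is a multiple of 4."""
    out = mat.copy()
    rows, cols = out.shape
    groups = out.reshape(rows, cols // 4, 4)  # view into `out`
    # Indices of the 2 smallest-magnitude entries per group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return out

a = np.array([[4., 1., 3., 2., 9., 8., 7., 6.]])
print(prune_2_4(a))
# → [[4. 0. 3. 0. 9. 8. 0. 0.]]
```

A matrix in this form can be stored compactly as the two surviving values per group plus 2-bit position metadata, which is what the hardware's sparse MMA instructions consume; SparStencil's contribution is transforming irregular stencil sparsity so that this constraint holds without discarding needed coefficients.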