🤖 AI Summary
Existing programming models struggle to simultaneously achieve high performance, cross-platform portability, and developer productivity—scientific computing frameworks prioritize the former two, while AI frameworks emphasize the latter two. Cross-domain integration (e.g., between scientific computing and AI) relies on error-prone manual adaptation, and framework extensibility remains limited, hindering rapid support for emerging hardware.
Method: We propose LAPIS, an MLIR-based compilation framework that builds on the principles of the Kokkos ecosystem and integrates with the PyTorch ecosystem, introducing an MLIR dialect for sparse and dense linear algebra kernels. It enables automatic lowering and high-performance, multi-backend code generation.
Contribution/Results: LAPIS matches or exceeds the performance of default MLIR implementations across CPU and GPU architectures; achieves, for the first time, seamless interoperability between PyTorch and Kokkos; and substantially accelerates support for new architectures. It establishes a foundational programming infrastructure that jointly delivers performance, portability, and productivity for converged scientific computing and artificial intelligence workloads.
📝 Abstract
Portability, performance, and productivity are three critical dimensions for evaluating a programming model or compiler infrastructure. Several modern programming models for computational science focus on performance and portability; at the other end, several machine-learning-focused programming models focus on portability and productivity. A solution that is strong in all three dimensions has yet to emerge. A second, related problem arises when use cases from computational science converge with machine learning: the disparate popular frameworks of these fields require programmers to manually integrate codes written in different frameworks. Finally, several programming frameworks lack easy options for extensibility, as supporting any new computer architecture requires complex changes to the programming model. We present LAPIS, an MLIR-based compiler that addresses all three of these challenges. We demonstrate that LAPIS can automatically lower sparse and dense linear algebra kernels from computational science and artificial intelligence use cases. We also show how LAPIS facilitates the integration of codes between PyTorch and Kokkos. We compare kernel performance with the default MLIR implementations on diverse architectures to demonstrate portability. By developing a dialect built on the principles of the Kokkos ecosystem, LAPIS also allows the framework to be extended to new architectures.
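To make the class of kernels concrete: a representative sparse kernel that such a framework would lower is a sparse matrix-vector product in compressed sparse row (CSR) format. The sketch below is illustrative only; it is plain NumPy, not LAPIS or MLIR code, and the function name `csr_spmv` is our own.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x, with A stored in CSR format.

    values:  nonzero entries of A, row by row
    col_idx: column index of each nonzero
    row_ptr: row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        # Accumulate contributions from the nonzeros of row i only.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 3, 0]]
values  = np.array([2.0, 1.0, 3.0])
col_idx = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 3])
print(csr_spmv(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))  # [3. 3.]
```

A compiler in the style described by the abstract would take a high-level expression of this loop nest and generate performance-portable parallel code (e.g., via Kokkos) for CPUs and GPUs, rather than requiring the programmer to hand-write each backend.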