Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

๐Ÿ“… 2025-05-28
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing tensor layout design methodologies struggle to balance flexibility and performance, limiting their adaptability to complex deep learning algorithms and heterogeneous hardware. This paper introduces the first unified modeling framework grounded in $\mathbb{F}_2$ linear algebra, formally representing layouts as binary linear transformation matrices acting on hardware address bits. This approach enables verifiable, constant-time conversions between arbitrary layouts, overcoming the limitations of the case-by-case layout definitions and $O(n^2)$ conversion schemes prevalent in prior work. Integrated end-to-end into the Triton compiler, it supports bit-level address mapping analysis and automated layout optimization. Experimental evaluation demonstrates significant performance improvements across multiple operators, while the unified framework simplifies backend development and resolves several longstanding defects in the legacy layout system.
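The core idea of representing a layout as a binary matrix acting on address bits can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's actual API: `apply_layout` and the 3-bit "swap two low bits" example are hypothetical, but the arithmetic (a matrix-vector product over $\mathbb{F}_2$, i.e. mod 2) is exactly the framework's mathematical object.

```python
import numpy as np

def apply_layout(L, index, n_bits):
    """Apply an F2 layout matrix L to the bits of a logical index.

    Hypothetical helper for illustration; the paper's real interface differs.
    Bit i of `index` becomes component i of a binary vector, L maps it to
    the hardware address bits, and all arithmetic is taken mod 2.
    """
    bits = np.array([(index >> i) & 1 for i in range(n_bits)], dtype=np.uint8)
    out_bits = (L @ bits) % 2  # matrix-vector product over F2
    return int(sum(int(b) << i for i, b in enumerate(out_bits)))

# A 3-bit layout that swaps the two lowest address bits, written as a
# permutation matrix over F2.
L_swap = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]], dtype=np.uint8)

print(apply_layout(L_swap, 0b001, 3))  # bit 0 moves to bit 1 -> 0b010 = 2
```

Because the layout is just a matrix, properties like invertibility or which bits feed which hardware resource can be read off (or verified) directly from its entries.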

๐Ÿ“ Abstract
Efficient tensor computation is a cornerstone of modern deep learning (DL) workloads, yet existing approaches struggle to achieve flexible and performant design and implementation of tensor layouts -- mappings between logical tensors and hardware resources. The increasing complexity of DL algorithms and hardware demands a generic and systematic approach to handling tensor layouts. In this work, we introduce Linear Layouts, a novel approach that models tensor layouts using linear algebra over $\mathbb{F}_2$. By representing tensor layouts as binary matrices acting on the bits of the hardware representation, our approach enables a generic layout definition -- as opposed to the classical case-by-case approach -- and allows for generic layout-to-layout conversions, eliminating the quadratic explosion that plagues existing solutions. We integrate linear layouts with Triton and demonstrate their effectiveness in optimizing individual Triton operators as well as kernels written in Triton. We also show that linear layouts reduce engineering effort in the compiler backend while fixing several bugs in Triton's legacy layout system.
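The abstract's point about eliminating the quadratic explosion can be made concrete: once every layout is an invertible matrix over $\mathbb{F}_2$, converting from layout $A$ to layout $B$ is the single matrix $B A^{-1}$, so $N$ layouts need only $N$ matrix definitions instead of $N^2$ handwritten pairwise conversions. Below is a minimal Gauss-Jordan inverse over $\mathbb{F}_2$; the function name is an illustrative assumption, not code from the paper, and it assumes its input is invertible.

```python
import numpy as np

def gf2_inverse(M):
    """Invert a square binary matrix over F2 via Gauss-Jordan elimination.

    Illustrative sketch: assumes M is invertible mod 2. Row elimination
    over F2 is just XOR, so each update is a single `^=` on uint8 rows.
    """
    n = M.shape[0]
    # Augment [M | I] and reduce the left half to the identity.
    A = np.concatenate([M.copy() % 2, np.eye(n, dtype=np.uint8)], axis=1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r, col])
        A[[col, pivot]] = A[[pivot, col]]        # swap pivot row into place
        for r in range(n):
            if r != col and A[r, col]:
                A[r] ^= A[col]                   # XOR = row subtraction in F2
    return A[:, n:]

# Converting from layout A to layout B is the one matrix (B @ gf2_inverse(A)) % 2.
```

Since $\mathbb{F}_2$ matrix inversion and multiplication run in time polynomial in the (small, fixed) number of address bits, the conversion matrix for any layout pair can be derived on demand rather than defined case by case.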
Problem

Research questions and friction points this paper is trying to address.

Achieving flexible and performant tensor layout design
Handling increasing complexity of DL algorithms and hardware
Eliminating quadratic explosion in layout-to-layout conversions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models tensor layouts using linear algebra over F2
Represents layouts as binary matrices for flexibility
Integrates with Triton for optimized computation
๐Ÿ”Ž Similar Papers
No similar papers found.
Keren Zhou
George Mason University
Concurrent Programming, Distributed Systems, Parallel Programming, Machine Learning
Mario Lezcano
OpenAI
Adam P. Goucher
OpenAI
Akhmed Rakhmati
OpenAI
Jeff Niu
OpenAI
Justin Lebar
OpenAI
Pawel Szczerbuk
OpenAI
Peter Bell
OpenAI
Phil Tillet
OpenAI
Thomas Raoux
OpenAI
Zahi Moudallal
OpenAI