Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

2026-06-08
Citations: 0
Influential: 0
AI Summary
This work addresses a limitation of conventional auto-schedulers: they treat candidate schedules as static code snapshots and ignore dependencies among scheduling actions, which makes them sensitive to superficial code changes and makes the search inefficient. To overcome this, the paper introduces, for the first time, a world model into compiler optimization. It proposes an action-conditional latent dynamics model that represents the scheduling process as the evolution of program states in a continuous latent space, avoiding repeated AST modifications and re-encodings. Integrated into TVM AutoScheduler, the approach combines a lightweight transition network with a hardware-aware ranking mechanism. Experiments show that, under the same 64-measurement budget, it outperforms Ansor by 1.37× on GPU and 1.54× on CPU; with one-tenth of the measurement budget, it reaches 97.8% of Ansor-10K's performance; and full-model inference runs 4.61× and 3.67× faster (geometric mean) than PyTorch and PyTorch-opt, respectively.
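The core idea, rolling scheduling actions forward in a latent space instead of mutating and re-encoding code, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption for illustration, not the paper's architecture: the class name `LatentScheduleEvaluator`, the tanh recurrent transition, mean-pooled action embeddings, the linear scoring head, the action names, and all dimensions are hypothetical. The weights are random rather than trained, so the ranking it prints carries no meaning; the sketch only shows the data flow from an initial program state, through action-conditioned transitions, to a score built from the final latent state plus action and hardware features.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these from measurements."""
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


class LatentScheduleEvaluator:
    """Hypothetical action-conditioned latent dynamics evaluator (untrained, illustrative)."""

    def __init__(self, actions: List[str], state_dim: int = 8, action_dim: int = 4,
                 hw_dim: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One embedding vector per scheduling action type.
        self.action_emb: Dict[str, Vector] = {
            a: [rng.gauss(0.0, 1.0) for _ in range(action_dim)] for a in actions
        }
        self.w_h = random_matrix(state_dim, state_dim, rng)   # state -> state
        self.w_a = random_matrix(state_dim, action_dim, rng)  # action -> state
        self.bias = [0.0] * state_dim
        # Linear head over [final latent state | pooled actions | hardware features].
        self.w_out = [rng.gauss(0.0, 0.3) for _ in range(state_dim + action_dim + hw_dim)]
        self.action_dim = action_dim

    def step(self, h: Vector, action: str) -> Vector:
        """One latent transition: h_next = tanh(W_h h + W_a e(action) + b)."""
        hh = matvec(self.w_h, h)
        ha = matvec(self.w_a, self.action_emb[action])
        return [math.tanh(x + y + b) for x, y, b in zip(hh, ha, self.bias)]

    def score(self, h0: Vector, trajectory: List[str], hw: Vector) -> float:
        """Roll the action sequence forward from h0, then score it (higher = better)."""
        h = list(h0)
        pooled = [0.0] * self.action_dim
        for action in trajectory:
            h = self.step(h, action)  # no AST mutation or code re-encoding
            pooled = [p + e for p, e in zip(pooled, self.action_emb[action])]
        n = max(len(trajectory), 1)
        features = h + [p / n for p in pooled] + list(hw)
        return sum(w * f for w, f in zip(self.w_out, features))

    def rank(self, h0: Vector, candidates: List[List[str]], hw: Vector) -> List[int]:
        """Candidate indices ordered from highest to lowest predicted score."""
        scores = [self.score(h0, c, hw) for c in candidates]
        return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    evaluator = LatentScheduleEvaluator(["split", "reorder", "fuse", "parallel", "vectorize"])
    h0 = [0.1] * 8        # hypothetical encoding of the initial program
    hw = [1.0, 0.5, 0.0]  # hypothetical hardware feature vector
    candidates = [["split", "reorder", "vectorize"], ["fuse", "parallel"], ["split", "fuse", "parallel"]]
    print(evaluator.rank(h0, candidates, hw))
```

Because each step is a small vector update, scoring a candidate costs one transition per action rather than an AST mutation followed by a fresh code encoding, and the recurrence makes the score depend on action order. A real evaluator would fit the weights to measured latencies (for example with a ranking objective), which this sketch deliberately omits.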
๐Ÿ“ Abstract
Tensor program optimization is essential for modern machine learning systems, but its search space is enormous. Existing auto-schedulers reduce measurement cost with learned cost models, yet they usually evaluate each candidate as a static code snapshot, ignoring the schedule trajectory that produced it. This makes them insensitive to action dependencies and vulnerable to superficial code variations. We propose a world-model-inspired evaluator that models schedule evaluation as action-conditioned latent dynamics over program states. Starting from the initial program, it rolls out scheduling actions in a continuous latent space with a lightweight transition model, avoiding expensive AST mutation and repeated code encoding. The final dynamic representation is combined with action and hardware features to rank candidates. Implemented in TVM AutoScheduler, our method improves representative-subgraph latency over Ansor by 1.37× on GPU and 1.54× on CPU under the same 64-trial budget. It also comes within 2.2% (geometric mean) of Ansor-10K while using 10× fewer measurements, and accelerates full-model inference over PyTorch/PyTorch-opt (cuDNN) by 4.61×/3.67× (geometric mean).
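The headline results are aggregated as geometric means of per-workload speedup ratios. A small sketch of that arithmetic follows, assuming latencies are listed per workload in matching order; the function names `speedups` and `geometric_mean` and the sample latencies are hypothetical and are not data from the paper.

```python
import math
from typing import List, Sequence


def speedups(baseline_ms: Sequence[float], candidate_ms: Sequence[float]) -> List[float]:
    """Per-workload speedup of candidate over baseline (baseline latency / candidate latency)."""
    if len(baseline_ms) != len(candidate_ms):
        raise ValueError("latency lists must have the same length")
    return [b / c for b, c in zip(baseline_ms, candidate_ms)]


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean computed in log space to avoid overflow in long products."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds for two workloads.
    baseline = [2.0, 8.0]
    ours = [1.0, 1.0]
    print(geometric_mean(speedups(baseline, ours)))  # approximately 4.0, i.e. sqrt(2 * 8)
```

The geometric mean is the usual choice for ratios because it is symmetric under inversion: a 2× gain on one workload and a 2× loss on another average to 1.0, whereas an arithmetic mean would report 1.25. Under the same convention, a value of 0.978 for `geometric_mean(speedups(reference_ms, ours_ms))` would mean being within 2.2% of the reference.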
Problem

Research questions and friction points this paper is trying to address.

tensor program optimization
auto-scheduling
schedule trajectory
cost modeling
compiler
Innovation

Methods, ideas, or system contributions that make the work stand out.

world model
latent dynamics
tensor program optimization
auto-scheduling
program state representation
Haolin Pan
Institute of Software, Chinese Academy of Sciences
AI for Compiler, SIMD Optimization, Compiler Technology
Lianghong Huang
Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Xulin Zhou
Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Mingjie Xing
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yanjun Wu
Institute of Software, Chinese Academy of Sciences
Computer Science