MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers

📅 2026-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional Transformers for partial differential equations (PDEs) tokenize the domain into uniform patches, so they cannot concentrate computation where the solution is hardest to resolve, which limits both accuracy and efficiency. To address this, the authors propose MeshTok, a tokenization and sequence-modeling framework inspired by adaptive mesh refinement (AMR). MeshTok refines regions with sharp gradients, transient features, or multiscale structure, producing heterogeneous multiscale tokens on a fixed simulation grid. These tokens are processed in a single Transformer sequence, so the model captures global context and local detail jointly without specialized architectural components. Refinement moderately increases the token count but directs computation toward physically informative regions. Across multiple PDE families and benchmark datasets, MeshTok consistently improves the efficiency-accuracy trade-off relative to uniform-grid baselines.
📝 Abstract
Conventional patchified Transformers operate on uniform spatial partitions, distributing computational effort evenly across the domain irrespective of local features. This inflexible tokenization scheme is inherently limited in its ability to efficiently represent and process solutions to complex PDEs. To address this, we propose MeshTok, an adaptive mesh refinement (AMR)-inspired tokenization and sequence modeling framework. This method selectively refines spatial regions exhibiting sharp gradients, transient features, or multiscale structures, generating a heterogeneous set of multiscale tokens defined on a fixed simulation grid. These tokens are processed within a unified Transformer sequence, enabling the model to simultaneously capture coarse-grained global context and fine-grained local details without requiring specialized architectural components. Although adaptive refinement moderately increases token count, it promotes a more targeted allocation of computational resources to physically informative regions, which we view as a practical inductive bias rather than a formal optimality guarantee. Experimental evaluations across multiple PDE families and benchmark datasets demonstrate that MeshTok consistently improves the efficiency-accuracy trade-off compared to uniform-grid baselines. This suggests adaptive multiscale tokenization as a scalable and generalizable design principle for neural PDE modeling. Code is available at https://github.com/SCAILab-USTC/MeshTok.
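To make the idea of gradient-driven refinement concrete, the following is a minimal sketch of quadtree-style adaptive patching on a fixed 2D grid. It is not the authors' implementation: the function names (`gradient_indicator`, `refine`, `adaptive_tokens`), the refinement criterion (largest neighbouring-cell difference inside a patch), and the parameters `coarse`, `min_size`, and `threshold` are all assumptions chosen for illustration. The actual method is in the repository linked in the abstract.

```python
import math


def gradient_indicator(field, r0, c0, size):
    """Largest absolute difference between neighbouring cells inside a patch.

    Hypothetical refinement criterion; the paper may use a different one.
    """
    worst = 0.0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            if c + 1 < c0 + size:
                worst = max(worst, abs(field[r][c + 1] - field[r][c]))
            if r + 1 < r0 + size:
                worst = max(worst, abs(field[r + 1][c] - field[r][c]))
    return worst


def refine(field, r0, c0, size, min_size, threshold, out):
    """Split a patch into four children while its indicator exceeds the threshold."""
    if size > min_size and gradient_indicator(field, r0, c0, size) > threshold:
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                refine(field, r0 + dr, c0 + dc, half, min_size, threshold, out)
    else:
        # Leaf patch: becomes one token, described by (row, col, size).
        out.append((r0, c0, size))


def adaptive_tokens(field, coarse=8, min_size=2, threshold=0.2):
    """Tile the grid with coarse patches, then refine each one independently."""
    n = len(field)
    tokens = []
    for r0 in range(0, n, coarse):
        for c0 in range(0, n, coarse):
            refine(field, r0, c0, coarse, min_size, threshold, tokens)
    return tokens


# Toy field: a sharp front near x = 0.5 on a 16x16 grid (illustrative data only).
n = 16
field = [
    [math.tanh(20.0 * ((c + 0.5) / n - 0.5)) for c in range(n)]
    for _ in range(n)
]
tokens = adaptive_tokens(field)
covered_cells = sum(size * size for _, _, size in tokens)
patch_sizes = sorted({size for _, _, size in tokens})
print(len(tokens), patch_sizes, covered_cells)
```

On this toy field the sketch produces 40 tokens of sizes 2 and 4 that together cover all 256 cells, compared with 4 tokens for uniform 8x8 patching and 64 for uniform 2x2 patching. Small patches cluster around the front while smooth regions keep larger ones. In a real model each patch would still need to be mapped to a fixed-width embedding, with its position and scale encoded, before entering a shared Transformer sequence; that step is omitted here.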
Problem

Research questions and friction points this paper is trying to address.

PDE
tokenization
adaptive mesh refinement
multiscale representation
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive tokenization
multiscale representation
PDE transformers
mesh refinement
computational efficiency
Authors

Yanshun Zhao
School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
Xiaoyu Peng
School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
Jiamin Jiang
Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China
Congcong Zhu
USTC
Multimedia Understanding
Jingrun Chen
School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China