🤖 AI Summary
Conventional Transformers for partial differential equations (PDEs) tokenize the domain into uniform patches, so they spend the same computational effort everywhere regardless of local solution features, which limits both accuracy and efficiency. To address this, the authors propose MeshTok, a tokenization and sequence modeling framework inspired by adaptive mesh refinement (AMR). MeshTok selectively refines regions with sharp gradients, transient features, or multiscale structures, producing heterogeneous, multiscale tokens defined on a fixed simulation grid. All tokens are processed in a single unified Transformer sequence, so the model captures coarse global context and fine local detail together without specialized architectural components. Adaptive refinement moderately increases the token count, but it directs computation toward physically informative regions; the authors frame this as a practical inductive bias rather than a formal optimality guarantee. Experiments across multiple PDE families and benchmark datasets show that MeshTok consistently improves the efficiency-accuracy trade-off over uniform-grid baselines.
📝 Abstract
Conventional patchified Transformers operate on uniform spatial partitions, distributing computational effort evenly across the domain irrespective of local features. This inflexible tokenization scheme is inherently limited in its ability to efficiently represent and process solutions to complex PDEs. To address this, we propose MeshTok, an adaptive mesh refinement (AMR)-inspired tokenization and sequence modeling framework. This method selectively refines spatial regions exhibiting sharp gradients, transient features, or multiscale structures, generating a heterogeneous set of multiscale tokens defined on a fixed simulation grid. These tokens are processed within a unified Transformer sequence, enabling the model to simultaneously capture coarse-grained global context and fine-grained local details without requiring specialized architectural components. Although adaptive refinement moderately increases token count, it promotes a more targeted allocation of computational resources to physically informative regions, which we view as a practical inductive bias rather than a formal optimality guarantee. Experimental evaluations across multiple PDE families and benchmark datasets demonstrate that MeshTok consistently improves the efficiency-accuracy trade-off compared to uniform-grid baselines. This suggests adaptive multiscale tokenization as a scalable and generalizable design principle for neural PDE modeling. Code is available at https://github.com/SCAILab-USTC/MeshTok.
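The refinement idea in the abstract can be illustrated with a minimal sketch. This is not the MeshTok implementation: the quadtree-style splitting, the mean-gradient criterion, the `threshold` value, and the helper names `adaptive_patches` and `patch_to_token` are all assumptions made for illustration. The sketch splits a coarse patch wherever the local gradient is large, then average-pools every patch to a common size so that patches of different scales can sit in one token sequence.

```python
import numpy as np

def adaptive_patches(field, coarse=8, min_size=2, threshold=0.1):
    """Quadtree-style split of a square 2D field into multiscale patches.

    Returns (row, col, size) tuples that tile the field exactly.
    The criterion (mean gradient magnitude above `threshold`) is an
    illustrative assumption, not the criterion used by MeshTok.
    """
    gy, gx = np.gradient(field)      # derivatives along rows, then columns
    grad_mag = np.hypot(gx, gy)
    patches = []

    def split(r, c, size):
        block = grad_mag[r:r + size, c:c + size]
        if size > min_size and block.mean() > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            patches.append((r, c, size))

    n = field.shape[0]
    for r in range(0, n, coarse):
        for c in range(0, n, coarse):
            split(r, c, coarse)
    return patches

def patch_to_token(field, r, c, size, out=2):
    """Average-pool a patch to out x out so every token has the same length."""
    k = size // out
    block = field[r:r + size, c:c + size]
    return block.reshape(out, k, out, k).mean(axis=(1, 3)).ravel()

# Toy field on a fixed 32 x 32 grid with a sharp vertical front at the center.
n = 32
x = np.linspace(-1.0, 1.0, n)
field = np.tile(np.tanh(20.0 * x), (n, 1))

patches = adaptive_patches(field)
tokens = np.stack([patch_to_token(field, r, c, s) for r, c, s in patches])
sizes = sorted({s for _, _, s in patches})
print(len(patches), "tokens with patch sizes", sizes)
```

In this toy case, small patches appear only around the front while smooth regions keep the coarse patch size, so the token count grows relative to the uniform coarse grid but stays well below a uniformly fine one. A real model would also need to encode each token's position and scale, which the sketch omits.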