Zero-shot generalization of transformer neural operators to larger domains

📅 2026-06-12

🤖 AI Summary

This work addresses the limited zero-shot generalization of Transformer-based neural operators to larger spatial domains, a limitation that stems from their implicit assumption of a fixed domain size. The authors introduce a decomposable bias in the attention logits that enforces finely controllable spatial locality while remaining expressible as query-key inner products, and therefore compatible with optimized attention kernels. Combined with rotary positional embeddings, the approach provides spatial locality and translation equivariance without altering the transformer architecture. On two PDE benchmarks and a 3D industrial atmospheric flow application, it substantially improves zero-shot generalization to domains larger than those seen during training.
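As a rough illustration of why a rotary (rotation-based) positional encoding is a natural fit for translation equivariance, the sketch below applies a textbook 1D rotary embedding and checks that the attention logit depends only on the relative offset between the query and key positions. It is not taken from the paper's code: the function names `rotate_pairs` and `logit`, the frequency base of 10000, and the 1D setting are all assumptions made for illustration.

```python
import math

def rotate_pairs(vec, position, base=10000.0):
    """Rotate consecutive coordinate pairs of `vec` by position-dependent angles.

    Assumes len(vec) is even. Pair i is rotated by position * base**(-i / half),
    the usual rotary-embedding frequency schedule (an assumption here).
    """
    half = len(vec) // 2
    out = []
    for i in range(half):
        theta = position * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def logit(q, k, pos_q, pos_k):
    """Unscaled attention logit between a rotated query and a rotated key."""
    rq = rotate_pairs(q, pos_q)
    rk = rotate_pairs(k, pos_k)
    return sum(a * b for a, b in zip(rq, rk))

# Hypothetical query/key features, chosen only for the demonstration.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (1.5), once near the origin and once far away from it.
near = logit(q, k, pos_q=2.0, pos_k=3.5)
far = logit(q, k, pos_q=502.0, pos_k=503.5)
print(near, far)  # the two values agree up to floating-point error
```

Because the logit is unchanged when both positions are shifted by the same amount, a pair of points far outside the training extent is scored the same way as an identically spaced pair inside it.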

📝 Abstract

Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the attention logits computation, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.
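To make the phrase "fully decomposable into query-key inner products" concrete, here is a minimal sketch of one bias with that property. It assumes a squared-distance penalty, -alpha * ||x_q - x_k||^2, which may differ from the bias actually used in the paper; the helper names `augment_query`, `augment_key`, and `biased_logit`, and the strength parameter `alpha`, are hypothetical. The idea is that expanding the squared distance lets the bias be absorbed into a few extra query and key channels, so a plain dot product of the augmented vectors reproduces the biased logit. The sketch ignores the 1/sqrt(d) scaling that attention kernels normally apply.

```python
import math

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def biased_logit(q, k, xq, xk, alpha):
    """Reference: content logit plus an explicit squared-distance penalty."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(xq, xk))
    return dot(q, k) - alpha * dist_sq

def augment_query(q, x, alpha):
    """Append channels [2*alpha*x, -alpha*|x|^2, 1] to the query features."""
    sq = sum(c * c for c in x)
    return list(q) + [2.0 * alpha * c for c in x] + [-alpha * sq, 1.0]

def augment_key(k, x, alpha):
    """Append channels [x, 1, -alpha*|x|^2] to the key features."""
    sq = sum(c * c for c in x)
    return list(k) + list(x) + [1.0, -alpha * sq]

# Hypothetical features and 3D coordinates for one query/key pair.
q, k = [0.5, -1.0], [1.5, 0.25]
xq, xk = [0.0, 1.0, 2.0], [0.5, 1.0, 4.0]
alpha = 0.3  # larger alpha means tighter spatial locality

direct = biased_logit(q, k, xq, xk, alpha)
factored = dot(augment_query(q, xq, alpha), augment_key(k, xk, alpha))
assert math.isclose(direct, factored, abs_tol=1e-12)
```

Since the bias lives entirely in the augmented vectors, no dense pairwise bias matrix has to be materialized, which is what keeps this kind of construction compatible with fused attention kernels. The penalty depends only on the offset between the two points, so it is unchanged when both are shifted by the same amount.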

Problem

Research questions and friction points this paper is trying to address.

zero-shot generalization

neural operators

domain extension

partial differential equations

spatial locality

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot generalization

neural operators

spatial locality