Zero-shot generalization of transformer neural operators to larger domains

📅 2026-06-12
🤖 AI Summary
This work addresses the limited zero-shot generalization of existing Transformer-based neural operators to larger spatial domains, a limitation that stems from their implicit assumption of a fixed domain size. The authors propose an architecture-agnostic solution: a factorizable (decomposable) bias on the attention logits that enforces fine-grained, controllable spatial locality while remaining compatible with efficient attention kernels. Combined with rotary positional embeddings, this approach builds in translation equivariance, so a model trained on one domain can be applied to spatial domains significantly larger than those seen during training. Evaluated on two canonical PDE benchmarks and a realistic 3D industrial atmospheric flow task, the method substantially improves zero-shot generalization when extrapolating to larger domains.
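A bias is "factorizable" in the sense used above when it can be written as an inner product between features of the query position and features of the key position; it can then be appended to the query and key vectors, and an unmodified attention kernel computes content logit plus bias in a single dot product. The summary does not give the exact form of the paper's bias, so the sketch below assumes a squared-distance (Gaussian-style) bias, -||x_q - x_k||^2 / (2 sigma^2), purely for illustration. The helpers `gaussian_bias`, `augment_query` and `augment_key` are hypothetical names, not the authors' code.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def gaussian_bias(xq, xk, sigma):
    """Illustrative locality bias -||xq - xk||^2 / (2 sigma^2) (an assumption).

    It depends only on the offset xq - xk, so it is translation invariant.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(xq, xk))
    return -d2 / (2 * sigma ** 2)


def augment_query(q, xq, sigma, logit_scale=1.0):
    """Append positional features to a query (hypothetical helper).

    The positional part is divided by `logit_scale` so that, once the kernel
    multiplies the whole dot product by `logit_scale` (e.g. 1/sqrt(d)),
    the bias term comes out unscaled.
    """
    sq = dot(xq, xq)
    pos = [c / sigma ** 2 for c in xq] + [-sq / (2 * sigma ** 2), 1.0]
    return list(q) + [p / logit_scale for p in pos]


def augment_key(k, xk, sigma):
    """Append the matching positional features to a key (hypothetical helper)."""
    sq = dot(xk, xk)
    return list(k) + list(xk) + [1.0, -sq / (2 * sigma ** 2)]


# One inner product on augmented vectors = scaled content logit + locality bias.
q, k = [0.3, -1.2, 0.7, 0.1], [0.5, 0.4, -0.2, 0.9]
xq, xk = [1.0, 2.0], [1.5, 1.0]
sigma = 0.8
s = 1.0 / math.sqrt(len(q))  # the usual 1/sqrt(d_head) logit scale

fused = dot(augment_query(q, xq, sigma, s), augment_key(k, xk, sigma)) * s
explicit = dot(q, k) * s + gaussian_bias(xq, xk, sigma)
print(abs(fused - explicit) < 1e-9)  # True
```

The query-only term -||x_q||^2 / (2 sigma^2) is constant across keys and cancels in the softmax; it is kept here only so the fused logit matches the explicit one exactly. Because the extra dimensions are ordinary query/key entries, the same trick works with any kernel that accepts plain query and key tensors.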
📝 Abstract
Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference time. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the computation of the attention logits, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.
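The abstract's argument is that domain extension needs two properties: attention logits that depend only on relative positions (translation equivariance) and a bias that suppresses distant points (locality). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): it uses one common axial 2D variant of rotary embeddings and the same kind of squared-distance bias as an assumed stand-in for the paper's bias, computed explicitly rather than folded into the queries and keys. `rope_2d` and `local_rope_attention` are hypothetical names.

```python
import math
import random


def rope_2d(v, pos, base=100.0):
    """Axial 2D rotary embedding (an assumed variant).

    Feature pairs (v[2p], v[2p+1]) are rotated by an angle proportional to
    pos[0] for the first half of the pairs and to pos[1] for the second half.
    """
    assert len(v) % 4 == 0, "need an even number of pairs per axis"
    n_pairs = len(v) // 2
    half = n_pairs // 2
    out = list(v)
    for p in range(n_pairs):
        axis, j = (0, p) if p < half else (1, p - half)
        theta = pos[axis] * base ** (-j / half)
        c, s = math.cos(theta), math.sin(theta)
        a, b = v[2 * p], v[2 * p + 1]
        out[2 * p] = a * c - b * s
        out[2 * p + 1] = a * s + b * c
    return out


def local_rope_attention(qs, ks, vs, xs, sigma, base=100.0):
    """Self-attention with RoPE plus a squared-distance logit bias.

    Every logit depends only on the offset xs[i] - xs[j]. Returns (outputs, weights).
    """
    scale = 1.0 / math.sqrt(len(qs[0]))
    rks = [rope_2d(k, x, base) for k, x in zip(ks, xs)]
    outputs, weights = [], []
    for q, xq in zip(qs, xs):
        rq = rope_2d(q, xq, base)
        logits = []
        for rk, xk in zip(rks, xs):
            d2 = sum((a - b) ** 2 for a, b in zip(xq, xk))
            content = sum(u * w for u, w in zip(rq, rk)) * scale
            logits.append(content - d2 / (2 * sigma ** 2))
        m = max(logits)  # subtract the max for a numerically stable softmax
        e = [math.exp(l - m) for l in logits]
        z = sum(e)
        w = [ei / z for ei in e]
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, vs)) for c in range(len(vs[0]))])
    return outputs, weights


rng = random.Random(0)
n, d = 6, 8
qs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ks = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
vs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
xs = [[rng.uniform(0, 4), rng.uniform(0, 4)] for _ in range(n)]

# Translating every point far outside the original region leaves outputs unchanged.
out_a, w_a = local_rope_attention(qs, ks, vs, xs, sigma=1.0)
shifted = [[x + 250.0, y - 75.0] for x, y in xs]
out_b, w_b = local_rope_attention(qs, ks, vs, shifted, sigma=1.0)
max_diff = max(abs(a - b) for ra, rb in zip(out_a, out_b) for a, b in zip(ra, rb))
print(max_diff < 1e-9)  # True
```

Because absolute coordinates never enter the logits except through offsets, and the bias makes far-away points contribute almost nothing, a larger domain presents each query with local neighborhoods of the same kind it saw in training; this is the intuition behind the locality and equivariance requirements, sketched here under the assumptions above.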
Problem

Research questions and friction points this paper is trying to address.

zero-shot generalization
neural operators
domain extension
partial differential equations
spatial locality
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot generalization
neural operators
spatial locality
translation equivariance
decomposable attention bias
Armand de Villeroché
CEREA, ENPC, EDF R&D, Institut Polytechnique de Paris, Île-de-France, France

Sibo Cheng
Junior Professor, CEREA, ENPC, Institut Polytechnique de Paris
AI4Science, Data assimilation, Machine learning, Model reduction, Scientific computing

Vincent Le Guen
SINCLAIR AI Laboratory, Saclay, Île-de-France, France

Marc Bocquet
École nationale des ponts et chaussées, Institut Polytechnique de Paris, CEREA
Data assimilation, Inverse problems, Machine learning

Rem-Sophia Mouradi
EDF R&D, Île-de-France, France

Patrick Armand
CEA, DAM, DIF, F-91297 Arpajon, France

Alban Farchi
CEREA, ENPC, EDF R&D, Institut Polytechnique de Paris, Île-de-France, France

Patrick Massin
CEREA, ENPC, EDF R&D, Institut Polytechnique de Paris, Île-de-France, France