AI Summary
This work proposes Flowers, a purely deformation-driven neural architecture for efficiently learning solution operators of time-dependent partial differential equations (PDEs), such as those governing fluid dynamics and wave propagation. Departing from conventional approaches that rely on Fourier multipliers, convolutions, or dot-product attention, the method introduces non-local interactions exclusively through multi-head displacement fields, predicted pointwise, and sparse sampling at source coordinates (one per head). Stacking these warps in multiscale residual blocks enables adaptive global modeling at linear computational cost. Experimental results show that the 17M-parameter model outperforms mainstream baselines of comparable size across diverse 2D and 3D time-dependent PDE tasks, and its 150M-parameter variant surpasses larger transformer-based foundation models, supporting the efficacy and scalability of deformation-based mechanisms for PDE solving.
Abstract
We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly on flow and wave problems. A compact 17M-parameter model consistently outperforms Fourier-, convolution-, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models that use many more parameters and substantially more data and training compute.
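The multihead warp described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names (`W_mix`, `W_disp`, `W_out`), the linear-plus-tanh displacement predictor, the `max_shift` bound, bilinear interpolation, and periodic boundaries are all hypothetical choices. Each head mixes channels pointwise, predicts a 2D displacement pointwise from the same local features (no spatial aggregation), and reads its mixed features at exactly one source coordinate per target point.

```python
import numpy as np


def bilinear_sample_periodic(v, px, py):
    """Sample v (C, H, W) at fractional coordinates px (column) and py (row),
    each of shape (H, W), using bilinear interpolation with periodic wrap-around."""
    C, H, W = v.shape
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    fx = px - x0  # fractional offsets in [0, 1)
    fy = py - y0
    x0m, x1m = x0 % W, (x0 + 1) % W
    y0m, y1m = y0 % H, (y0 + 1) % H
    # Gather the four grid neighbours of every source point; each is (C, H, W).
    v00 = v[:, y0m, x0m]
    v01 = v[:, y0m, x1m]
    v10 = v[:, y1m, x0m]
    v11 = v[:, y1m, x1m]
    return ((1 - fy) * ((1 - fx) * v00 + fx * v01)
            + fy * ((1 - fx) * v10 + fx * v11))


def multihead_warp(u, params, max_shift=4.0):
    """Hypothetical multihead warp layer on features u of shape (C, H, W).

    params["W_mix"]:  (heads, Ch, C)     pointwise value mixing per head
    params["W_disp"]: (heads, 2, C)      pointwise displacement predictor per head
    params["W_out"]:  (C, heads * Ch)    pointwise output projection
    """
    C, H, W = u.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    outs = []
    for Wm, Wd in zip(params["W_mix"], params["W_disp"]):
        v = np.einsum("kc,chw->khw", Wm, u)  # pointwise channel mixing
        # Displacement from local features only (assumed linear + tanh bound).
        d = max_shift * np.tanh(np.einsum("kc,chw->khw", Wd, u))  # (2, H, W)
        # Backward warp: each target point reads ONE source coordinate x + d(x).
        outs.append(bilinear_sample_periodic(v, xs + d[0], ys + d[1]))
    z = np.concatenate(outs, axis=0)  # (heads * Ch, H, W)
    return np.einsum("ok,khw->ohw", params["W_out"], z)


rng = np.random.default_rng(0)
C, H, W, heads, Ch = 4, 16, 16, 2, 3
params = {
    "W_mix": rng.normal(size=(heads, Ch, C)) / np.sqrt(C),
    "W_disp": rng.normal(size=(heads, 2, C)),
    "W_out": rng.normal(size=(C, heads * Ch)) / np.sqrt(heads * Ch),
}
u = rng.normal(size=(C, H, W))
u_next = u + multihead_warp(u, params)  # one residual warp step at a single scale
print(u_next.shape)  # (4, 16, 16)
```

Because each head reads a single source location per output point (four grid neighbours under bilinear interpolation), one layer costs on the order of heads × N × C operations for N grid points, rather than the N² scaling of dense dot-product attention. Reach beyond `max_shift` would come from stacking such layers across a multiscale hierarchy, which this sketch omits.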