🤖 AI Summary
This work addresses the high computational cost and weak physical interpretability of attention mechanisms and first-order state-space models. We propose Wave-PDE Nets—the first neural architecture that employs a differentiable second-order wave partial differential equation (PDE) as its fundamental layer. Hidden states propagate through a continuous medium governed by learnable spatially varying wave speed and damping fields, (c(x)) and (gamma(x)), enabling global oscillatory dynamics as an alternative to explicit long-range dependency modeling. We theoretically establish that a single Wave-PDE layer possesses universal approximation capability. An FFT-based symplectic spectral solver ensures efficient (O(n log n)) inference. On language and vision benchmarks, Wave-PDE Nets match or surpass Transformer performance while reducing measured inference latency by 30%, peak memory usage by 25%, and improving training stability—achieving both strong physics-informed inductive bias and superior computational efficiency.
📝 Abstract
We introduce Wave-PDE Nets, a neural architecture whose elementary operation is a differentiable simulation of the second-order wave equation. Each layer propagates its hidden state as a continuous field through a medium with trainable spatial velocity c(x) and damping γ(x). A symplectic spectral solver based on FFTs realises this propagation in O(nlog n) time. This oscillatory, global mechanism provides a powerful alternative to attention and first-order state-space models. We prove that a single Wave-PDE layer is a universal approximator. On language and vision benchmarks, Wave-PDE Nets match or exceed Transformer performance while demonstrating superior practical efficiency, reducing wall-clock time by up to 30% and peak memory by 25%. Ablation studies confirm the critical role of symplectic integration and a spectral Laplacian for stability and performance. Visualizations of the learned physical parameters reveal that the model learns intuitive strategies for information propagation. These results position Wave-PDE Nets as a computationally efficient and robust architecture with a strong physical inductive bias.