🤖 AI Summary
Training neural operators to approximate partial differential equation (PDE) solution operators at high resolution incurs prohibitive computational cost, making it difficult to achieve high accuracy and efficiency simultaneously.
Method: This work introduces the multilevel Monte Carlo (MLMC) method into neural operator training for the first time. By constructing a hierarchical function discretization across multiple resolutions and incorporating a fine-grained gradient correction mechanism, the approach significantly reduces the number of required training samples and associated computational overhead for high-accuracy training. The method is architecture-agnostic—compatible with Fourier Neural Operator, DeepONet, Graph Neural Operator, and other mainstream frameworks—without requiring structural modifications.
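The gradient correction described above follows the standard MLMC telescoping decomposition. A sketch in generic notation (the symbols below are illustrative, not taken from the paper): writing $\ell_\ell$ for the per-sample training loss evaluated on data discretized at resolution level $\ell$, the gradient at the finest level $L$ is estimated as

$$
\nabla_\theta \mathcal{L}_L \;\approx\; \frac{1}{N_0}\sum_{i=1}^{N_0} \nabla_\theta \ell_0(x_i) \;+\; \sum_{\ell=1}^{L} \frac{1}{N_\ell}\sum_{i=1}^{N_\ell}\Big[\nabla_\theta \ell_\ell(x_i) - \nabla_\theta \ell_{\ell-1}(x_i)\Big], \qquad N_0 \gg N_1 \gg \dots \gg N_L,
$$

so that most samples are processed at the cheap coarse resolution, while a small number of fine-resolution samples correct the bias of the coarse estimate.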
Results: Extensive evaluation on state-of-the-art models and standard PDE benchmarks demonstrates up to a 3.2× speedup in training time at equivalent accuracy. The work also characterizes, for the first time, the Pareto trade-off curve between accuracy and training time in neural operator training, while maintaining scalability and generalizability.
📝 Abstract
Operator learning is a rapidly growing field that aims to approximate nonlinear operators related to partial differential equations (PDEs) using neural operators. These rely on discretization of input and output functions and are usually expensive to train for large-scale problems at high resolution. Motivated by this, we present a Multi-Level Monte Carlo (MLMC) approach to train neural operators by leveraging a hierarchy of resolutions of function discretization. Our framework relies on using gradient corrections from fewer samples of fine-resolution data to decrease the computational cost of training while maintaining a high level of accuracy. The proposed MLMC training procedure can be applied to any architecture accepting multi-resolution data. Our numerical experiments on a range of state-of-the-art models and test-cases demonstrate improved computational efficiency compared to traditional single-resolution training approaches, and highlight the existence of a Pareto curve between accuracy and computational time, related to the number of samples per resolution.
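The telescoping structure behind MLMC gradient estimation can be sketched in a few lines. The snippet below is a minimal, architecture-agnostic illustration, not the paper's implementation: `grad_fn`, `levels`, and `samples_per_level` are hypothetical names, and `grad_fn(level, batch)` stands in for whatever routine returns the mean loss gradient on a batch discretized at the given resolution level.

```python
import numpy as np

def mlmc_gradient(grad_fn, levels, samples_per_level, rng):
    """Telescoping MLMC estimate of a training-loss gradient.

    grad_fn(level, batch) -> mean gradient of the loss on `batch`,
    with inputs/outputs discretized at resolution `level`.
    samples_per_level should decrease with level: many cheap coarse
    samples, few expensive fine ones.
    """
    # Coarsest level: plain Monte Carlo with a large, cheap batch.
    batch = rng.standard_normal(samples_per_level[0])
    estimate = grad_fn(0, batch)
    # Finer levels: add correction terms from small shared batches,
    # evaluated at consecutive resolutions on the *same* samples.
    for ell in range(1, levels):
        batch = rng.standard_normal(samples_per_level[ell])
        estimate += grad_fn(ell, batch) - grad_fn(ell - 1, batch)
    return estimate
```

Because the sum telescopes, the estimator is an unbiased estimate of the finest-level gradient; the design choice is how to shrink `samples_per_level` with level so that the variance of each correction term balances its per-sample cost.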