Pre-Generating Multi-Difficulty PDE Data for Few-Shot Neural PDE Solvers

📅 2025-11-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Training neural PDE solvers is hampered by the prohibitive computational cost of generating high-difficulty samples with classical solvers and by poor generalization. Method: We propose a multi-difficulty pre-generated data strategy based on the 2D incompressible Navier–Stokes equations, systematically constructing a dataset spanning geometric complexity and Reynolds-number gradients; training prioritizes low- and medium-difficulty samples to reduce reliance on high-difficulty ones. Contribution/Results: Our key insight is uncovering and leveraging the regulatory role of the data difficulty distribution on model generalization. Experiments demonstrate that incorporating only a small fraction of high-difficulty samples achieves accuracy comparable to full high-difficulty training, while reducing total pre-generation compute by 8.9x. This approach simultaneously improves data efficiency and generalization, particularly in few-shot settings, without compromising solution fidelity.

📝 Abstract
A key aspect of learned partial differential equation (PDE) solvers is that the main cost often comes from generating training data with classical solvers rather than learning the model itself. Another is that there are clear axes of difficulty (e.g., more complex geometries and higher Reynolds numbers) along which problems become (1) harder for classical solvers and thus (2) more likely to benefit from neural speedups. Towards addressing this chicken-and-egg challenge, we study difficulty transfer on 2D incompressible Navier-Stokes, systematically varying task complexity along geometry (number and placement of obstacles), physics (Reynolds number), and their combination. Similar to how it is possible to spend compute to pre-train foundation models and improve their performance on downstream tasks, we find that by classically solving (analogously pre-generating) many low and medium difficulty examples and including them in the training set, it is possible to learn high-difficulty physics from far fewer samples. Furthermore, we show that by combining low and high difficulty data, we can spend 8.9x less compute on pre-generating a dataset to achieve the same error as using only high difficulty examples. Our results highlight that how we allocate classical-solver compute across difficulty levels is as important as how much we allocate overall, and suggest substantial gains from principled curation of pre-generated PDE data for neural solvers. Our code is available at https://github.com/Naman-Choudhary-AI-ML/pregenerating-pde
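The compute-allocation idea in the abstract can be illustrated with a toy accounting of classical-solver cost. Note that the per-difficulty costs and sample counts below are hypothetical placeholders for illustration, not the paper's actual numbers or its reported 8.9x figure:

```python
# Hypothetical per-sample classical-solver costs (arbitrary units):
# harder problems (more obstacles, higher Reynolds number) cost more to simulate.
SOLVER_COST = {"low": 1.0, "medium": 4.0, "high": 20.0}

def pregeneration_cost(counts):
    """Total classical-solver compute for a dict of {difficulty: num_samples}."""
    return sum(SOLVER_COST[d] * n for d, n in counts.items())

# An all-high baseline vs. a mixed curriculum that keeps only a small
# fraction of high-difficulty samples (counts are illustrative).
baseline = {"high": 1000}
mixed = {"low": 1000, "medium": 500, "high": 100}

savings = pregeneration_cost(baseline) / pregeneration_cost(mixed)
print(f"compute savings: {savings:.1f}x")  # -> compute savings: 4.0x
```

Under the paper's finding, the mixed dataset can reach error comparable to the all-high baseline, so the ratio above is the effective pre-generation saving; the real saving depends on the true per-difficulty solver costs and the mixing fractions chosen.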
Problem

Research questions and friction points this paper is trying to address.

Generating high-difficulty training samples with classical solvers is prohibitively expensive
The problems hardest for classical solvers are also those most likely to benefit from neural speedups, a chicken-and-egg challenge
How should classical-solver compute be allocated across difficulty levels when pre-generating data?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-generating a multi-difficulty PDE dataset spanning geometry and Reynolds number
Learning high-difficulty physics from mostly low- and medium-difficulty examples
Mixing low- and high-difficulty data to cut pre-generation compute by 8.9x
Naman Choudhary
Machine Learning Department, Carnegie Mellon University
Vedant Singh
Machine Learning Department, Carnegie Mellon University
Ameet Talwalkar
CMU, Datadog
Nicholas Matthew Boffi
Machine Learning Department, Carnegie Mellon University
Mikhail Khodak
Department of Computer Sciences, UW-Madison
Tanya Marwah
Simons Foundation