🤖 AI Summary
Existing LLM-driven PDE solver generation methods rely heavily on numerical evaluation, incurring prohibitive computational costs. Method: This paper proposes an "Analysis–Genesis–Synthesis" three-stage framework that integrates large language model reasoning with mathematical analysis (PDE classification, solution-type detection, and stability analysis via mathematical chain-of-thought) to enable efficient, high-accuracy solver construction. Crucially, it introduces an iterative, feedback-driven collaborative selection-and-hybridization mechanism that avoids redundant numerical verification during generation. Contribution/Results: Experiments show the method achieves comparable or superior accuracy with fewer than 13 solver evaluations on average (versus 30+ for baselines), reducing computational evaluations by 60–75% while improving solution accuracy by roughly 4× on average. It also exhibits robust performance and generalization across diverse LLM architectures.
📝 Abstract
Current LLM-driven approaches using test-time computing to generate PDE solvers execute a large number of solver samples to identify high-accuracy solvers. These paradigms are especially costly for complex PDEs requiring substantial computational resources for numerical evaluation. We introduce PDE-SHARP, a framework that reduces computational costs by replacing expensive scientific computation with cheaper LLM inference, achieving superior solver accuracy with 60-75% fewer computational evaluations. PDE-SHARP employs three stages: (1) Analysis: mathematical chain-of-thought analysis including PDE classification, solution type detection, and stability analysis; (2) Genesis: solver generation based on mathematical insights from the previous stage; and (3) Synthesis: collaborative selection-hybridization tournaments in which LLM judges iteratively refine implementations through flexible performance feedback. To generate high-quality solvers, PDE-SHARP requires fewer than 13 solver evaluations on average compared to 30+ for baseline methods, improves accuracy uniformly across tested PDEs by $4\times$ on average, and demonstrates robust performance across LLM architectures, from general-purpose to specialized reasoning models.
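To make the three-stage control flow concrete, here is a minimal sketch of an Analysis–Genesis–Synthesis loop with a fixed evaluation budget. All function names (`analyze`, `genesis`, `evaluate_solver`, `judge_and_hybridize`) are hypothetical stand-ins for illustration, not the paper's actual API, and the LLM calls and numerical evaluation are stubbed with placeholders so the skeleton is runnable.

```python
# Sketch of a budgeted Analysis -> Genesis -> Synthesis loop (assumed structure,
# not the paper's implementation). LLM calls and PDE evaluation are stubbed.

def analyze(pde: str) -> str:
    """Stage 1 (Analysis): chain-of-thought PDE classification, solution-type
    detection, and stability analysis. Stubbed: returns a summary string."""
    return f"insights({pde})"

def genesis(insights: str, n: int = 4) -> list[str]:
    """Stage 2 (Genesis): propose candidate solvers from the analysis. Stubbed."""
    return [f"solver_{i}<{insights}>" for i in range(n)]

def evaluate_solver(solver: str) -> float:
    """Numerical evaluation: the expensive step the framework minimizes.
    Stubbed with a placeholder error metric."""
    return 1.0 / (1 + len(solver))

def judge_and_hybridize(ranked: list[str]) -> list[str]:
    """Stage 3 (Synthesis): LLM judges keep the top solvers and hybridize them
    using performance feedback. Stubbed: concatenates the top two."""
    top = ranked[:2]
    return top + [f"hybrid({top[0]},{top[1]})"]

def pde_sharp(pde: str, rounds: int = 3, budget: int = 13) -> tuple[str, int]:
    """Run the three stages, stopping once the evaluation budget is spent."""
    evals = 0
    candidates = genesis(analyze(pde))
    best = candidates[0]
    for _ in range(rounds):
        scored = sorted(candidates, key=evaluate_solver)  # rank by error
        evals += len(candidates)
        best = scored[0]
        if evals >= budget:
            break
        candidates = judge_and_hybridize(scored)
    return best, evals
```

The key design point the abstract emphasizes is that selection and hybridization are driven by LLM judgment rather than extra numerical runs, which is why the evaluation counter stays small.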