ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing aerodynamic shape optimization methods lack a unified and scalable evaluation framework, which makes fair comparison across diverse tasks difficult. To address this gap, this work introduces ShapeBench, an open-source benchmark spanning eight shape categories and 103 tasks. It provides a unified API, validated surrogate models for fast search, high-fidelity CFD verification where feasible, and a consistent budget metric. ShapeBench enables systematic cross-shape and cross-objective evaluation of optimization algorithms and includes ShapeEvolve, a domain-specialized evolutionary LLM method, as a new baseline. Experiments show that optimizer rankings barely transfer across tasks (mean pairwise Spearman ρ = 0.013), so conclusions drawn from a single task do not reliably generalize. This underscores the need for more robust, general-purpose optimization strategies.

📝 Abstract

Rapid progress in aerodynamic shape optimization (ASO) has outpaced currently available standardized evaluation frameworks. Fair comparison requires a unified benchmark spanning diverse shape classes, objective formulations, and matched-budget state-of-the-art baselines. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; when feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline is also available for final verification, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines and a consistent budget metric, enabling fair comparison among both classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve. Results on ShapeBench show substantial variance in optimizer rankings across shape categories and problem formulations, with mean pairwise Spearman $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturation; classical methods are rarely applicable across all shape categories and tasks, further highlighting the need for more general-purpose approaches.
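
To make the headline statistic concrete, here is a minimal sketch of one way a mean pairwise Spearman correlation over per-task optimizer scores could be computed. This listing does not specify the paper's exact procedure (which scores are ranked, or which task pairs are included), so the function names, the data layout, and the example scores are all illustrative assumptions, not part of ShapeBench or its API.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(scores_by_task):
    """Average Spearman rho over all unordered task pairs.

    scores_by_task maps a task name to a list of scores, one per optimizer,
    with optimizers listed in the same order for every task (an assumption).
    """
    pairs = combinations(scores_by_task.values(), 2)
    return mean(spearman(a, b) for a, b in pairs)


# Hypothetical scores for four optimizers on three tasks (not real results).
scores = {
    "task_a": [0.9, 0.7, 0.5, 0.3],
    "task_b": [0.3, 0.5, 0.7, 0.9],  # ranking reversed relative to task_a
    "task_c": [0.9, 0.7, 0.5, 0.3],
}
result = mean_pairwise_spearman(scores)
```

A value near zero, such as the reported 0.013, would mean that an optimizer's rank on one task carries almost no information about its rank on another, which is why single-task conclusions do not transfer.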

Problem

Research questions and friction points this paper is trying to address.

aerodynamic shape optimization

benchmark

standardized evaluation

optimization comparison

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aerodynamic Shape Optimization

Benchmarking

Surrogate Modeling