🤖 AI Summary
This paper identifies systemic flaws in benchmark evaluation for algorithm selection, particularly in continuous black-box optimization. First, leave-instance-out cross-validation induces data leakage, yielding spuriously high accuracy. Second, scale-sensitive performance metrics (e.g., raw function-value error) introduce substantial optimistic bias. Through empirical analysis, ablation studies, and meta-model diagnostics, the authors show quantitatively that non-informative features can achieve >90% accuracy under flawed evaluation protocols, and that evaluating on unnormalized objective values overestimates meta-model performance by over 40%. To address these issues, they propose a decoupled evaluation framework that separately assesses *feature effectiveness* and *metric-induced interference*, emphasizing objective-function normalization and error-attribution analysis. This work advances algorithm selection evaluation toward standardization, robustness, and reproducibility.
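The leave-instance-out leakage can be illustrated with a minimal synthetic sketch. Everything below is invented for illustration (the dataset sizes, the scalar feature, and the 1-nearest-neighbour selector are not the paper's experimental setup): each "function" gets a feature that identifies it but says nothing about why an algorithm wins, yet leave-instance-out splitting still rewards it, because instances of the same function appear in both train and test.

```python
import random

# Synthetic sketch (all numbers invented): 24 "functions", 5 "instances"
# each, 4 candidate algorithms. The single feature is a per-function
# constant plus tiny instance noise -- it identifies the function but
# carries no information about why an algorithm performs well on it.
random.seed(0)
N_FUNCS, N_INST, N_ALGOS = 24, 5, 4

data = []  # rows: (function_id, instance_id, feature, best_algorithm)
for f in range(N_FUNCS):
    base = random.uniform(0, 100)       # function-identifying constant
    label = random.randrange(N_ALGOS)   # "best" algorithm, fixed per function
    for i in range(N_INST):
        data.append((f, i, base + random.gauss(0, 0.001), label))

def knn_accuracy(train, test):
    """1-nearest-neighbour algorithm selector on the scalar feature."""
    hits = sum(min(train, key=lambda r: abs(r[2] - x))[3] == y
               for _, _, x, y in test)
    return hits / len(test)

# Leave-instance-out: hold out instance 0 of every function; the same
# functions remain in training, so the feature leaks function identity.
lio_acc = knn_accuracy([r for r in data if r[1] != 0],
                       [r for r in data if r[1] == 0])

# Leave-problem-out: hold out whole functions; no identity leakage.
held_out = set(range(6))
lpo_acc = knn_accuracy([r for r in data if r[0] not in held_out],
                       [r for r in data if r[0] in held_out])

print(f"leave-instance-out accuracy: {lio_acc:.2f}")
print(f"leave-problem-out  accuracy: {lpo_acc:.2f}")
```

Under leave-instance-out the non-informative feature scores near-perfectly, while holding out whole problems drops it toward chance, which is the behaviour a well-designed evaluation should expose.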
📝 Abstract
Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are then used to train a machine learning meta-model for selecting suitable algorithms. Various approaches have demonstrated the effectiveness of these algorithm selection meta-models. However, not all evaluation approaches are equally valid for assessing the performance of meta-models. We highlight methodological issues that frequently occur in the community and should be addressed when evaluating algorithm selection approaches. First, we identify flaws in the "leave-instance-out" evaluation technique. We show that non-informative features and meta-models can achieve high accuracy, which should not be the case under a well-designed evaluation framework. Second, we demonstrate that measuring the performance of optimization algorithms with metrics sensitive to the scale of the objective function requires careful consideration of how this impacts the construction of the meta-model, its predictions, and the model's error. Such metrics can falsely present overly optimistic performance assessments of the meta-models. This paper emphasizes the importance of careful evaluation, as loosely defined methodologies can mislead researchers, divert efforts, and introduce noise into the field.
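The scale-sensitivity issue can be sketched with a toy example (all values invented; the functions, algorithms, and regret definitions below are illustrative, not the paper's benchmark): when per-function errors are averaged without normalization, functions with large objective scales dominate, so a selector that fails on small-scale functions still reports a tiny aggregate error.

```python
# Hypothetical performance table: perf[function][algorithm] is the best
# objective value reached (lower is better). The two functions live on
# very different scales.
perf = {
    "f_large_scale": {"A": 1e6, "B": 2e6},  # A wins here
    "f_small_scale": {"A": 5.0, "B": 1.0},  # B wins here
}

# A meta-model that always selects A is wrong on half the functions.
selection = {"f_large_scale": "A", "f_small_scale": "A"}

def raw_regret(perf, sel):
    """Mean of (selected value - best achievable value), unnormalized."""
    return sum(perf[f][sel[f]] - min(perf[f].values())
               for f in perf) / len(perf)

def normalized_regret(perf, sel):
    """Mean regret after per-function min-max normalization."""
    total = 0.0
    for f in perf:
        lo, hi = min(perf[f].values()), max(perf[f].values())
        total += (perf[f][sel[f]] - lo) / (hi - lo)
    return total / len(perf)

# Raw regret is 2.0 against objective values in the millions, so the
# selector looks near-optimal; normalized regret is 0.5, exposing the
# failure on half of the functions.
print("raw regret:       ", raw_regret(perf, selection))
print("normalized regret:", normalized_regret(perf, selection))
```

This is the sense in which scale-sensitive metrics "falsely present overly optimistic performance assessments": per-function normalization (or rank-based aggregation) is needed before errors are compared or averaged across functions.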