🤖 AI Summary
This paper identifies systemic flaws in benchmark evaluation for algorithm selection, particularly in continuous black-box optimization. First, leave-instance-out cross-validation induces data leakage, yielding spuriously high accuracy. Second, scale-sensitive performance metrics (e.g., raw function-value error) introduce substantial optimistic bias. Through empirical analysis, ablation studies, and meta-model diagnostics, the authors show quantitatively that non-informative features can achieve >90% accuracy under flawed evaluation protocols, and that evaluating on unnormalized objective values overestimates meta-model performance by over 40%. To address these issues, they propose a decoupled evaluation framework that separately assesses *feature effectiveness* and *metric-induced interference*, emphasizing objective-function normalization and error-attribution analysis. This work advances algorithm selection evaluation toward standardization, robustness, and reproducibility.
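The leave-instance-out leakage can be illustrated with a minimal synthetic sketch. Everything below is invented for illustration (the dataset sizes, the scalar feature, and the 1-nearest-neighbour selector are not the paper's experimental setup): each "function" gets a feature that identifies it but says nothing about why an algorithm wins, yet leave-instance-out splitting still rewards it, because instances of the same function appear in both train and test.

```python
import random

# Synthetic sketch (all numbers invented): 24 "functions", 5 "instances"
# each, 4 candidate algorithms. The single feature is a per-function
# constant plus tiny instance noise -- it identifies the function but
# carries no information about why an algorithm performs well on it.
random.seed(0)
N_FUNCS, N_INST, N_ALGOS = 24, 5, 4

data = []  # rows: (function_id, instance_id, feature, best_algorithm)
for f in range(N_FUNCS):
    base = random.uniform(0, 100)       # function-identifying constant
    label = random.randrange(N_ALGOS)   # "best" algorithm, fixed per function
    for i in range(N_INST):
        data.append((f, i, base + random.gauss(0, 0.001), label))

def knn_accuracy(train, test):
    """1-nearest-neighbour algorithm selector on the scalar feature."""
    hits = sum(min(train, key=lambda r: abs(r[2] - x))[3] == y
               for _, _, x, y in test)
    return hits / len(test)

# Leave-instance-out: hold out instance 0 of every function; the same
# functions remain in training, so the feature leaks function identity.
lio_acc = knn_accuracy([r for r in data if r[1] != 0],
                       [r for r in data if r[1] == 0])

# Leave-problem-out: hold out whole functions; no identity leakage.
held_out = set(range(6))
lpo_acc = knn_accuracy([r for r in data if r[0] not in held_out],
                       [r for r in data if r[0] in held_out])

print(f"leave-instance-out accuracy: {lio_acc:.2f}")
print(f"leave-problem-out  accuracy: {lpo_acc:.2f}")
```

Under leave-instance-out the non-informative feature scores near-perfectly, while holding out whole problems drops it toward chance, which is the behaviour a well-designed evaluation should expose.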
📝 Abstract
Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are then used to train a machine learning meta-model for selecting suitable algorithms. Various approaches have demonstrated the effectiveness of these algorithm selection meta-models. However, not all evaluation approaches are equally valid for assessing the performance of meta-models. We highlight methodological issues that frequently occur in the community and should be addressed when evaluating algorithm selection approaches. First, we identify flaws in the "leave-instance-out" evaluation technique. We show that non-informative features and meta-models can achieve high accuracy, which should not be the case under a well-designed evaluation framework. Second, we demonstrate that measuring the performance of optimization algorithms with metrics sensitive to the scale of the objective function requires careful consideration of how this impacts the construction of the meta-model, its predictions, and the model's error. Such metrics can falsely present overly optimistic performance assessments of the meta-models. This paper emphasizes the importance of careful evaluation, as loosely defined methodologies can mislead researchers, divert efforts, and introduce noise into the field.
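The scale-sensitivity issue can be sketched with a toy example (all values invented; the functions, algorithms, and regret definitions below are illustrative, not the paper's benchmark): when per-function errors are averaged without normalization, functions with large objective scales dominate, so a selector that fails on small-scale functions still reports a tiny aggregate error.

```python
# Hypothetical performance table: perf[function][algorithm] is the best
# objective value reached (lower is better). The two functions live on
# very different scales.
perf = {
    "f_large_scale": {"A": 1e6, "B": 2e6},  # A wins here
    "f_small_scale": {"A": 5.0, "B": 1.0},  # B wins here
}

# A meta-model that always selects A is wrong on half the functions.
selection = {"f_large_scale": "A", "f_small_scale": "A"}

def raw_regret(perf, sel):
    """Mean of (selected value - best achievable value), unnormalized."""
    return sum(perf[f][sel[f]] - min(perf[f].values())
               for f in perf) / len(perf)

def normalized_regret(perf, sel):
    """Mean regret after per-function min-max normalization."""
    total = 0.0
    for f in perf:
        lo, hi = min(perf[f].values()), max(perf[f].values())
        total += (perf[f][sel[f]] - lo) / (hi - lo)
    return total / len(perf)

# Raw regret is 2.0 against objective values in the millions, so the
# selector looks near-optimal; normalized regret is 0.5, exposing the
# failure on half of the functions.
print("raw regret:       ", raw_regret(perf, selection))
print("normalized regret:", normalized_regret(perf, selection))
```

This is the sense in which scale-sensitive metrics "falsely present overly optimistic performance assessments": per-function normalization (or rank-based aggregation) is needed before errors are compared or averaged across functions.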