The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong

📅 2025-05-12
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This paper identifies systemic flaws in benchmark evaluation for algorithm selection—particularly in black-box optimization. First, leave-instance-out cross-validation induces data leakage, yielding spuriously high accuracy. Second, scale-sensitive performance metrics (e.g., raw function-value error) introduce substantial optimistic bias. Through empirical analysis, ablation studies, and meta-model diagnostics, the authors quantitatively demonstrate that non-informative features can achieve >90% accuracy under flawed evaluation protocols, and unnormalized objective functions overestimate meta-model performance by over 40%. To address these issues, they propose a decoupled evaluation framework that separately assesses *feature effectiveness* and *metric-induced interference*, emphasizing objective-function normalization and error attribution analysis. This work advances algorithm selection evaluation toward standardization, robustness, and reproducibility.
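The leakage the summary describes can be reproduced in a few lines. The sketch below is illustrative only (the data, dimensions, and labels are invented, not taken from the paper): instances of the same function share a feature "fingerprint" and the same best-algorithm label, so an instance-level split lets the meta-model memorize function identity, while a group-level split by function (a leave-problem-out protocol, here via scikit-learn's `GroupKFold`) removes that shortcut.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_funcs, n_inst, n_feat = 24, 15, 10

# Each function gets a random feature centroid; its instances are noisy copies.
# The features carry no information about algorithm performance: the "best
# algorithm" label is an arbitrary per-function assignment.
centroids = rng.normal(size=(n_funcs, n_feat))
X = np.repeat(centroids, n_inst, axis=0) \
    + 0.1 * rng.normal(size=(n_funcs * n_inst, n_feat))
y = np.repeat(rng.integers(0, 4, size=n_funcs), n_inst)  # label per function
groups = np.repeat(np.arange(n_funcs), n_inst)           # function identity

clf = RandomForestClassifier(random_state=0)
# Leave-instance-out: instances of the same function land in train AND test.
lio = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
# Leave-problem-out: whole functions are held out via grouping.
lpo = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(5)).mean()

print(f"leave-instance-out accuracy: {lio:.2f}")  # near-perfect despite useless features
print(f"leave-problem-out accuracy:  {lpo:.2f}")  # close to 4-class chance
```

The only difference between the two scores is the split: under the instance-level split the classifier recognizes which function an instance came from, which is exactly the spuriously high accuracy the authors warn about.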

📝 Abstract
Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are then used to train a machine learning meta-model for selecting suitable algorithms. Various approaches have demonstrated the effectiveness of these algorithm selection meta-models. However, not all evaluation approaches are equally valid for assessing the performance of meta-models. We highlight methodological issues that frequently occur in the community and should be addressed when evaluating algorithm selection approaches. First, we identify flaws with the "leave-instance-out" evaluation technique. We show that non-informative features and meta-models can achieve high accuracy, which should not be the case with a well-designed evaluation framework. Second, we demonstrate that measuring the performance of optimization algorithms with metrics sensitive to the scale of the objective function requires careful consideration of how this impacts the construction of the meta-model, its predictions, and the model's error. Such metrics can falsely present overly optimistic performance assessments of the meta-models. This paper emphasizes the importance of careful evaluation, as loosely defined methodologies can mislead researchers, divert efforts, and introduce noise into the field.
Problem

Research questions and friction points this paper is trying to address.

Identifying flaws in 'leave-instance-out' evaluation technique for algorithm selection
Addressing scale-sensitive metrics' impact on meta-model performance assessment
Highlighting methodological issues in benchmarking algorithm selection approaches
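The second friction point, scale-sensitive metrics, can be made concrete with a toy sketch (all numbers below are invented for illustration and do not come from the paper). When objective functions live on wildly different scales, a raw regret averaged across functions looks negligible next to the large function's values, even though the selector picked the worst algorithm on the small-scale function; per-function min-max normalization exposes this.

```python
# Hypothetical best-found objective values for two algorithms on two
# functions with very different scales (illustrative numbers only).
perf = {                                   # perf[function][algorithm]
    "f_large": {"A": 1.0e6, "B": 1.2e6},   # objective scale ~1e6
    "f_small": {"A": 9.0,   "B": 1.0},     # objective scale ~1
}
selection = {"f_large": "A", "f_small": "A"}  # selector is wrong on f_small

def mean_regret(normalize: bool) -> float:
    """Mean regret of the selected algorithm vs. the per-function best."""
    total = 0.0
    for f, algs in perf.items():
        best, worst = min(algs.values()), max(algs.values())
        r = algs[selection[f]] - best
        if normalize:
            r /= (worst - best)  # min-max normalize within each function
        total += r
    return total / len(perf)

print(f"raw mean regret:        {mean_regret(False):.3f}")  # 4.000
print(f"normalized mean regret: {mean_regret(True):.3f}")   # 0.500
```

The raw score of 4 looks excellent against objective values near one million, yet the normalized score of 0.5 reveals the selector chose the worst available algorithm on half of the benchmark, which is the kind of optimistic bias the paper attributes to unnormalized metrics.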
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exposes flaws in leave-instance-out evaluation technique
Highlights issues with non-informative features and meta-models
Warns about misleading metrics in performance assessment
Gasper Petelin
Computer Systems Department, Jožef Stefan Institute, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
Gjorgjina Cenikj
Young Researcher, Jožef Stefan Institute
machine learning · deep learning · optimization · automated machine learning · NLP