🤖 AI Summary
To address the high cost of model selection in machine learning—stemming from reliance on large labeled validation sets—this paper proposes a consensus-driven active model selection method. Instead of assuming a fixed validation set, the approach models probabilistic relationships among classifiers, classes, and samples using candidate models’ predictions, and employs Bayesian inference to quantify inter-model agreement and disagreement. It then dynamically identifies the most discriminative samples for prioritized labeling. The key contribution lies in reframing model selection as a feedback-guided probabilistic inference problem, thereby jointly optimizing labeling efficiency and selection accuracy. Experiments across 26 benchmark tasks demonstrate that the method reliably identifies the optimal model using over 70% fewer labels than state-of-the-art alternatives.
📝 Abstract
The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset, a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA significantly outperforms existing methods for active model selection, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.
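To make the idea concrete, here is a minimal toy sketch of disagreement-driven active model selection. It is not the actual CODA implementation (which models classifier-category-datapoint relationships jointly); all names and the simple Beta-Bernoulli accuracy model are illustrative assumptions. The loop queries labels for points where candidate models disagree most, then performs a Bayesian update of each model's posterior accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 candidate classifiers predict on 100 unlabeled
# binary-classification points; their latent accuracies are unknown to us.
n_models, n_points = 3, 100
true_labels = rng.integers(0, 2, n_points)
accs = [0.9, 0.75, 0.6]  # latent accuracies used only to simulate predictions
preds = np.stack([
    np.where(rng.random(n_points) < a, true_labels, 1 - true_labels)
    for a in accs
])

# Beta(1, 1) prior over each model's accuracy, updated as labels arrive.
alpha = np.ones(n_models)
beta = np.ones(n_models)
labeled = set()

for _ in range(30):
    # Acquisition step: prioritize points where the candidate pool disagrees.
    disagreement = np.array([
        0 if i in labeled else len(set(preds[:, i])) for i in range(n_points)
    ])
    i = int(np.argmax(disagreement))
    if disagreement[i] <= 1:  # full consensus everywhere; nothing informative left
        break
    labeled.add(i)
    y = true_labels[i]  # simulated oracle annotation for the queried point
    # Bayesian update: each model's correctness on the point is a Bernoulli draw.
    correct = preds[:, i] == y
    alpha += correct
    beta += ~correct

posterior_mean = alpha / (alpha + beta)
best = int(np.argmax(posterior_mean))
print("selected model:", best, "after", len(labeled), "labels")
```

Because disagreement points are exactly where models can be told apart, the posterior over the best model typically sharpens after far fewer labels than uniform random annotation would need, which is the intuition behind CODA's reported label savings.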