An Efficient Plugin Method for Metric Optimization of Black-Box Models

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of adapting immutable black-box classifiers to target distributions without access to model internals, training data, or feature representations. We propose a lightweight, training-free, post-hoc method that optimizes arbitrary non-differentiable confusion matrix metrics—such as F1-score and Cohen’s Kappa—using only a small number of predicted class probabilities and corresponding ground-truth labels. Our approach integrates probabilistic calibration with gradient estimation of the confusion matrix, enabling end-to-end optimization via few-shot supervision and iterative black-box queries. The key contribution is the first query-driven framework for direct metric optimization in black-box multi-class classification, requiring no model retraining or architectural assumptions. Experiments across tabular and language tasks demonstrate performance competitive with state-of-the-art methods, with significant gains in F1-score and Precision@k, while maintaining inference latency below 0.1 seconds per thousand samples.

📝 Abstract
Many machine learning algorithms and classifiers are available only via API queries as a "black-box" -- that is, the downstream user has no ability to change, re-train, or fine-tune the model on a particular target distribution. Indeed, the downstream user may not even have knowledge of the *original* training distribution or performance metric used to construct and optimize the black-box model. We propose a simple and efficient method, Plugin, which *post-processes* arbitrary multiclass predictions from any black-box classifier in order to simultaneously (1) adapt these predictions to a target distribution; and (2) optimize a particular metric of the confusion matrix. Importantly, Plugin is a completely *post-hoc* method which does not rely on feature information, requires only a small number of probabilistic predictions along with their corresponding true labels, and optimizes metrics by querying. We empirically demonstrate that Plugin is both broadly applicable and has performance competitive with related methods on a variety of tabular and language tasks.
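To make the abstract's idea concrete, here is a minimal, hypothetical sketch of a plugin-style post-processor. It is *not* the paper's actual algorithm: it simply re-weights black-box class probabilities with a per-class gain vector and picks the gains that maximize a non-differentiable confusion-matrix metric (macro-F1 here) on a small labeled set, using only metric queries. All function names and the random-search strategy are illustrative assumptions.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1 computed directly from confusion-matrix counts."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))

def fit_plugin_weights(probs, y_true, n_iters=500, seed=0):
    """Query-driven search for per-class gains w; predictions become
    argmax(w * p). Only (probabilities, labels) pairs are needed --
    no features, no model internals, no retraining."""
    rng = np.random.default_rng(seed)
    n_classes = probs.shape[1]
    best_w = np.ones(n_classes)
    best_score = macro_f1(y_true, probs.argmax(axis=1), n_classes)
    for _ in range(n_iters):
        w = rng.uniform(0.1, 2.0, size=n_classes)  # candidate gain vector
        score = macro_f1(y_true, (probs * w).argmax(axis=1), n_classes)
        if score > best_score:
            best_score, best_w = score, w
    return best_w, best_score
```

At deployment, new black-box probability vectors are simply multiplied by the learned gains before taking the argmax, which keeps inference overhead to a single vector product per sample.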
Problem

Research questions and friction points this paper is trying to address.

Adapting black-box model predictions to a target distribution
Optimizing confusion matrix metrics without retraining
Working from minimal data: probabilistic predictions and their true labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-processes black-box classifier predictions
Optimizes metrics without feature information
Adapts predictions to target distribution