🤖 AI Summary
This work studies how to transfer knowledge from public benchmark evaluations to a new task for which only a few samples are available. It relies on a weak monotonicity assumption: a model that outperforms another on many source benchmarks also tends to outperform it on the target task. Under this assumption, the authors analyze two learning paradigms, transfer learning and model selection aggregation. The approach prunes the model class using monotonicity, allows the monotonic relationship between tasks to hold only approximately, and hedges adaptively over the frontier of remaining models to balance the trade-offs among candidates. Theoretical analysis shows that the framework yields statistical gains when weak monotonicity holds, and empirical results confirm that it significantly outperforms existing methods in few-shot scenarios.
📝 Abstract
When a learner faces a new task with few samples, it must leverage any available side information. In practice, this often comes in the form of model evaluations on related tasks in public benchmarks. A key question then is how to model task relatedness such that it is both realistic and the benchmark evaluations lead to provable gains. Empirically, we observe that weak monotonicity is often approximately satisfied: if a model dominates another on many benchmarks, it also tends to outperform on the new task. We explore the statistical complexity of learning under (approximate) weak monotonicity, leveraging it within two learning paradigms: transfer learning and model selection aggregation. We show that not only can we prune the model class based on monotonicity, but we can also further adapt to the geometry of the available trade-offs by hedging on the frontier.
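The two ideas named in the abstract, pruning by monotonicity and hedging on the frontier, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a simplified, strict form of dominance (a model is pruned only if another model is at least as good on every benchmark and strictly better on one), and it uses a standard exponential-weights rule as a stand-in for the hedging step. The scores, the few-shot losses, the function names, and the learning rate `eta` are all hypothetical.

```python
import math

def dominates(a, b):
    """True if score vector a is >= b on every benchmark and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores):
    """Indices of models that no other model dominates on the source benchmarks."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for k, t in enumerate(scores) if k != i)
    ]

def exp_weights(losses, eta):
    """Exponential-weights distribution over models given their few-shot losses."""
    base = min(losses)  # subtract the minimum for numerical stability
    raw = [math.exp(-eta * (loss - base)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical benchmark accuracies: one row per model, one column per benchmark.
scores = [
    [0.90, 0.60],  # model 0
    [0.70, 0.85],  # model 1
    [0.65, 0.55],  # model 2, dominated by models 0 and 1
    [0.90, 0.50],  # model 3, dominated by model 0
]

# Hypothetical noisy losses estimated from a handful of target samples.
few_shot_losses = [0.30, 0.25, 0.10, 0.40]

frontier = pareto_frontier(scores)  # step 1: prune dominated models
weights = exp_weights([few_shot_losses[i] for i in frontier], eta=5.0)  # step 2: hedge

for i, w in zip(frontier, weights):
    print(f"model {i}: weight {w:.3f}")
```

In this toy example model 2 has the lowest few-shot loss but is pruned anyway, because two other models dominate it on the benchmarks. That is the intended effect of the monotonicity assumption: a noisy few-sample estimate is not allowed to override consistent benchmark evidence, and the remaining weight is spread over the models that represent genuine trade-offs.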