🤖 AI Summary
This work addresses the challenge of efficiently selecting a suitable large language model (LLM) from a rapidly growing pool of models whose characteristics are often opaque, which makes it hard for users to identify options aligned with their implicit preferences. To tackle this, the authors propose an interaction-efficient active learning framework that combines a dueling bandit algorithm with Bayesian preference modeling. The approach uses a belief-aware upper confidence bound strategy to balance exploration and exploitation, iteratively refining model recommendations under user-specified time and cost budgets. Experiments across multiple LLMs, together with human studies, indicate that the method matches users to well-aligned models at a lower interaction cost.
📝 Abstract
Users increasingly face the challenge of selecting an appropriate LLM for a given task from a rapidly growing pool of LLMs, each with distinct but often opaque latent properties. Compounding this challenge, users may lack the vocabulary or awareness to explicitly articulate the characteristics they value in an LLM's responses or deployment. We propose an interaction-efficient active learning framework in which a dueling bandit algorithm iteratively selects pairs of LLMs, collects user feedback about their responses, and updates its belief about the user's latent preferences. We introduce a novel belief-aware upper confidence bound strategy that balances exploration of the model pool with exploitation of inferred preferences, enabling efficient alignment between user needs and LLM capabilities under user-specified cost and time budgets. Through diverse experiments on LLMs and human studies, we verify that our method efficiently matches well-aligned LLMs to users at a lower cost.
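To make the select, compare, update loop described above concrete, the following is a minimal sketch of a generic dueling bandit with an optimistic pair-selection rule. It is not the paper's belief-aware strategy, whose details the abstract does not give. The assumptions here are all illustrative: a Beta(1, 1) belief over each pairwise preference, a champion/challenger selection rule, a budget expressed as a plain number of comparisons (standing in for the cost and time budgets), and the hypothetical names `select_pair`, `recommend_model`, `prefers`, and `alpha`.

```python
import math
import random


def beta_mean(wins, i, j):
    """Posterior mean of P(model i preferred over model j), uniform Beta(1, 1) prior."""
    n = wins[i][j] + wins[j][i]
    return (wins[i][j] + 1) / (n + 2)


def ucb(wins, i, j, t, alpha):
    """Optimistic estimate: posterior mean plus a bonus that shrinks as the pair is dueled."""
    n = wins[i][j] + wins[j][i]
    return beta_mean(wins, i, j) + math.sqrt(alpha * math.log(t + 1) / (n + 1))


def select_pair(wins, t, alpha=1.0):
    """Choose which two models to show the user next (assumed rule, not the paper's)."""
    arms = range(len(wins))
    # Champion: highest total optimistic win rate against all other models.
    champion = max(
        arms, key=lambda i: sum(ucb(wins, i, j, t, alpha) for j in arms if j != i)
    )
    # Challenger: the model most plausibly able to beat the champion.
    challenger = max(
        (j for j in arms if j != champion),
        key=lambda j: ucb(wins, j, champion, t, alpha),
    )
    return champion, challenger


def recommend_model(prefers, num_models, max_queries, alpha=1.0):
    """Run duels until the query budget is spent; return (best_index, wins).

    `prefers(i, j)` is the user feedback: True if model i's response is preferred.
    Requires num_models >= 2.
    """
    wins = [[0] * num_models for _ in range(num_models)]
    for t in range(1, max_queries + 1):
        i, j = select_pair(wins, t, alpha)
        if prefers(i, j):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
    arms = range(num_models)
    # Recommend the model with the highest total posterior-mean win rate.
    best = max(arms, key=lambda i: sum(beta_mean(wins, i, j) for j in arms if j != i))
    return best, wins


if __name__ == "__main__":
    # Simulated user (assumption): hidden utilities with Bradley-Terry noise.
    rng = random.Random(0)
    utility = [0.1, 0.9, 0.5, 0.3]

    def noisy_user(i, j):
        p = 1.0 / (1.0 + math.exp(-4.0 * (utility[i] - utility[j])))
        return rng.random() < p

    best, _ = recommend_model(noisy_user, num_models=4, max_queries=60)
    print("recommended model index:", best)
```

In this sketch, `alpha` controls how strongly rarely compared pairs are favored, and `max_queries` caps the number of times the user is asked for feedback. A cost-aware variant could instead subtract a per-model price from a remaining budget after each duel.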