🤖 AI Summary
This work proposes a large language model (LLM)-driven program evolution framework that automatically discovers improved lexical retrieval functions by representing candidate algorithms as executable code. Whereas traditional methods like BM25 have advanced mostly through manual parameter tuning and human intuition, this approach couples an LLM with evolutionary search. Starting from seed algorithms (BM25 and query likelihood with Dirichlet smoothing), it iteratively applies mutation and recombination operators guided by retrieval-performance feedback across multiple datasets. The evolved retrieval functions outperform strong baselines on the BEIR, BRIGHT, and TREC Deep Learning 2019/2020 benchmarks, showing gains in effectiveness along with promising generalization and cross-dataset transfer.
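The paper itself provides no code in this summary; as an illustrative sketch only, the mutate-evaluate-select loop described above might be structured as follows. All names here are hypothetical, and `mutate` stands in for an LLM-proposed program edit, not any actual RankEvolve component:

```python
import random

def evolve(seeds, mutate, evaluate, generations=5, population=4, rng=None):
    """Minimal evaluator-guided evolutionary loop (illustrative sketch).

    seeds    : initial candidate programs (any representation)
    mutate   : candidate -> new candidate (placeholder for an LLM edit)
    evaluate : candidate -> fitness score (higher is better),
               e.g. mean nDCG@10 across a set of IR datasets
    """
    rng = rng or random.Random(0)
    pool = list(seeds)
    for _ in range(generations):
        # Propose children by mutating parents sampled from the pool.
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        pool.extend(children)
        # Selection: keep only the top-scoring candidates.
        pool = sorted(pool, key=evaluate, reverse=True)[:population]
    return max(pool, key=evaluate)
```

Because selection retains the best of parents and children, the top fitness in the pool is non-decreasing across generations; the real system additionally uses recombination and richer feedback signals.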
📝 Abstract
Retrieval algorithms like BM25 and query likelihood with Dirichlet smoothing remain strong and efficient first-stage rankers, yet improvements have mostly relied on parameter tuning and human intuition. We investigate whether a large language model, guided by an evaluator and evolutionary search, can automatically discover improved lexical retrieval algorithms. We introduce RankEvolve, a program evolution setup based on AlphaEvolve, in which candidate ranking algorithms are represented as executable code and iteratively mutated, recombined, and selected based on retrieval performance across 12 IR datasets from BEIR and BRIGHT. RankEvolve starts from two seed programs: BM25 and query likelihood with Dirichlet smoothing. The evolved algorithms are novel, effective, and show promising transfer to the full BEIR and BRIGHT benchmarks as well as TREC DL 19 and 20. Our results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.
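The two seed programs named in the abstract are standard textbook ranking functions. A minimal, self-contained rendering of both (simplified for illustration; not the paper's actual seed code, and using default parameters `k1`, `b`, `mu` chosen here as common conventions) is:

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """BM25 score of tokenized `doc` for tokenized `query`.

    `corpus` is a list of tokenized documents, used for IDF and
    average document length.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in corpus if t in d)           # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[t]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def ql_dirichlet_score(query, doc, corpus, mu=2000):
    """Query likelihood with Dirichlet smoothing (log-probability)."""
    coll = Counter(t for d in corpus for t in d)
    coll_len = sum(coll.values())
    tf = Counter(doc)
    return sum(
        math.log((tf[t] + mu * coll[t] / coll_len) / (len(doc) + mu))
        for t in query
        if coll[t] > 0  # skip terms unseen in the collection
    )
```

In the framework described above, functions like these would be the executable starting points that mutation and recombination operators then rewrite.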