STORM: Stepwise Token Optimization with Reward-Guided Beam Search

📅 2026-06-09
🤖 AI Summary
This work addresses two problems: the vocabulary mismatch that limits lexical retrievers like BM25, and the lack of fine-grained, retrieval-oriented supervision in existing large language model (LLM) query rewriting approaches. The authors propose a self-supervised lexical query expansion framework that translates sequence-level retrieval rewards into token-level optimization signals. During reward-guided beam search, candidate expansions are scored at each decoding step and low-reward continuations are pruned. Requiring no specialized index, the approach enables models ranging from 0.6B to 8B parameters to match or surpass competitive LLM-based rewriters on the TREC DL and BEIR benchmarks. Notably, the 8B model rivals far larger proprietary rewriters, and the method transfers zero-shot to the 18 languages of the MIRACL benchmark, where it beats dedicated multilingual dense retrievers on average.
📝 Abstract
Modern retrieval increasingly relies on dense and learned-sparse neural models that are effective but require encoding the entire corpus into a specialized index, which must be rebuilt whenever the model changes. Lexical retrievers like BM25 stay efficient and transparent on a standard inverted index that need not change as models evolve, but they suffer from vocabulary mismatch. LLM query rewriting can help, yet prompted rewriters emit well-formed terms that are ineffective or even harmful for retrieval, and training against a retrieval reward gives only delayed, sequence-level supervision that obscures which terms helped. We introduce STORM (Stepwise Token Optimization with Reward-guided beaM search), a self-supervised framework for lexical query expansion. STORM trains the rewriter through generation guided by retrieval metrics: at each step, candidate expansions are scored against the BM25 index and low-reward continuations are pruned, turning the retrieval reward into a token-level signal that concentrates exploration on retrieval-effective vocabulary. Across TREC DL and BEIR, STORM lets 0.6B to 8B backbones match or surpass competitive LLM rewriters while retrieving as fast as plain BM25; at 8B it rivals far larger proprietary rewriters. It further transfers zero-shot to 18 languages (MIRACL), beating dedicated multilingual dense retrievers on average, making STORM a competitive, infrastructure-light alternative to dense neural retrieval.
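The scoring-and-pruning step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a four-document corpus, a hand-written BM25 scorer (Lucene-style IDF), reciprocal rank as a stand-in retrieval reward, and whole-word expansion candidates drawn from the corpus vocabulary instead of tokens proposed by an LLM. All names (`CORPUS`, `bm25`, `reward`, `reward_guided_beam_search`) are hypothetical, and the training update that STORM applies on top of the search is omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus; document 0 is assumed to be the relevant one.
CORPUS = [
    "myocardial infarction treatment with aspirin",
    "heart shaped cookie recipe",
    "attack on titan episode guide",
    "cardiac arrest emergency response",
]
DOCS = [text.split() for text in CORPUS]
N = len(DOCS)
AVGDL = sum(len(doc) for doc in DOCS) / N
DF = Counter(term for doc in DOCS for term in set(doc))  # document frequency
VOCAB = sorted(DF)  # stand-in for candidates a rewriter model would propose


def bm25(query_terms, doc, k1=1.2, b=0.75):
    """BM25 score of one document for a bag of query terms."""
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log(1 + (N - DF[term] + 0.5) / (DF[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / AVGDL)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def reward(query_terms, relevant=0):
    """Reciprocal rank of the relevant document (ties counted against it)."""
    scores = [bm25(query_terms, doc) for doc in DOCS]
    worse_or_equal = sum(
        s >= scores[relevant] for i, s in enumerate(scores) if i != relevant
    )
    return 1.0 / (1 + worse_or_equal)


def reward_guided_beam_search(query, candidates, beam_width=2, max_steps=2):
    """Grow the query one term per step, keeping only high-reward beams."""
    beams = [tuple(query.split())]
    for _ in range(max_steps):
        expanded = [
            beam + (term,)
            for beam in beams
            for term in candidates
            if term not in beam
        ]
        if not expanded:
            break
        # Score every partial expansion against the index, then prune.
        expanded.sort(key=reward, reverse=True)
        beams = expanded[:beam_width]
    return beams[0], reward(beams[0])


base_reward = reward("heart attack".split())
best_terms, best_reward = reward_guided_beam_search("heart attack", VOCAB)
print("original query reward:", base_reward)
print("expanded query:", " ".join(best_terms), "reward:", best_reward)
```

In this toy setting the original query "heart attack" shares no term with the relevant document, so the per-step reward is what steers the search toward vocabulary that actually appears in it. A real setup would replace the in-memory scorer with a BM25 index and the vocabulary list with the model's own next-token candidates.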
Problem

Research questions and friction points this paper is trying to address.

lexical retrieval
vocabulary mismatch
query rewriting
retrieval effectiveness
token-level supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward-guided beam search
token-level reward
lexical query expansion
self-supervised retrieval
BM25-compatible rewriting