Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether supervised neural retrievers internalize query-agnostic, document-level relevance priors during training, and whether these priors lead to systematic under-retrieval of niche, fragmentary, or highly technical documents. The authors train simple classifiers on frozen document embeddings, run controlled matched-document comparisons, and analyze explanations generated by large language models. With these tools they evaluate three state-of-the-art supervised dense retrievers against BM25 across multiple information retrieval benchmarks. The analysis shows that the neural retrievers encode document preference priors that generalize to unseen documents and are consistent across models, while the effect in BM25 is weaker and less consistent. According to the authors, these priors arise because implicit preferences in the training annotations, which favor mainstream, self-contained, and well-formed documents, are absorbed into the retrievers' representations.
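The probing idea (a simple classifier trained on frozen document embeddings) can be sketched as below. This is a minimal illustration, not the authors' code. It assumes logistic regression as the "simple classifier" and a binary target of whether a document was judged relevant for some training query. The function names (`train_prior_probe`, `prior_score`, `make_synthetic`) and the synthetic Gaussian "embeddings" are hypothetical stand-ins for vectors produced by a real frozen retriever encoder. Only the probe weights are learned; the embeddings are never updated. Generalization is checked by scoring documents that were held out from training.

```python
import math
import random

# Hypothetical sketch: names, the logistic-regression probe, the "judged vs. not
# judged" target, and the synthetic data are illustrative assumptions.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_prior_probe(embeddings, labels, lr=0.5, epochs=300):
    """Fit logistic regression on frozen embeddings with full-batch gradient descent.

    labels[i] is 1 if document i was judged relevant for some query (assumption).
    """
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(dot(w, x) + b) - y  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def prior_score(w, b, x):
    """Query-independent prior: probe probability that a document looks 'judged'."""
    return sigmoid(dot(w, x) + b)

def auc(scores, labels):
    """ROC AUC by pairwise comparison of positives vs. negatives (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, dim, seed=0):
    """Toy stand-in for frozen embeddings: 'judged' documents shifted along dim 0."""
    rng = random.Random(seed)
    embeddings, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 1.0 if y == 1 else -1.0
        embeddings.append(x)
        labels.append(y)
    return embeddings, labels

if __name__ == "__main__":
    emb, lab = make_synthetic(400, 8, seed=0)
    # Train on one set of documents, then score disjoint, unseen documents.
    w, b = train_prior_probe(emb[:200], lab[:200])
    held_out = [prior_score(w, b, x) for x in emb[200:]]
    print(f"held-out AUC: {auc(held_out, lab[200:]):.3f}")
```

A held-out AUC well above 0.5 would indicate that the embedding space carries a query-independent signal separating judged from unjudged documents. With real data, the synthetic generator would be replaced by embeddings from the retriever under study.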
📝 Abstract
Neural retrievers are trained to estimate query-document relevance from annotated query-document pairs. Yet annotation protocols may not purely reflect relevance: they select only a subset of documents for labeling, and this selection can favor certain document types over others. We investigate whether supervised bi-encoder retrievers implicitly learn a document-level relevance prior: a query-independent signal encoded in their representation space as a side effect of training on annotated data. We estimate this prior by training simple classifiers on frozen document embeddings and evaluate three state-of-the-art retrievers across multiple IR benchmarks. We find that supervised neural retrievers encode relevance priors that generalize to unseen documents and are consistent across models. These priors create a findability gap: documents with lower prior are systematically harder to retrieve, even when genuinely relevant. This effect appears in supervised dense retrievers but is weaker and less consistent in BM25, and it persists under controlled matched-document comparisons. Using LLM-based explanations, we find that judged-relevant documents tend to be comprehensive, self-contained summaries of mainstream topics, while niche, fragmentary, or highly technical content is often left unjudged. Retrievers internalize this bias, ranking documents with these favored features higher than documents that lack them, independently of their actual relevance. Our findings expose a structural limitation of supervised retrieval: models trained on annotated data do not just learn relevance, but also the implicit document preferences in their training data.
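One simple way to quantify the findability gap described in the abstract is to take genuinely relevant (query, document) pairs, split them at the median prior score of the document, and compare recall@k between the high-prior and low-prior halves. The sketch below is a hypothetical illustration. The names `recall_at_k` and `findability_gap`, the median split, and the toy numbers are assumptions, and the approach is not the authors' protocol. In particular, it does not replicate their controlled matched-document comparisons.

```python
from statistics import median

# Hypothetical sketch: function names, the median split, and the toy numbers are
# illustrative assumptions, not the paper's protocol or data.

def recall_at_k(ranks, k):
    """Fraction of relevant documents ranked within the top k (ranks are 1-based)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def findability_gap(priors, ranks, k=10):
    """Recall@k of high-prior relevant docs minus recall@k of low-prior relevant docs.

    priors[i]: prior score of the document in the i-th relevant (query, document) pair.
    ranks[i]:  1-based rank the retriever gave that document for its query.
    """
    cut = median(priors)
    high = [r for p, r in zip(priors, ranks) if p > cut]
    low = [r for p, r in zip(priors, ranks) if p <= cut]
    if not high or not low:
        raise ValueError("median split left one group empty")
    return recall_at_k(high, k) - recall_at_k(low, k)

if __name__ == "__main__":
    priors = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    ranks = [1, 3, 8, 12, 5, 25, 40, 90]
    print(findability_gap(priors, ranks, k=10))  # 0.75 - 0.25 = 0.5
```

A positive gap means that relevant documents with a high prior are retrieved within the cutoff more often than equally relevant documents with a low prior. Because a plain median split may be confounded by other document properties (for example, length), matched comparisons like those the authors report are the stronger test.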
Problem

Research questions and friction points this paper is trying to address.

neural retrievers
relevance priors
annotation bias
findability gap
supervised retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

relevance prior
neural retrievers
annotation bias
findability gap
document-level bias