Attention Calibration for Position-Fair Dense Information Retrieval

📅 2026-06-01
🤖 AI Summary
This work addresses position bias in dense retrieval models: retrieval effectiveness degrades when the relevant content appears toward the end of a passage. The authors adapt a training-free, inference-time attention calibration mechanism to retrieval and extend it with an adjustable strength coefficient λ that interpolates between the original and the fully calibrated attention distributions. They study how basket size, the set of calibrated layers, and the strength λ trade positional fairness against retrieval effectiveness, and find that partial calibration frequently outperforms full calibration. The method applies to both <s>-pooled and last-token-pooled architectures. A single default configuration (B=128, λ=0.5, 50% layer depth) improves the harmonic mean of nDCG@10 across position groups on FineWeb-PosQ for all three embedding models tested, without per-model tuning. The same configuration transfers unchanged to the multilingual, multidomain PosIR benchmark, where it reduces the Position Sensitivity Index in all 16 evaluated combinations while preserving or improving aggregate nDCG@10.
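To illustrate why the harmonic mean of nDCG@10 across position groups is a useful fairness-aware summary, here is a minimal sketch. The group names and scores are made-up placeholders, not results from the paper, and `harmonic_mean` is a hypothetical helper rather than code from the released repository.

```python
def harmonic_mean(scores):
    """Harmonic mean of non-negative scores; returns 0.0 if any score is 0."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical nDCG@10 per position group (illustrative values only).
ndcg_by_group = {"early": 0.6, "middle": 0.5, "late": 0.3}

hm = harmonic_mean(ndcg_by_group.values())
am = sum(ndcg_by_group.values()) / len(ndcg_by_group)

# The harmonic mean is pulled toward the weakest group, so it never
# exceeds the arithmetic mean and penalizes uneven performance.
print(f"harmonic mean:   {hm:.4f}")
print(f"arithmetic mean: {am:.4f}")
```

Because the harmonic mean is dominated by the lowest group score, a model can only raise it substantially by improving its weakest position group, which is what makes it suitable for measuring positional fairness.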
📝 Abstract
Dense retrieval models exhibit positional bias: retrieval effectiveness degrades when relevant information appears later in a passage (Zeng et al., 2025). We ask whether this bias can be reduced at inference time, without retraining and without sacrificing overall retrieval effectiveness. To this end, we adapt inference-time attention calibration (Schuhmacher et al., 2026) to downstream retrieval and extend it with a strength coefficient λ that interpolates between the original and fully calibrated attention distributions. Across three embedding models on SQuAD-PosQ and FineWeb-PosQ, we examine how basket size, calibrated layer set, and strength affect the trade-off between positional fairness and retrieval effectiveness, finding that partial calibration frequently outperforms full calibration. A single configuration (B=128, λ=0.5, 50% layer depth) improves the harmonic mean of nDCG@10 across positional groups on FineWeb-PosQ for all three models without per-model tuning, and applies to both <s>-pooled and last-token-pooled architectures. This default configuration transfers without modification to PosIR, which spans 10 languages and 31 domains, reducing the Position Sensitivity Index in all 16 length-quartile × model × retrieval-setting combinations, while preserving or improving aggregate nDCG@10. We release our extended codebase at https://github.com/impresso/fair-sentence-transformers
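The role of the strength coefficient (lambda, written `lam` below) can be sketched as a convex combination of two attention distributions. This is an assumption-laden illustration, not the authors' implementation: how the fully calibrated distribution is computed is not described here, so the code takes it as a given input, and the function name `interpolate_attention` is hypothetical.

```python
import numpy as np


def interpolate_attention(original, calibrated, lam):
    """Blend two attention distributions over the last axis.

    lam = 0.0 returns the original attention, lam = 1.0 the fully
    calibrated attention, and values in between give partial calibration.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    original = np.asarray(original, dtype=float)
    calibrated = np.asarray(calibrated, dtype=float)
    mixed = (1.0 - lam) * original + lam * calibrated
    # A convex combination of distributions already sums to 1; the
    # renormalization only guards against floating-point drift.
    return mixed / mixed.sum(axis=-1, keepdims=True)


# Toy example: one attention row over three key positions.
original = np.array([[0.7, 0.2, 0.1]])
# Placeholder for a calibrated row (uniform here purely for illustration).
calibrated = np.full((1, 3), 1.0 / 3.0)

partial = interpolate_attention(original, calibrated, lam=0.5)
print(partial)
```

With `lam=0.5`, as in the default configuration named in the abstract, each attention row sits halfway between the original and the calibrated distribution, which is what "partial calibration" refers to in this sketch.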
Problem

Research questions and friction points this paper is trying to address.

positional bias
dense retrieval
attention calibration
retrieval fairness
inference-time adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention calibration
positional fairness
dense retrieval
inference-time adaptation
position bias
Andrianos Michail
Department of Computational Linguistics, University of Zurich
Elias Schuhmacher
Department of Computational Linguistics, University of Zurich
Juri Opitz
University of Zurich
Simon Clematide
Department of Computational Linguistics, University of Zurich
Rico Sennrich
Department of Computational Linguistics, University of Zurich