🤖 AI Summary
In dense retrieval, single-layer document representations fail to fully capture the complementary linguistic knowledge distributed across the layers of pretrained language models. To address this, the authors propose Multi-Layer Representation (MLR), a method that systematically leverages multi-layer encoder hidden states to construct more robust paragraph embeddings. They first analyze how individual layers contribute to retrieval performance, then introduce a lightweight pooling strategy that compresses multi-vector representations into efficient single vectors. MLR also integrates retrieval-oriented pretraining and hard negative mining to enhance discriminative capability. Experiments demonstrate that MLR significantly outperforms strong baselines, including Dual Encoder, ME-BERT, and ColBERT, under the standard single-vector retrieval setting, achieving state-of-the-art results on MS MARCO and Natural Questions while striking an effective balance between representational expressiveness and inference efficiency.
📝 Abstract
Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, even though representations in different layers of a pre-trained language model typically contain different kinds of linguistic knowledge and behave differently during fine-tuning. We therefore propose to investigate utilizing representations from multiple encoder layers to compose the representation of a document, which we denote Multi-layer Representations (MLR). We first investigate how representations in different layers affect MLR's performance under the multi-vector retrieval setting, and then propose pooling strategies that reduce multi-vector models to single-vector ones to improve retrieval efficiency. Experiments demonstrate the effectiveness of MLR over the dual encoder, ME-BERT, and ColBERT in the single-vector retrieval setting, and show that it combines well with other advanced training techniques such as retrieval-oriented pre-training and hard negative mining.
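To make the core idea concrete, below is a minimal sketch of pooling multi-layer hidden states into a single document vector. It assumes a BERT-style encoder loaded via Hugging Face Transformers; the choice of `bert-base-uncased`, the use of each layer's [CLS] vector as that layer's document representation, and mean pooling over the last `last_k` layers are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of multi-layer representation pooling.
# Assumptions not taken from the paper: bert-base-uncased as the encoder,
# the [CLS] token as each layer's document vector, and mean pooling over
# the last k layers; the paper studies layer choices and pooling
# strategies empirically.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_document(text: str, last_k: int = 4) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # output_hidden_states=True returns the embedding layer output
        # plus every transformer layer's hidden states
        outputs = encoder(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors, each [1, seq_len, dim]
    hidden_states = outputs.hidden_states
    # Stack the [CLS] vector from each of the last k layers: [k, dim]
    cls_per_layer = torch.stack([h[0, 0] for h in hidden_states[-last_k:]])
    # Pool the multi-vector representation into one vector for efficient
    # single-vector retrieval (here: simple mean pooling)
    return cls_per_layer.mean(dim=0)

doc_vec = encode_document("Dense retrieval represents documents as vectors.")
print(doc_vec.shape)  # torch.Size([768])
```

In the multi-vector setting described in the abstract, the stacked per-layer vectors would instead be kept and scored against the query individually; the pooling step above is what reduces that representation to a single vector at inference time.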