Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

📅 2024-10-31
🏛️ arXiv.org
🤖 AI Summary
Existing retrieval models have limited ability to follow user-specified document-level instructions, such as those concerning target audience, format, or language. Method: We propose InfoSearch, a new instruction-following retrieval benchmark spanning six document-level attributes (Audience, Keyword, Format, Language, Length, and Source), and introduce two evaluation metrics: Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE). Using these, we evaluate a range of retrieval models, including LLM-based dense retrievers and rerankers. Contribution/Results: Instruction compliance is consistently low across mainstream retrieval models. Fine-tuning on instruction-aware retrieval data and increasing model size improve performance, but most models still fall short of instruction compliance. This work establishes a systematic evaluation framework for instruction-aware retrieval.

📝 Abstract
Instruction-following capabilities in LLMs have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances; most of them still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance and neglect customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes (Audience, Keyword, Format, Language, Length, and Source), and introduce two novel metrics, Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE), to accurately assess the models' responsiveness to instructions. Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets and increasing model size enhance performance, most models still fall short of instruction compliance.
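
To make the idea of "strict" compliance with a document-level instruction more concrete, here is a minimal Python sketch. It is a hypothetical, simplified reading of a strict compliance ratio, not the SICR formula defined in the paper. It assumes each retrieved document carries a binary label (`satisfies`) saying whether it meets the instruction (for example, a Language or Length constraint), and it counts a query as compliant only when every satisfying document is ranked above every violating one. The names `Doc`, `strictly_compliant`, and `strict_compliance_ratio` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    # Hypothetical label: does this document satisfy the document-level
    # instruction (e.g. "written in English", "under 200 words")?
    satisfies: bool


def strictly_compliant(ranking: list[Doc]) -> bool:
    """Return True if no satisfying document is ranked below a violating one."""
    seen_violation = False
    for doc in ranking:  # ranking is ordered from top-1 downward
        if not doc.satisfies:
            seen_violation = True
        elif seen_violation:
            # A satisfying document appears below a violating one.
            return False
    return True


def strict_compliance_ratio(rankings: list[list[Doc]]) -> float:
    """Fraction of queries whose ranking is strictly compliant (0.0 if none)."""
    if not rankings:
        return 0.0
    return sum(strictly_compliant(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    query_a = [Doc("d1", True), Doc("d2", True), Doc("d3", False)]
    query_b = [Doc("d4", False), Doc("d5", True)]
    print(strict_compliance_ratio([query_a, query_b]))  # prints 0.5
```

Under this all-or-nothing reading, a single violating document placed above a satisfying one makes the whole query count as non-compliant; a rank-weighted score would instead give partial credit.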
Problem

Evaluates instruction-following in retrieval models beyond content relevance.
Develops InfoSearch benchmark for six document-level attributes.
Introduces SICR and WISE metrics to assess instruction responsiveness.
Innovation

Developed InfoSearch benchmark for retrieval evaluation
Introduced SICR and WISE metrics for instruction compliance
Evaluated LLM-based dense retrieval and reranking models
Authors

Jianqun Zhou
Hunan University
Research interests: AIGC

Yuanlei Zheng
Huazhong University of Science and Technology

Wei Chen
Huazhong University of Science and Technology

Qianqian Zheng
Digital Twin Institute, Eastern Institute of Technology, Ningbo

Zeyuan Shang
Huazhong University of Science and Technology

Wei Zhang
Digital Twin Institute, Eastern Institute of Technology, Ningbo

Rui Meng
Salesforce Research
Research interests: Machine Learning, Natural Language Processing

Xiaoyu Shen
Eastern Institute of Technology, Ningbo
Research interests: language model, multi-modal learning, reasoning