🤖 AI Summary
This work addresses the limited ability of neural information retrieval (IR) models and large language models (LLMs) to handle queries containing negation. We propose a taxonomy of negation grounded in philosophy, linguistics, and formal logic. Using this taxonomy, we generate two benchmark datasets with a balanced distribution over negation types, which can be used both to evaluate neural IR models and to fine-tune them for more robust handling of negation. We further design a logic-based classification mechanism that reveals which negation types existing IR datasets cover. Training on the type-balanced data leads to faster convergence on the NevIR dataset. Beyond these empirical gains, the classification schema offers an interpretable way to analyze negation in IR and gives insight into the factors that may affect how well fine-tuned models generalize on negation.
📝 Abstract
Understanding and solving complex reasoning tasks is vital for addressing the information needs of a user. Although dense neural models learn contextualised embeddings, they still underperform on queries containing negation. To understand this phenomenon, we study negation in both traditional neural information retrieval models and LLM-based models. We (1) introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions; (2) generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models and to fine-tune models for more robust performance on negation; and (3) propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets. Our taxonomy produces a balanced data distribution over negation types, providing a better training setup that leads to faster convergence on the NevIR dataset. Moreover, our classification schema reveals the coverage of negation types in existing datasets, offering insights into the factors that might affect the generalization of fine-tuned models on negation.
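To make the idea of measuring negation-type coverage concrete, the following is a minimal sketch of how one might tally negation cues across a set of queries. It is not the paper's classification mechanism: the three category names (`explicit`, `exclusion`, `affixal`), the cue patterns, and the function names are all hypothetical placeholders chosen for illustration, and the crude regular expressions will misfire on words such as "under" or "distance".

```python
import re
from collections import Counter

# Hypothetical categories and cue patterns, for illustration only.
# They are NOT the taxonomy or the logic-based classifier from the paper.
CUE_PATTERNS = {
    # Explicit negation markers such as "not", "never", or a contracted "n't".
    "explicit": re.compile(r"\b(?:not|no|never|none|neither|nor)\b|n't\b", re.IGNORECASE),
    # Words that exclude part of a set.
    "exclusion": re.compile(r"\b(?:without|except|excluding)\b", re.IGNORECASE),
    # Negative prefixes; deliberately crude, so expect false positives.
    "affixal": re.compile(r"\b(?:un|non|dis)[a-z]{3,}\b", re.IGNORECASE),
}


def negation_types(query):
    """Return the set of hypothetical negation categories whose cues occur in the query."""
    return {name for name, pattern in CUE_PATTERNS.items() if pattern.search(query)}


def coverage(queries):
    """Count how many queries fall under each category ('none' if no cue is found)."""
    counts = Counter()
    for query in queries:
        types = negation_types(query)
        if not types:
            counts["none"] += 1
        for name in types:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "films not directed by Nolan",
        "recipes without nuts",
        "unsafe drugs during pregnancy",
        "best hiking trails",
    ]
    for name, count in sorted(coverage(sample).items()):
        print(f"{name}: {count}")
```

A heavily skewed tally from such a count would be one simple signal that a dataset under-represents some negation types, which is the kind of imbalance the abstract says a type-balanced benchmark is meant to avoid.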