Who Benefits from RAG? The Role of Exposure, Utility and Attribution Bias

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study presents the first systematic investigation into fairness in Retrieval-Augmented Generation (RAG) systems across queries from different demographic groups. To assess whether RAG exacerbates performance disparities between groups, the work proposes an analytical framework encompassing group exposure, group utility, and group attribution. Leveraging datasets derived from the TREC 2022 Fair Ranking Track, large-scale experiments on article and title generation tasks reveal that RAG significantly amplifies accuracy gaps between groups compared to standalone large language models. The research not only formalizes a query-group fairness framework specific to RAG but also empirically demonstrates strong correlations between the three fairness dimensions and group-level performance gains, thereby providing both theoretical grounding and empirical evidence for developing more equitable RAG systems.

📝 Abstract
Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have achieved substantial improvements in accuracy by grounding their responses in external documents that are relevant to the user's query. However, relatively little work has investigated the impact of RAG in terms of fairness. In particular, it is not yet known whether queries associated with certain groups within a fairness category systematically receive higher accuracy, or larger accuracy improvements, in RAG systems compared to LLM-only settings, a phenomenon we refer to as query group fairness. In this work, we conduct extensive experiments to investigate the impact of three key factors on query group fairness in RAG, namely: Group exposure, i.e., the proportion of documents from each group appearing in the retrieved set, determined by the retriever; Group utility, i.e., the degree to which documents from each group contribute to improving answer accuracy, capturing retriever-generator interactions; and Group attribution, i.e., the extent to which the generator relies on documents from each group when producing responses. We examine group-level disparities in average accuracy and accuracy improvements across four fairness categories using three datasets derived from the TREC 2022 Fair Ranking Track for two tasks: article generation and title generation. Our findings show that RAG systems suffer from the query group fairness problem and amplify disparities in average accuracy across queries from different groups, compared to an LLM-only setting. Moreover, a group's utility, exposure, and attribution can exhibit strong positive or negative correlations with the average accuracy or accuracy improvements of that group's queries, highlighting their important role in fair RAG. Our data and code are publicly available on GitHub.
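The exposure and attribution factors described in the abstract both reduce to group-share computations over a set of documents: exposure over the retrieved set, attribution over the documents the generator actually relies on. A minimal sketch of that idea is below; the function name, document IDs, and group labels are illustrative assumptions, not taken from the paper's released code.

```python
from collections import Counter

def group_share(doc_ids, doc_group):
    """Fraction of a document set belonging to each group.

    doc_ids: list of document identifiers
    doc_group: mapping from document identifier to group label
    """
    counts = Counter(doc_group[d] for d in doc_ids)
    total = len(doc_ids)
    return {g: c / total for g, c in counts.items()}

# Hypothetical example: four retrieved documents over two groups.
doc_group = {"d1": "A", "d2": "A", "d3": "B", "d4": "A"}

# Group exposure: shares within the set returned by the retriever.
exposure = group_share(["d1", "d2", "d3", "d4"], doc_group)
# exposure == {"A": 0.75, "B": 0.25}

# Group attribution: shares within the documents the generator
# relied on when producing its response (e.g., the ones it cited).
attribution = group_share(["d1", "d3"], doc_group)
# attribution == {"A": 0.5, "B": 0.5}
```

Group utility is not a simple share: per the abstract it measures how much each group's documents improve answer accuracy, so estimating it requires comparing accuracy with and without those documents in the context.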
Problem

Research questions and friction points this paper is trying to address.

query group fairness
Retrieval-Augmented Generation
fairness
large language models
accuracy disparity
Innovation

Methods, ideas, or system contributions that make the work stand out.

query group fairness
retrieval-augmented generation
group exposure
group utility
attribution bias