HF-RAG: Hierarchical Fusion-based RAG with Multiple Sources and Rankers

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of fusing multi-source heterogeneous evidence in retrieval-augmented generation (RAG). To tackle the incompatibility of disparate scoring scales across IR models and the limited generalization of single-model retrievers, we propose a hierarchical rank fusion framework: (1) constructing dual retrieval channels, one for labeled and one for unlabeled data; (2) unifying retrieval scores via z-score normalization to harmonize heterogeneous ranking outputs; and (3) integrating cross-source results using a multi-information-source separation–aggregation strategy. Our approach effectively mitigates score incomparability and model-specific bias. Empirical evaluation on fact verification demonstrates consistent gains in retrieval accuracy over the best-performing single-model and single-source baselines, along with improved out-of-domain generalization. These results validate the efficacy of co-modeling heterogeneous data and jointly aggregating multiple rankers within a unified fusion architecture.
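The three-stage framework above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the paper says only "a standard rank fusion technique", so reciprocal rank fusion (RRF) is assumed here, and the final cross-source merging rule (taking each document's maximum standardized score) is likewise an assumption.

```python
# Sketch of hierarchical fusion: fuse rankers within each source,
# z-score the fused scores per source, then merge across sources.
from statistics import mean, pstdev

def rrf(rankings, k=60):
    """Reciprocal rank fusion over several rankers' ranked doc-id lists.
    One common instance of a 'standard rank fusion technique'."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

def zscore(scores):
    """Standardize a {doc: score} map so sources become comparable."""
    vals = list(scores.values())
    mu, sigma = mean(vals), pstdev(vals)
    if sigma == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}

def fuse_sources(labeled_rankings, unlabeled_rankings, top_k=5):
    """Hierarchical fusion: per-source RRF, per-source z-scoring,
    then a cross-source merge by standardized score (max rule assumed)."""
    merged = {}
    for rankings in (labeled_rankings, unlabeled_rankings):
        for doc, z in zscore(rrf(rankings)).items():
            merged[doc] = max(merged.get(doc, float("-inf")), z)
    return sorted(merged, key=merged.get, reverse=True)[:top_k]

# Toy usage: two rankers per source, ranked doc-id lists.
labeled = [["d1", "d2", "d3"], ["d2", "d1", "d4"]]
unlabeled = [["d5", "d1"], ["d5", "d6"]]
print(fuse_sources(labeled, unlabeled))
```

Because each source's fused scores are standardized before the merge, a document retrieved strongly by only one source (like `d5` above) can still surface ahead of documents with moderate support from both.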

📝 Abstract
Leveraging both labeled (input-output associations) and unlabeled data (wider contextual grounding) may provide complementary benefits in retrieval-augmented generation (RAG). However, effectively combining evidence from these heterogeneous sources is challenging, as the respective similarity scores are not inter-comparable. Additionally, aggregating beliefs from the outputs of multiple rankers can improve the effectiveness of RAG. Our proposed method first aggregates the top documents from a number of IR models using a standard rank fusion technique for each source (labeled and unlabeled). Next, we standardize the retrieval score distributions within each source by applying a z-score transformation before merging the top-retrieved documents from the two sources. We evaluate our approach on the fact verification task, demonstrating that it consistently improves over the best-performing individual ranker or source and also shows better out-of-domain generalization.
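The score-incomparability problem the abstract describes can be seen with two made-up score lists on different scales (the values below are illustrative, not from the paper): unbounded BM25-style scores versus bounded cosine similarities. A z-score transformation puts both on a mean-0, standard-deviation-1 scale so their top documents can be merged meaningfully.

```python
# Raw retrieval scores from two sources live on incompatible scales.
from statistics import mean, pstdev

bm25 = [18.2, 12.5, 9.1, 4.4]      # lexical scores, unbounded above
cosine = [0.91, 0.84, 0.80, 0.62]  # dense similarities, bounded in [0, 1]

def zscore(xs):
    """Standardize a score list to zero mean and unit (population) std."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

# After standardization, both distributions are directly comparable.
print(zscore(bm25))
print(zscore(cosine))
```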
Problem

Research questions and friction points this paper is trying to address.

Combining labeled and unlabeled data sources in RAG
Standardizing incomparable similarity scores from heterogeneous sources
Aggregating multiple ranker outputs to improve retrieval effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses hierarchical fusion with multiple sources and rankers
Applies z-score transformation to standardize retrieval scores
Combines labeled and unlabeled data via rank fusion