Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering

📅 2025-09-28

🤖 AI Summary

This study addresses the limited semantic understanding and reasoning of large language models (LLMs) in Islamic knowledge question answering. It proposes a three-stage hybrid retrieval-augmented generation (RAG) framework: (1) initial keyword-based retrieval with BM25; (2) semantic matching with a dense embedding model; and (3) re-ranking of candidate passages with a cross-encoder. Combining sparse and dense retrieval with re-ranking improves the accuracy of both LLMs evaluated, Fanar and Mistral, on the two QIAS 2025 subtasks, with gains of up to 25% depending on the task and model configuration. The best configuration uses Fanar and reaches 45% accuracy on Subtask 1 and 80% on Subtask 2.

📝 Abstract

This paper presents our submission to the QIAS 2025 shared task on Islamic knowledge understanding and reasoning. We developed a hybrid retrieval-augmented generation (RAG) system that combines sparse and dense retrieval methods with cross-encoder reranking to improve large language model (LLM) performance. Our three-stage pipeline uses BM25 for initial retrieval, a dense embedding retrieval model for semantic matching, and cross-encoder reranking to select the most relevant passages. We evaluate our approach on both subtasks using two LLMs, Fanar and Mistral, and show that the proposed RAG pipeline improves the performance of both, with accuracy improvements of up to 25%, depending on the task and model configuration. Our best configuration uses Fanar and reaches accuracy scores of 45% in Subtask 1 and 80% in Subtask 2.
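The three-stage pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the stages run as a cascade or whether sparse and dense results are merged, so the sketch assumes a cascade. BM25 is implemented directly; `toy_embed` and `toy_cross_score` are hypothetical stand-ins for the dense embedding model and the cross-encoder, which the abstract does not name, and the function names, cut-off values, and sample passages are invented for illustration.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Stage 1: Okapi BM25 score of every passage for the query."""
    tokenized = [tokenize(p) for p in passages]
    n = len(passages)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms & tf.keys():
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text, dim=512):
    """Assumption: stand-in for a dense embedding model.
    Hashes character trigrams into a fixed-size, L2-normalised vector."""
    vec = [0.0] * dim
    s = " ".join(tokenize(text))
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(u, v):
    # Cosine similarity, because both vectors are already L2-normalised.
    return sum(a * b for a, b in zip(u, v))


def toy_cross_score(query, passage):
    """Assumption: stand-in for a cross-encoder, which would score the
    (query, passage) pair jointly. Here: Jaccard overlap of token sets."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


def top_k(ids, score, k):
    return sorted(ids, key=score, reverse=True)[:k]


def hybrid_retrieve(query, passages, k_sparse=3, k_dense=2, k_final=2):
    """Cascade: BM25 shortlist -> dense similarity filter -> rerank."""
    sparse = bm25_scores(query, passages)
    stage1 = top_k(range(len(passages)), lambda i: sparse[i], k_sparse)
    q_vec = toy_embed(query)
    stage2 = top_k(stage1, lambda i: dot(q_vec, toy_embed(passages[i])), k_dense)
    stage3 = top_k(stage2, lambda i: toy_cross_score(query, passages[i]), k_final)
    return [passages[i] for i in stage3]


# Invented toy passages, for illustration only.
passages = [
    "Zakat is an obligatory charity paid annually on wealth above the nisab.",
    "Fasting in Ramadan begins at dawn and ends at sunset.",
    "The five daily prayers are Fajr, Dhuhr, Asr, Maghrib and Isha.",
    "Inheritance shares are distributed among heirs according to fixed fractions.",
]
results = hybrid_retrieve("When does fasting in Ramadan begin?", passages)
for rank, passage in enumerate(results, start=1):
    print(rank, passage)
```

In a RAG setting, the passages returned by the final stage would be placed in the LLM prompt alongside the question. In a real system the two stand-ins would be replaced by trained models, and the cut-offs `k_sparse`, `k_dense`, and `k_final` would be tuned on held-out data.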

Problem

Research questions and friction points this paper is trying to address.

Improving Islamic knowledge question answering accuracy

Enhancing large language models with hybrid retrieval methods

Combining sparse and dense retrieval with reranking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid RAG system combining sparse and dense retrieval

Three-stage pipeline with BM25 and cross-encoder reranking

Enhanced LLM performance using semantic matching techniques

🔎 Similar Papers

A RAG-based Question Answering System Proposal for Understanding Islam: MufassirQAS LLM