Two-Stage Quranic QA via Ensemble Retrieval and Instruction-Tuned Answer Extraction

📅 2025-08-09
🤖 AI Summary
Quranic question answering is difficult because of the linguistic complexity of Classical Arabic and the rich semantic structure of religious texts. This paper proposes a two-stage framework that addresses both passage retrieval and answer extraction. In the first stage, an ensemble of fine-tuned Arabic language models performs passage retrieval. In the second stage, instruction-tuned large language models with few-shot prompting extract answer spans, which avoids the limitations of conventional fine-tuning when domain-specific annotations are scarce. On the Quran QA 2023 shared task, the approach reports state-of-the-art results: MAP@10 = 0.3128 and MRR@10 = 0.5763 for retrieval, and pAP@10 = 0.669 for answer extraction, substantially outperforming previous methods.
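
For readers unfamiliar with the retrieval metrics quoted above, the following is a minimal sketch of how MAP@10 and MRR@10 are commonly computed. It is not the shared task's official scorer: the function names are hypothetical, the normalization of average precision by min(number of relevant passages, k) is one common convention and an assumption here, and questions with no answer in the collection are not handled.

```python
from typing import List, Set


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, passage_id in enumerate(ranked[:k], start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values taken at each relevant hit within the top k.

    Assumption: normalize by min(len(relevant), k); other scorers divide by
    len(relevant) instead.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, passage_id in enumerate(ranked[:k], start=1):
        if passage_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_over_questions(values: List[float]) -> float:
    """MAP@k and MRR@k are the per-question scores averaged over all questions."""
    return sum(values) / len(values) if values else 0.0


# Toy example with made-up passage identifiers.
ranked_ids = ["p3", "p1", "p7"]
gold_ids = {"p1", "p7"}
rr = reciprocal_rank_at_k(ranked_ids, gold_ids)    # first hit at rank 2
ap = average_precision_at_k(ranked_ids, gold_ids)  # (1/2 + 2/3) / 2
```

pAP@10, the extraction metric, additionally gives partial credit for overlapping answer spans and is not covered by this sketch.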

📝 Abstract
Quranic Question Answering presents unique challenges due to the linguistic complexity of Classical Arabic and the semantic richness of religious texts. In this paper, we propose a novel two-stage framework that addresses both passage retrieval and answer extraction. For passage retrieval, we ensemble fine-tuned Arabic language models to achieve superior ranking performance. For answer extraction, we employ instruction-tuned large language models with few-shot prompting to overcome the limitations of fine-tuning on small datasets. Our approach achieves state-of-the-art results on the Quran QA 2023 Shared Task, with a MAP@10 of 0.3128 and MRR@10 of 0.5763 for retrieval, and a pAP@10 of 0.669 for extraction, substantially outperforming previous methods. These results demonstrate that combining model ensembling and instruction-tuned language models effectively addresses the challenges of low-resource question answering in specialized domains.
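
The abstract does not say how the retrieval models are combined. As one illustration of what ensembling rankers can look like, the sketch below min-max normalizes each model's passage scores and averages them. The fusion rule, function names, and passage identifiers are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1] so models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: this model expresses no preference.
        return {passage_id: 0.0 for passage_id in scores}
    return {passage_id: (s - lo) / (hi - lo) for passage_id, s in scores.items()}


def ensemble_rank(per_model_scores: List[Dict[str, float]], k: int = 10) -> List[str]:
    """Average normalized scores across models and return the top-k passage ids.

    Assumption: simple score averaging. A passage missing from a model's
    candidate list contributes 0 for that model.
    """
    fused: Dict[str, float] = defaultdict(float)
    n_models = len(per_model_scores)
    for scores in per_model_scores:
        if not scores:
            continue
        for passage_id, s in min_max_normalize(scores).items():
            fused[passage_id] += s / n_models
    # Sort by fused score (descending), breaking ties by passage id.
    return sorted(fused, key=lambda pid: (-fused[pid], pid))[:k]


# Toy scores from two hypothetical fine-tuned rankers on different scales.
model_a = {"p1": 0.9, "p2": 0.1, "p3": 0.5}
model_b = {"p1": 2.0, "p2": 8.0, "p3": 6.0}
top = ensemble_rank([model_a, model_b], k=2)
```

In this toy case the two models disagree on the best passage, and the passage both rate moderately well ends up first after fusion.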
Problem

Research questions and friction points this paper is trying to address.

Addressing Classical Arabic linguistic complexity in Quranic QA
Overcoming low-resource limitations in religious text processing
Improving passage retrieval and answer extraction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble fine-tuned Arabic models for retrieval
Instruction-tuned LLMs with few-shot extraction
Two-stage framework combining retrieval and extraction
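
To make the few-shot extraction idea concrete, here is a minimal sketch of assembling a few-shot prompt for span extraction. The instruction wording, the template layout, and the placeholder examples are hypothetical, since the page does not give the paper's actual prompt, and the call to a language model is deliberately left out.

```python
from typing import List, Tuple

# (passage, question, answer span); placeholders stand in for real annotated data.
Example = Tuple[str, str, str]

INSTRUCTION = (
    "Extract the answer span from the passage. "
    "Copy the span verbatim from the passage."
)


def build_few_shot_prompt(examples: List[Example], passage: str, question: str) -> str:
    """Concatenate an instruction, k solved examples, and the unsolved query."""
    parts = [INSTRUCTION]
    for ex_passage, ex_question, ex_answer in examples:
        parts.append(
            f"Passage: {ex_passage}\nQuestion: {ex_question}\nAnswer: {ex_answer}"
        )
    # The final block ends at "Answer:" so the model completes the span.
    parts.append(f"Passage: {passage}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


demo_examples: List[Example] = [
    ("<passage text 1>", "<question 1>", "<answer span 1>"),
    ("<passage text 2>", "<question 2>", "<answer span 2>"),
]
prompt = build_few_shot_prompt(demo_examples, "<retrieved passage>", "<user question>")
```

In a two-stage setup, the passage argument would be filled with each passage returned by the retrieval stage, and the model's completion would be taken as the candidate answer span.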