QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning

📅 2026-06-11

🤖 AI Summary

This overview paper describes QIAS 2026, a shared task that evaluates how well large language models reason end to end about Islamic inheritance law, a domain that combines religious doctrine with legal computation. The task is built on the MAWARITH benchmark, which comprises 12,500 Arabic-language cases: given a natural language scenario, a system must identify the eligible heirs and compute each heir's share. The benchmark annotates intermediate reasoning steps alongside final answers, and submissions are scored with MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning. The 16 participating teams explored prompting-based methods, retrieval-augmented generation, and fine-tuning. Their results show that current models still struggle with the stages that demand precise legal interpretation and structured numerical reasoning.

📝 Abstract

This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance. Unlike conventional question-answering benchmarks, QIAS 2026 focuses on end-to-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary. To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of 12,500 Arabic inheritance cases annotated with intermediate reasoning steps and final answers. System submissions were evaluated using MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning. A total of 16 teams participated in the shared task, investigating a range of approaches, including prompting-based methods, retrieval-augmented generation, and fine-tuning strategies. The results show that Islamic inheritance remains a highly challenging benchmark for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results.
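To give a feel for the kind of calculation the task asks systems to carry out, here is a minimal sketch of one very simple inheritance case: a deceased man who leaves a wife and children, including at least one son. It is an illustration only. The function name `distribute`, its parameters, and the output format are hypothetical and are not taken from MAWARITH or MIR-E. The sketch assumes the standard rules that a wife receives 1/8 when the deceased has children and that the remainder is divided among the children with each son receiving twice the share of each daughter. Real cases involve many more heir types, exclusion rules, and adjustments that this sketch does not attempt to cover.

```python
from fractions import Fraction


def distribute(sons: int, daughters: int, has_wife: bool) -> dict[str, Fraction]:
    """Toy share calculation for a wife plus children (hypothetical helper).

    Assumptions (not from the paper): the only heirs are an optional wife
    and the children, and there is at least one son, so the children
    inherit the remainder in a 2:1 son-to-daughter ratio.
    """
    if sons < 1:
        raise ValueError("this sketch only covers cases with at least one son")

    shares: dict[str, Fraction] = {}
    remainder = Fraction(1)

    # Fixed share: a wife receives 1/8 when the deceased leaves children.
    if has_wife:
        shares["wife"] = Fraction(1, 8)
        remainder -= shares["wife"]

    # Residuary shares: each son counts as two units, each daughter as one.
    units = 2 * sons + daughters
    unit = remainder / units
    for i in range(sons):
        shares[f"son_{i + 1}"] = 2 * unit
    for i in range(daughters):
        shares[f"daughter_{i + 1}"] = unit

    return shares


shares = distribute(sons=1, daughters=1, has_wife=True)
for heir, share in shares.items():
    print(heir, share)

# Exact fractions make it easy to check that the estate is fully distributed.
assert sum(shares.values()) == 1
```

Using `Fraction` rather than floating point keeps every share exact, which matters when an answer is judged on precise shares such as 7/24 rather than a rounded decimal.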

Problem

Research questions and friction points this paper is trying to address.

Islamic inheritance

legal reasoning

large language models

question answering

structured reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Islamic inheritance reasoning

end-to-end legal reasoning

structured numerical reasoning

multi-step evaluation metric

MAWARITH benchmark
