MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the difficulty large language models have with the multi-step structured reasoning and exact share computation required by Islamic inheritance law (‘ilm al-mawārith). The authors introduce MAWARITH, a dataset of 12,500 Arabic cases with step-by-step annotations covering heir identification, application of blocking (hajb) and allocation rules, and exact share computation. Unlike prior datasets limited to multiple-choice questions, it covers the full inheritance reasoning chain. The work also proposes MIR-E, a weighted multi-stage evaluation metric that scores intermediate reasoning stages and captures error propagation rather than judging the final answer alone. In a zero-shot evaluation of five LLMs, Gemini-2.5-Flash reaches about 90% MIR-E on both the validation and test sets, while Fanar-C, Fanar-Sadiq, LLaMA 3, and Qwen 3 remain below 50%. Error analysis identifies recurring failures: scenario misinterpretation, errors in heir identification and share allocation, and missing or incorrect application of rules such as 'awl and radd.

📝 Abstract
Islamic inheritance law ('ilm al-mawarith) is challenging for large language models because solving inheritance cases requires complex, structured multi-step reasoning and the correct application of juristic rules to compute heirs' shares. We introduce MAWARITH, a large-scale annotated dataset of 12,500 Arabic inheritance cases to train and evaluate the full reasoning chain: (i) identifying eligible heirs, (ii) applying blocking (hajb) and allocation rules, and (iii) computing exact inheritance shares. Unlike prior datasets that restrict inheritance case solving to multiple-choice questions, MAWARITH supports the full reasoning chain and provides step-by-step solutions, including intermediate legal decisions and justifications based on classical juristic sources and established inheritance rules, as well as exact share calculations. To evaluate models beyond final-answer accuracy, we propose MIR-E (Mawarith Inheritance Reasoning Evaluation), a weighted multi-stage metric that scores key reasoning stages and captures error propagation across the pipeline. We evaluate five LLMs in a zero-shot setting. Gemini-2.5-flash achieves about 90% MIR-E on both validation and test, while Fanar-C, Fanar-Sadiq, LLaMA 3, and Qwen 3 remain below 50%. Our error analysis identifies recurring failure patterns, including scenario misinterpretation, errors in heir identification, errors in share allocation, and missing or incorrect application of key inheritance rules such as 'awl and radd. The MAWARITH dataset is publicly available at https://github.com/bouchekif/inheritance_evaluation.
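
For readers unfamiliar with 'awl, which the abstract lists among the rules models often miss, the following is a minimal sketch and is not taken from the MAWARITH repository. It illustrates the standard proportional reduction applied when the fixed shares sum to more than the whole estate, using the textbook case of a husband (1/2) and two full sisters (2/3 jointly). The function name `apply_awl` and the dictionary layout are illustrative assumptions, not the dataset's schema.

```python
from fractions import Fraction


def apply_awl(shares):
    """Scale fixed shares down proportionally when they sum to more than 1.

    `shares` maps a (hypothetical) heir label to its prescribed fraction.
    If the total does not exceed the estate, the shares are returned unchanged.
    """
    total = sum(shares.values(), Fraction(0))
    if total <= 1:
        return dict(shares)
    # Dividing by the total is equivalent to raising the common base
    # (here from 6 to 7) so that the adjusted shares sum to exactly 1.
    return {heir: share / total for heir, share in shares.items()}


# Textbook case: husband takes 1/2, two full sisters take 2/3 jointly.
shares = {"husband": Fraction(1, 2), "two_full_sisters": Fraction(2, 3)}
adjusted = apply_awl(shares)

for heir, share in adjusted.items():
    print(heir, share)
# husband 3/7
# two_full_sisters 4/7
```

Here the prescribed shares sum to 7/6, so the base of 6 is raised to 7, giving the husband 3/7 and the sisters 4/7. Exact `Fraction` arithmetic avoids the rounding drift that floating-point shares would introduce.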
Problem

Research questions and friction points this paper is trying to address.

Islamic inheritance law
large language models
legal reasoning
inheritance shares
structured reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

legal reasoning
inheritance law
step-by-step reasoning
MIR-E evaluation
LLM benchmarking
Abdessalam Bouchekif
Hamad Bin Khalifa University, Qatar
Shahd Gaben
Hamad Bin Khalifa University, Qatar
Samer Rashwani
Hamad Bin Khalifa University, Qatar
Somaya Eltanbouly
Qatar University
Mutaz Al-Khatib
Hamad Bin Khalifa University, Qatar
Heba Sbahi
Hamad Bin Khalifa University, Qatar
Mohammed Ghaly
Hamad Bin Khalifa University, Qatar
Emad Mohamed
Nazarbayev University
Cultural Analytics, Media Analytics, Digital Humanities, Computational Linguistics