MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

📅 2026-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the difficulty large language models have with the multi-step structured reasoning and exact share computation required by Islamic inheritance law (‘ilm al-mawārith). The authors introduce MAWARITH, a dataset of 12,500 Arabic cases with fine-grained annotations for heir identification, application of blocking (exclusion) rules, and fractional share allocation. Unlike prior datasets limited to multiple-choice questions, it covers the entire inheritance reasoning pipeline. The work also proposes MIR-E, a weighted multi-stage evaluation metric that goes beyond answer-only assessment by scoring intermediate reasoning stages. In a zero-shot setting, Gemini-2.5-Flash reaches about 90% MIR-E on both the validation and test sets, while the other four evaluated models (Fanar-C, Fanar-Sadiq, LLaMA 3, and Qwen 3) remain below 50%. Error analysis identifies recurring failures in interpreting inheritance scenarios, identifying heirs, allocating shares, and applying key rules.
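To make the idea of a weighted multi-stage metric concrete, here is a minimal sketch of how per-stage scores could be combined into a single number. The stage names, the weights, and the function `weighted_stage_score` are hypothetical and chosen only for illustration; the summary does not state the actual MIR-E stages, weights, or error-propagation rule.

```python
from typing import Dict

# Hypothetical stages and weights (assumption: not taken from the paper).
STAGE_WEIGHTS: Dict[str, float] = {
    "heir_identification": 0.3,
    "blocking_and_rules": 0.3,
    "share_computation": 0.4,
}


def weighted_stage_score(
    stage_scores: Dict[str, float],
    weights: Dict[str, float] = STAGE_WEIGHTS,
) -> float:
    """Combine per-stage scores in [0, 1] into one weighted score in [0, 1].

    A stage missing from `stage_scores` counts as 0, so a model that skips
    a reasoning stage is penalised rather than ignored.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * stage_scores.get(stage, 0.0) for stage, weight in weights.items()
    )
    return weighted_sum / total_weight


# A model that finds the right heirs but gets everything downstream wrong.
partial = weighted_stage_score({"heir_identification": 1.0})
# A model that is correct at every stage.
perfect = weighted_stage_score(
    {
        "heir_identification": 1.0,
        "blocking_and_rules": 1.0,
        "share_computation": 1.0,
    }
)
```

Under these assumed weights, a final-answer-only metric would give the first model zero credit, whereas the staged score still records that its heir identification was correct.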

📝 Abstract

Islamic inheritance law ('ilm al-mawarith) is challenging for large language models because solving inheritance cases requires complex, structured multi-step reasoning and the correct application of juristic rules to compute heirs' shares. We introduce MAWARITH, a large-scale annotated dataset of 12,500 Arabic inheritance cases to train and evaluate the full reasoning chain: (i) identifying eligible heirs, (ii) applying blocking (hajb) and allocation rules, and (iii) computing exact inheritance shares. Unlike prior datasets that restrict inheritance case solving to multiple-choice questions, MAWARITH supports the full reasoning chain and provides step-by-step solutions, including intermediate legal decisions and justifications based on classical juristic sources and established inheritance rules, as well as exact share calculations. To evaluate models beyond final-answer accuracy, we propose MIR-E (Mawarith Inheritance Reasoning Evaluation), a weighted multi-stage metric that scores key reasoning stages and captures error propagation across the pipeline. We evaluate five LLMs in a zero-shot setting. Gemini-2.5-flash achieves about 90% MIR-E on both validation and test, while Fanar-C, Fanar-Sadiq, LLaMA 3, and Qwen 3 remain below 50%. Our error analysis identifies recurring failure patterns, including scenario misinterpretation, errors in heir identification, errors in share allocation, and missing or incorrect application of key inheritance rules such as 'awl and radd. The MAWARITH dataset is publicly available at https://github.com/bouchekif/inheritance_evaluation.
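As an illustration of the exact share computation the abstract refers to, the sketch below works through a classical 'awl case with exact fractions. 'Awl applies when the prescribed shares add up to more than the whole estate, in which case every share is reduced proportionally. The function name `apply_awl` and the example case are illustrative assumptions and are not taken from the MAWARITH dataset or its code.

```python
from fractions import Fraction
from typing import Dict


def apply_awl(shares: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Scale prescribed shares down proportionally when they exceed the estate.

    If the shares sum to 1 or less, they are returned unchanged
    (this sketch does not handle residuary heirs or radd).
    """
    total = sum(shares.values(), Fraction(0))
    if total <= 1:
        return dict(shares)
    return {heir: share / total for heir, share in shares.items()}


# Classical example: the deceased leaves a husband, two full sisters, and a mother.
# Prescribed shares: husband 1/2, two full sisters 2/3 (jointly), mother 1/6.
case = {
    "husband": Fraction(1, 2),
    "two full sisters": Fraction(2, 3),
    "mother": Fraction(1, 6),
}

prescribed_total = sum(case.values(), Fraction(0))  # 8/6, which exceeds the estate
adjusted = apply_awl(case)

for heir, share in adjusted.items():
    print(f"{heir}: {share}")
```

The prescribed shares total 8/6, so the common base is raised from 6 to 8 and the heirs receive 3/8, 4/8, and 1/8 respectively. Using `Fraction` rather than floating point keeps the shares exact, which matters when a model's output is checked against a reference solution.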

Problem

Research questions and friction points this paper is trying to address.

Islamic inheritance law

large language models

legal reasoning

inheritance shares

structured reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

legal reasoning

inheritance law

step-by-step reasoning