🤖 AI Summary
Large language models (LLMs) have not been systematically evaluated for legal knowledge and reasoning in the Japanese legal domain. Method: We introduce JBE-QA, the first publicly available, multi-domain legal question-answering dataset built from the Japanese Bar Examination (2015–2024), covering civil law, criminal law, and constitutional law. We reformulate multiple-choice questions into structured true/false judgment tasks and propose a fine-grained annotation framework to support rigorous evaluation of legal reasoning. Contribution/Results: JBE-QA establishes a unified, cross-domain benchmark for Japanese legal AI, moving beyond prior resources confined to civil law. A comprehensive evaluation of 26 LLMs shows that proprietary models with reasoning enabled achieve the best performance, and that constitutional law questions are generally easier than civil or criminal law items. This work provides the first open, multi-domain evaluation benchmark for legal AI research in Japan.
📄 Abstract
We introduce JBE-QA, a Japanese Bar Exam Question-Answering dataset for evaluating the legal knowledge of large language models. Derived from the multiple-choice (tanto-shiki) section of the Japanese bar exam (2015–2024), JBE-QA provides the first comprehensive benchmark for evaluating LLMs in the Japanese legal domain. It covers the Civil Code, the Penal Code, and the Constitution, extending beyond the Civil Code focus of prior Japanese resources. Each question is decomposed into independent true/false judgments with structured contextual fields. The dataset contains 3,464 items with balanced labels. We evaluate 26 LLMs, including proprietary, open-weight, Japanese-specialised, and reasoning models. Our results show that proprietary models with reasoning enabled perform best, and that questions on the Constitution are generally easier than those on the Civil Code or the Penal Code.
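To make the decomposition concrete, here is a minimal sketch of what a decomposed true/false item and a simple accuracy check could look like. The field names (`year`, `domain`, `context`, `statement`, `label`) and the example statements are illustrative assumptions, not the dataset's actual schema or content.

```python
# Hypothetical sketch of JBE-QA's true/false decomposition.
# Field names and example statements are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class JudgmentItem:
    year: int        # exam year (2015-2024)
    domain: str      # "civil", "criminal", or "constitutional"
    context: str     # structured contextual field shared by sibling statements
    statement: str   # one statement extracted from a multiple-choice question
    label: bool      # gold true/false judgment

# One multiple-choice question yields several independent judgment items
# that share the same context but are evaluated separately.
items = [
    JudgmentItem(2020, "civil",
                 "A contracts to sell land to B.",
                 "B acquires ownership only upon registration.", False),
    JudgmentItem(2020, "civil",
                 "A contracts to sell land to B.",
                 "Ownership can transfer when the contract is formed.", True),
]

def accuracy(predictions: list[bool], gold: list[bool]) -> float:
    """Fraction of true/false judgments the model got right."""
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# A model that answers True to everything gets one of two items right here.
gold_labels = [item.label for item in items]
print(accuracy([True, True], gold_labels))  # 0.5
```

Because labels are balanced across the dataset, a constant or random responder scores near 0.5 under this metric, so accuracy above that baseline reflects actual legal judgment.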