JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge

πŸ“… 2025-11-27
πŸ€– AI Summary
Existing large language models (LLMs) lack a systematic evaluation of legal knowledge and reasoning in the Japanese legal domain. Method: We introduce JBE-QA, the first publicly available multi-domain legal question-answering dataset built from the Japanese Bar Examination (2015–2024), covering civil, criminal, and constitutional law. Multiple-choice questions are reformulated into structured true/false judgment tasks, supported by a fine-grained annotation framework for rigorous evaluation of legal reasoning. Contribution/Results: JBE-QA establishes a unified, cross-domain benchmark for Japanese legal AI, moving beyond prior resources confined to civil law. Evaluation of 26 LLMs shows that proprietary models with reasoning enabled perform best, and that constitutional law questions are generally easier than civil or criminal law items.

πŸ“ Abstract
We introduce JBE-QA, a Japanese Bar Exam Question-Answering dataset to evaluate large language models' legal knowledge. Derived from the multiple-choice (tantō-shiki) section of the Japanese bar exam (2015–2024), JBE-QA provides the first comprehensive benchmark for Japanese legal-domain evaluation of LLMs. It covers the Civil Code, the Penal Code, and the Constitution, extending beyond the Civil Code focus of prior Japanese resources. Each question is decomposed into independent true/false judgments with structured contextual fields. The dataset contains 3,464 items with balanced labels. We evaluate 26 LLMs, including proprietary, open-weight, Japanese-specialised, and reasoning models. Our results show that proprietary models with reasoning enabled perform best, and the Constitution questions are generally easier than the Civil Code or the Penal Code questions.
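To illustrate the decomposition described above, a minimal sketch of what a decomposed true/false item and its evaluation might look like follows. The field names, example statements, and `accuracy` helper are assumptions for illustration, not the paper's actual schema:

```python
# Hypothetical sketch of JBE-QA-style decomposed items; the schema and
# statements below are illustrative, not taken from the actual dataset.
from dataclasses import dataclass

@dataclass
class JudgmentItem:
    year: int        # exam year (2015-2024)
    domain: str      # "civil", "criminal", or "constitutional"
    statement: str   # one proposition extracted from a multiple-choice option
    label: bool      # gold true/false judgment

items = [
    JudgmentItem(2020, "civil",
                 "A contract concluded by a minor without the legal "
                 "representative's consent may be rescinded.", True),
    JudgmentItem(2021, "constitutional",
                 "The Constitution permits censorship in peacetime.", False),
]

def accuracy(preds, gold):
    """Fraction of true/false predictions matching the gold labels."""
    return sum(p == item.label for p, item in zip(preds, gold)) / len(gold)

print(accuracy([True, False], items))  # -> 1.0
```

Scoring each judgment independently, as sketched here, avoids the partial-credit ambiguity of whole multiple-choice questions and yields the balanced-label setup the abstract describes.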
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' Japanese legal knowledge via bar exam questions
Covers Civil, Penal, and Constitutional law beyond prior datasets
Assesses model performance across proprietary and open-weight LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset derived from Japanese bar exam questions
Decomposes questions into true/false judgments with structured fields
Evaluates diverse LLMs including Japanese-specialised models
Zhihan Cao
Institute of Science Tokyo, Japan
Fumihito Nishino
ROIS-DS, Center for Juris-informatics, Japan
Hiroaki Yamada
Institute of Science Tokyo, Japan
Nguyen Ha Thanh
National Institute of Informatics, Japan
Yusuke Miyao
University of Tokyo, Japan
Ken Satoh
National Institute of Informatics, Japan