VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models

📅 2025-12-16

🤖 AI Summary

Evaluating the legal competence of large language models (LLMs) in Vietnam is challenging because the Vietnamese legal system is complex and frequently revised. Method: This paper introduces VLegal-Bench, the first cognitively stratified benchmark for Vietnamese legal AI. It comprises 10,450 expert-annotated, cross-validated instances across four task categories: legal question answering, retrieval-augmented generation, multi-step reasoning, and scenario-based problem solving. Grounded in Bloom's Taxonomy, it establishes a multi-level cognitive assessment framework that is aligned with real-world legal workflows and supports interpretable analysis. The construction methodology combines collaborative expert annotation, tracing of each sample back to its source legal documents, cross-expert validation, and task-aware modeling. Contribution/Results: VLegal-Bench provides a standardized, transparent, and reproducible evaluation platform for measuring how well LLMs understand, reason about, and ethically apply Vietnamese law, supporting the development of more reliable and practical legal AI.

📝 Abstract

The rapid advancement of large language models (LLMs) has opened new possibilities for applying artificial intelligence in the legal domain. However, the complexity, hierarchical organization, and frequent revision of Vietnamese legislation make it difficult to evaluate how well these models interpret and apply legal knowledge. To address this gap, we introduce the Vietnamese Legal Benchmark (VLegal-Bench), the first comprehensive benchmark designed to systematically assess LLMs on Vietnamese legal tasks. Informed by Bloom's cognitive taxonomy, VLegal-Bench covers multiple levels of legal understanding through tasks that reflect practical usage scenarios. The benchmark comprises 10,450 samples produced by a rigorous annotation pipeline in which legal experts label and cross-validate each instance using our annotation system. This process ensures that every sample is grounded in authoritative legal documents and mirrors real-world legal assistant workflows, including general legal question answering, retrieval-augmented generation, multi-step reasoning, and scenario-based problem solving tailored to Vietnamese law. By providing a standardized, transparent, and cognitively informed evaluation framework, VLegal-Bench establishes a solid foundation for assessing LLM performance in Vietnamese legal contexts and supports the development of more reliable, interpretable, and ethically aligned AI-assisted legal systems.

Problem

Research questions and friction points this paper is trying to address.

Assesses LLMs on Vietnamese legal tasks using Bloom's cognitive taxonomy

Evaluates interpretation of complex, evolving Vietnamese legislation

Provides a benchmark to support reliable AI-assisted Vietnamese legal systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VLegal-Bench, a benchmark for Vietnamese legal tasks

Uses Bloom's taxonomy for multi-level legal understanding assessment

Includes 10,450 expert-annotated, cross-validated samples grounded in authoritative legal documents and mirroring real-world legal workflows