LLMs for Law: Evaluating Legal-Specific LLMs on Contract Understanding

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
A systematic evaluation of domain-specific large language models (LLMs) for legal text understanding—particularly contract classification—remains lacking. Method: This work presents the first comprehensive benchmark of ten legal-domain LLMs against seven general-purpose LLMs across three English contract understanding tasks, employing a multi-task evaluation framework emphasizing text classification and semantic understanding. Contribution/Results: Legal-specialized models significantly outperform general-purpose models, especially on tasks requiring fine-grained legal reasoning. Legal-BERT and Contracts-BERT achieve new state-of-the-art (SOTA) results on two tasks despite their relatively small parameter counts. CaseLaw-BERT and LexLM demonstrate strong baseline performance. Collectively, this study establishes a critical benchmark and provides empirically grounded guidance for model selection in contract understanding systems, advancing the development of precise, task-adapted legal AI.

📝 Abstract
Despite advances in legal NLP, no comprehensive evaluation covering multiple legal-specific LLMs currently exists for contract classification tasks in contract understanding. To address this gap, we present an evaluation of 10 legal-specific LLMs on three English language contract understanding tasks and compare them with 7 general-purpose LLMs. The results show that legal-specific LLMs consistently outperform general-purpose models, especially on tasks requiring nuanced legal understanding. Legal-BERT and Contracts-BERT establish new SOTAs on two of the three tasks, despite having 69% fewer parameters than the best-performing general-purpose LLM. We also identify CaseLaw-BERT and LexLM as strong additional baselines for contract understanding. Our results provide a holistic evaluation of legal-specific LLMs and will facilitate the development of more accurate contract understanding systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluating legal-specific LLMs on contract understanding tasks
Comparing legal and general-purpose LLMs in contract classification
Identifying top-performing models for nuanced legal comprehension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates 10 legal-specific LLMs on contract tasks
Legal-specific LLMs outperform general-purpose models
Legal-BERT and Contracts-BERT achieve new SOTAs
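Since the paper benchmarks models on contract classification tasks, the comparison plausibly rests on a per-class metric such as macro-averaged F1, which weights rare contract clause types equally with common ones. The paper's exact metric is not stated here, so the snippet below is only an illustrative stdlib sketch of how macro-F1 would be computed over a model's predicted labels; the label names are hypothetical.

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per label, then average unweighted.

    Treats every label seen in the gold or predicted sequence as a class,
    so a model is penalized equally for failing on rare clause types.
    """
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1        # correct prediction for this label
        else:
            fp[p] += 1        # predicted label p where gold was t
            fn[t] += 1        # missed the gold label t
    f1_scores = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if (tp[lab] + fp[lab]) else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if (tp[lab] + fn[lab]) else 0.0
        f1_scores.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical contract-type labels, purely for illustration:
gold = ["nda", "lease", "nda", "lease"]
pred = ["nda", "lease", "lease", "lease"]
print(macro_f1(gold, pred))
```

A macro average like this is one plausible reason small specialized encoders (Legal-BERT, Contracts-BERT) can beat much larger general-purpose models: fine-grained legal distinctions dominate the score regardless of class frequency.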
Amrita Singh
School of Computer Science and Engineering, University of New South Wales (UNSW)
Natural Language Processing · Legal Intelligence · AI for Social Good
H. Suhan Karaca
School of Computer Science and Engineering, University of New South Wales (UNSW), Sydney
Aditya Joshi
School of Computer Science and Engineering, University of New South Wales (UNSW), Sydney
Hye-young Paik
School of Computer Science and Engineering, University of New South Wales (UNSW), Sydney
Jiaojiao Jiang
The University of New South Wales
Social Network Analysis and Service Virtualisation