ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting

📅 2025-01-11
🤖 AI Summary
This paper addresses the challenge of fine-grained clause retrieval in legal contract drafting by introducing ACORD, the first expert-annotated contract clause retrieval benchmark. ACORD focuses on high-complexity clauses—such as limitation of liability, indemnification, and change of control—comprising 114 real-world queries and 126,000 query-clause pairs, with five-level fine-grained relevance annotations. Methodologically, we propose a hybrid architecture combining a dual-encoder retriever and an LLM-based pointwise re-ranker, and conduct a systematic evaluation of LLM re-ranking efficacy on ACORD. Key contributions include: (1) establishing the first domain-specific legal information retrieval benchmark tailored to contract drafting; (2) formalizing a challenging, practice-oriented clause retrieval task grounded in precedent-based legal reasoning; and (3) empirically exposing substantial limitations of current models in modeling legal semantics, thereby providing a reproducible, high-fidelity evaluation platform for future research.

📝 Abstract
Information retrieval, specifically contract clause retrieval, is foundational to contract drafting because lawyers rarely draft contracts from scratch; instead, they locate and revise the most relevant precedent. We introduce the Atticus Clause Retrieval Dataset (ACORD), the first retrieval benchmark for contract drafting fully annotated by experts. ACORD focuses on complex contract clauses such as Limitation of Liability, Indemnification, Change of Control, and Most Favored Nation. It includes 114 queries and over 126,000 query-clause pairs, each ranked on a scale from 1 to 5 stars. The task is to find the most relevant precedent clauses for a query. A bi-encoder retriever paired with pointwise LLM re-rankers shows promising results. However, substantial improvements are still needed to effectively manage the complex legal work typically undertaken by lawyers. As the first retrieval benchmark for contract drafting annotated by experts, ACORD can serve as a valuable IR benchmark for the NLP community.
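The retrieve-then-rerank pipeline described in the abstract can be sketched in miniature. The code below is a toy illustration, not the paper's implementation: the bag-of-words `embed` stands in for a trained bi-encoder, and the `score_fn` passed to `pointwise_rerank` stands in for an LLM scoring each (query, clause) pair independently; all function names and the sample clauses are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "encoding" standing in for a trained bi-encoder.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, clauses, k=3):
    # First stage: rank all clauses by embedding similarity, keep top k.
    q = embed(query)
    return sorted(clauses, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def pointwise_rerank(query, candidates, score_fn):
    # Second stage: score_fn plays the role of an LLM judging each
    # (query, clause) pair on its own, as in pointwise re-ranking.
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)

# Hypothetical sample clauses for demonstration only.
clauses = [
    "Limitation of Liability: neither party shall be liable for indirect damages.",
    "Indemnification: the vendor shall indemnify the customer against claims.",
    "Change of Control: assignment requires consent upon a change of control.",
]

candidates = retrieve("limitation of liability", clauses, k=2)
overlap = lambda q, c: len(set(re.findall(r"[a-z]+", q.lower()))
                           & set(re.findall(r"[a-z]+", c.lower())))
ranked = pointwise_rerank("limitation of liability", candidates, overlap)
print(ranked[0])  # the Limitation of Liability clause ranks first
```

In the actual benchmark, both stages are evaluated against the five-level expert relevance annotations; the toy `overlap` scorer here merely shows where an LLM-based pointwise scorer would plug in.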
Problem

Research questions and friction points this paper is trying to address.

Legal Contract Analysis
Automated Clause Identification
Dataset Development
Innovation

Methods, ideas, or system contributions that make the work stand out.

ACORD Dataset
Legal Contract Analysis
Dual Encoder Model