Section-Weighted Hybrid Approach for Legal Case Retrieval

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional legal case retrieval relies on surface-level lexical matching, which fails to capture similarity in legal reasoning between precedents. This work proposes a two-stage, section-aware retrieval framework. The first stage combines BM25 with dense vector retrieval to build a high-recall candidate set. The second stage aligns structured sections (facts, disputed issues, decision, and reasoning) like-for-like and fuses the resulting signals using query-level Z-score normalization and learned section weights. By introducing section-level alignment and query-adaptive normalization, the method consistently outperforms strong lexical and neural baselines on a jurisdiction-scale judicial benchmark. It improves precision in analogous-case matching while maintaining high candidate coverage, and it provides interpretable, section-level justifications for its retrieval results.
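The Stage 2 fusion summarized above (query-level Z-score normalization followed by a weighted sum of section signals) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `zscore` and `fuse`, the signal names such as `reasoning_bm25`, and the hand-set weights are all hypothetical, whereas the paper learns its section weights.

```python
import statistics
from typing import Dict, List


def zscore(scores: List[float]) -> List[float]:
    """Standardize one signal across the candidates retrieved for a single query."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0.0:
        # Every candidate has the same value, so this signal cannot separate them.
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]


def fuse(signals: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-query Z-scored signals, one fused score per candidate.

    `signals` maps a hypothetical signal name (e.g. "reasoning_bm25") to raw
    scores aligned by candidate index. `weights` stands in for the learned
    section weights; here they are fixed by hand (assumption).
    """
    n_candidates = len(next(iter(signals.values())))
    fused = [0.0] * n_candidates
    for name, raw in signals.items():
        w = weights.get(name, 0.0)  # signals without a weight contribute nothing
        for i, z in enumerate(zscore(raw)):
            fused[i] += w * z
    return fused


# Hypothetical example: three candidates for one query.
signals = {
    "reasoning_bm25": [10.0, 20.0, 30.0],  # unbounded lexical scores
    "reasoning_cos": [0.90, 0.70, 0.80],   # cosine similarities
}
weights = {"reasoning_bm25": 0.5, "reasoning_cos": 0.5}
fused = fuse(signals, weights)
ranking = sorted(range(len(fused)), key=lambda i: -fused[i])
print(ranking)  # → [2, 0, 1]
```

Because each signal is standardized within the query's own candidate pool, a BM25 score in the tens and a cosine similarity near 0.8 contribute on the same scale. Summing the raw scores instead (10.9, 20.7, 30.8) would rank the candidates `[2, 1, 0]`, letting BM25 magnitudes drown out the semantic signal entirely.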

📝 Abstract

Finding truly analogous precedents requires capturing legal reasoning beyond surface word overlap. We present a two-stage, section-aware framework for legal case retrieval that first uses a deterministic large language model (LLM), run offline, to segment raw judgments into facts, issues, decision, and reasoning. In Stage 1, we combine parallel lexical (BM25) and semantic (dense ANN) whole-document searches via Reciprocal Rank Fusion (RRF) to form a high-recall candidate pool. In Stage 2, we perform fine-grained, like-for-like section comparisons (e.g., query reasoning vs. candidate reasoning). To address the scale mismatch between unbounded lexical scores and bounded cosine similarities, we apply query-wise Z-score normalization before aggregating signals with learned section weights. For the top results, the system returns the relevant section text together with a concise, grounded rationale and party-stance labels. We evaluate on a jurisdiction-scale benchmark, demonstrating consistent gains over strong lexical and neural baselines while maintaining high candidate coverage.
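Stage 1's fusion step can be illustrated with the textbook form of Reciprocal Rank Fusion, in which each document earns 1 / (k + rank) from every ranked list it appears in. The sketch below is a hedged example rather than the authors' code: the abstract does not report the RRF constant, so `k = 60` (a commonly used default) is an assumption, and the candidate IDs, list lengths, and `top_n` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in (ranks
    start at 1). Only ranks are used, so BM25 and cosine scores never need to
    share a scale. k=60 is a common default and an assumption here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


# Hypothetical top-4 hit lists from the two parallel Stage 1 searches.
bm25_hits = ["c12", "c07", "c33", "c02"]
dense_hits = ["c07", "c41", "c12", "c90"]
pool = reciprocal_rank_fusion([bm25_hits, dense_hits], top_n=5)
print(pool)  # → ['c07', 'c12', 'c41', 'c33', 'c02']
```

Documents found by both searches (`c07`, `c12`) rise to the top, while strong single-source hits still enter the pool, which is the high-recall behavior Stage 1 aims for. Because RRF consumes ranks rather than raw scores, it sidesteps the BM25 vs. cosine scale mismatch at this stage; that mismatch only needs explicit handling, via query-wise Z-score normalization, when Stage 2 combines raw section-level scores.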

Problem

Research questions and friction points this paper is trying to address.

legal case retrieval

analogous precedents

legal reasoning

section-aware retrieval

lexical-semantic mismatch

Innovation

Methods, ideas, or system contributions that make the work stand out.

section-weighted retrieval

legal case retrieval

hybrid search