Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Legal precedent retrieval faces significant challenges because queries often carry incomplete factual descriptions and legal corpora are massive, which leads to poor matching accuracy, especially under partial-knowledge conditions.
Method: This paper proposes a rhetorical-role-guided framework for extracting query segments. It uses rhetorical role annotation to identify pivotal case segments as query inputs; trains a hierarchical BiLSTM-CRF model on Indian judgments to label the rhetorical structure of legal texts; and designs a multi-stage hybrid retrieval pipeline in which BM25 and dense vector retrieval results are combined via reciprocal rank fusion and then re-ranked by a Cross-Encoder.
Contribution/Results: The approach improves matching precision and robustness in partial-knowledge scenarios, and empirical evaluation on the IL-PCR and COLIEE 2025 benchmarks confirms its effectiveness. By grounding retrieval in interpretable rhetorical structures, it supports judicial consistency and decision transparency and offers a scalable foundation for precedent retrieval over large legal corpora.
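The BM25 candidate-filtering stage can be illustrated with a minimal Okapi BM25 scorer. This is a sketch under stated assumptions, not TraceRetriever's implementation: tokenization is naive lowercase whitespace splitting, the IDF is the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)), and k1 = 1.5, b = 0.75 are common defaults rather than values reported in the paper. The function name bm25_top_k is hypothetical.

```python
import math
from collections import Counter


def bm25_top_k(query, docs, k=10, k1=1.5, b=0.75):
    """Return indices of the k highest-scoring docs for `query` under Okapi BM25.

    Assumptions: lowercase whitespace tokenization, the non-negative IDF
    variant ln(1 + (N - n + 0.5) / (n + 0.5)), and default k1/b values.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    q_terms = query.lower().split()

    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in q_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, idx))
    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

For example, bm25_top_k("land sale", ["the sale deed was registered", "dispute over land sale", "bail application rejected"], k=2) returns [1, 0]: the second document matches both query terms, the first matches only "sale", and the third matches neither. In a hybrid pipeline, this cheap lexical pass narrows the corpus before the more expensive dense and Cross-Encoder stages.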

📝 Abstract
Legal precedent retrieval is a cornerstone of the common law system, governed by the principle of stare decisis, which demands consistency in judicial decisions. However, the growing complexity and volume of legal documents challenge traditional retrieval methods. TraceRetriever mirrors real-world legal search by operating with limited case information, extracting only rhetorically significant segments instead of requiring complete documents. Our pipeline integrates BM25, a vector database, and Cross-Encoder models, combining initial results through Reciprocal Rank Fusion before final re-ranking. Rhetorical annotations are generated using a Hierarchical BiLSTM-CRF classifier trained on Indian judgments. Evaluated on the IL-PCR and COLIEE 2025 datasets, TraceRetriever addresses the challenge of growing document volume while aligning with practical search constraints, providing a reliable and scalable foundation for precedent retrieval that enhances legal research when only partial case knowledge is available.
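The fuse-then-re-rank step can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over the input rankings; k = 60 is the constant from the original RRF paper and is an assumption here, as is top_n. The rerank_score callable is a hypothetical stand-in for a Cross-Encoder that scores (query, document) pairs; a plain dictionary lookup takes its place in the example. Function names are illustrative, not TraceRetriever's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs (best first) with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k = 60 follows the original RRF paper; the value used by TraceRetriever
    is not stated here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def fuse_then_rerank(rankings, rerank_score, top_n=100, k=60):
    """Fuse first-stage rankings with RRF, then re-rank the top_n candidates.

    rerank_score is a hypothetical stand-in for a Cross-Encoder: any callable
    mapping a document ID to a relevance score for the current query.
    """
    fused = reciprocal_rank_fusion(rankings, k=k)
    head, tail = fused[:top_n], fused[top_n:]
    # sorted() stays stable with reverse=True, so equal re-rank scores keep RRF order.
    return sorted(head, key=rerank_score, reverse=True) + tail


bm25_ranking = ["c3", "c1", "c7"]
dense_ranking = ["c1", "c9", "c3"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))  # ['c1', 'c3', 'c9', 'c7']
cross_encoder_scores = {"c1": 0.2, "c3": 0.9, "c9": 0.5, "c7": 0.1}
print(fuse_then_rerank([bm25_ranking, dense_ranking], cross_encoder_scores.get, top_n=3))
# ['c3', 'c9', 'c1', 'c7']
```

Here c1 tops the fused list because both retrievers rank it highly, and the re-ranker then reorders only the top three candidates. Because RRF uses ranks rather than raw scores, BM25 and dense similarity scores never need to be calibrated against each other.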
Problem

Research questions and friction points this paper is trying to address.

Improving legal precedent retrieval with rhetorical role-based queries
Addressing challenges of growing legal document volume and complexity
Enhancing search reliability when only partial case information is available
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts rhetorically significant case segments
Combines BM25, vector database, and Cross-Encoder models via Reciprocal Rank Fusion
Uses a Hierarchical BiLSTM-CRF for rhetorical role annotation
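The CRF layer of a BiLSTM-CRF tagger chooses the jointly best sequence of labels rather than labelling each sentence independently, and the resulting labels can then select the query segments. The sketch below shows only that decoding and selection step: the per-sentence emission scores would come from the hierarchical BiLSTM, which is not reproduced, and the role names, scores, transition bonus, and the choice of "Facts" as the query role are hypothetical illustrations, not the roles or values TraceRetriever uses. The names viterbi_decode and build_query are likewise hypothetical.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions: one dict per sentence mapping label -> score (hypothetical
        stand-ins for the sentence-level BiLSTM outputs).
    transitions: dict mapping (prev_label, label) -> score; missing pairs score 0.
    labels: all rhetorical role labels.
    """
    if not emissions:
        return []
    labels = list(labels)
    # best[t][y]: score of the best path that ends in label y at sentence t.
    best = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back = [{}]  # back[t][y]: predecessor label on that best path
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            best[t][y] = (best[t - 1][prev] + transitions.get((prev, y), 0.0)
                          + emissions[t].get(y, 0.0))
            back[t][y] = prev
    # Start from the best final label and follow back-pointers to the start.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def build_query(sentences, roles, keep):
    """Concatenate, in document order, the sentences whose role is in `keep`."""
    return " ".join(s for s, r in zip(sentences, roles) if r in keep)


LABELS = ["Facts", "Argument", "Ratio"]
sentences = [
    "The appellant bought the land in 1990.",
    "The seller later disputed the sale.",
    "A registered sale deed transfers title.",
]
emissions = [
    {"Facts": 2.0, "Argument": 0.5, "Ratio": 0.1},
    {"Facts": 0.9, "Argument": 1.0, "Ratio": 0.2},
    {"Facts": 0.1, "Argument": 0.3, "Ratio": 2.0},
]
# Hypothetical bonus for staying in the same rhetorical role.
transitions = {(y, y): 0.5 for y in LABELS}
roles = viterbi_decode(emissions, transitions, LABELS)
print(roles)  # ['Facts', 'Facts', 'Ratio']
print(build_query(sentences, roles, keep={"Facts"}))
# The appellant bought the land in 1990. The seller later disputed the sale.
```

In isolation the second sentence would be tagged Argument (1.0 versus 0.9), but the Facts-to-Facts transition bonus makes [Facts, Facts, Ratio] the best joint sequence, so both factual sentences end up in the query. This is the practical benefit of a CRF layer over independent per-sentence classification: rhetorical roles tend to occur in contiguous runs.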
Shubham Kumar Nigam
IIT Kanpur, India
Tanmay Dubey
IIT Kanpur, India
Noel Shallum
Symbiosis Law School Pune
Machine Learning, NLP
Arnab Bhattacharya
IIT Kanpur, India