🤖 AI Summary
Legal precedent retrieval is difficult because factual descriptions are often incomplete and legal corpora are very large, which lowers matching accuracy, especially when only partial knowledge of a case is available. Method: This paper proposes a rhetorical-role-guided framework for extracting query paragraphs. It uses rhetorical role annotation to identify pivotal case paragraphs as query inputs; builds a hierarchical BiLSTM-CRF model to parse the rhetorical structure of legal texts; and designs a multi-stage hybrid retrieval pipeline in which BM25 and dense vector retrieval produce initial candidates, reciprocal rank fusion merges the initial results, and a Cross-Encoder performs the final re-ranking. Contribution/Results: The approach is reported to improve matching precision and robustness in partial-knowledge scenarios, with evaluation on the IL-PCR and COLIEE 2025 benchmarks. By grounding retrieval in interpretable rhetorical structures, it aims to support judicial consistency and decision transparency, and offers a scalable approach to precedent retrieval over large legal corpora.
📝 Abstract
Legal precedent retrieval is a cornerstone of the common law system, governed by the principle of stare decisis, which demands consistency in judicial decisions. However, the growing complexity and volume of legal documents challenge traditional retrieval methods. TraceRetriever mirrors real-world legal search by operating with limited case information, extracting only rhetorically significant segments instead of requiring complete documents. Our pipeline integrates BM25, a vector database, and a Cross-Encoder model, combining the initial results through Reciprocal Rank Fusion before final re-ranking. Rhetorical annotations are generated using a hierarchical BiLSTM-CRF classifier trained on Indian judgments. Evaluated on the IL-PCR and COLIEE 2025 datasets, TraceRetriever addresses the challenges of growing document volume while aligning with practical search constraints, providing a reliable and scalable foundation for precedent retrieval that enhances legal research when only partial case knowledge is available.
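To make the fusion step concrete, the following is a minimal sketch of Reciprocal Rank Fusion in its standard form, where each document receives the sum of 1 / (k + rank) over the ranked lists in which it appears. It is not taken from the paper: the function name, the case identifiers, and the smoothing constant k = 60 (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default, not a value
    reported in the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a sparse (BM25) and a dense retriever.
bm25_hits = ["case_A", "case_B", "case_C"]
dense_hits = ["case_B", "case_D", "case_A"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because the fusion uses only rank positions, it does not require BM25 scores and vector similarities to be on a comparable scale. In a pipeline like the one described, the fused list would then be passed to the Cross-Encoder for final re-ranking.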