NOWJ@COLIEE 2025: A Multi-stage Framework Integrating Embedding Models and Large Language Models for Legal Retrieval and Entailment

📅 2025-09-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets legal case retrieval and entailment identification in the COLIEE 2025 competition. The NOWJ team proposes a multi-stage framework that combines embedding models with large language models (LLMs). It has two stages: (1) lexical-semantic pre-ranking with BM25, BERT, and monoT5, supported by embedding-based representations from BGE-m3 and LLM2Vec; and (2) context-aware re-ranking and entailment inference with LLMs, including Qwen-2, QwQ-32B, and DeepSeek-V3, which are also used for summarization and relevance scoring. The approach placed first in the Legal Case Entailment task (Task 2) with an F1 score of 0.3195. The team also reports robust performance on the four other tasks (Legal Case Retrieval, Statute Law Retrieval, Legal Textual Entailment, and Legal Judgment Prediction), which supports the value of hybrid systems that pair traditional IR techniques with generative models.
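For readers unfamiliar with the metric behind the reported 0.3195, the sketch below shows one common way to compute F1 over predicted versus gold entailing paragraphs. It assumes a micro-averaged form (counts pooled across queries). The page does not give the exact evaluation script, and the function name `micro_f1` and the toy data are hypothetical.

```python
from typing import Dict, Set


def micro_f1(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1: pool true positives and set sizes over all queries.

    Assumption: this mirrors a typical retrieval-style F1; it is not taken
    from the official COLIEE evaluation code.
    """
    true_pos = sum(len(predicted.get(q, set()) & gold[q]) for q in gold)
    n_pred = sum(len(p) for p in predicted.values())
    n_gold = sum(len(g) for g in gold.values())
    if true_pos == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example: two queries, three predictions, two of them correct.
toy_pred = {"q1": {"p3", "p7"}, "q2": {"p1"}}
toy_gold = {"q1": {"p3"}, "q2": {"p1", "p2"}}
toy_f1 = micro_f1(toy_pred, toy_gold)  # precision = recall = 2/3
```

Because both precision and recall are penalized, a system that returns many candidate paragraphs per query tends to score lower than one that returns a short, accurate list.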

📝 Abstract

This paper presents the methodologies and results of the NOWJ team's participation across all five tasks at the COLIEE 2025 competition, emphasizing advancements in the Legal Case Entailment task (Task 2). Our comprehensive approach systematically integrates pre-ranking models (BM25, BERT, monoT5), embedding-based semantic representations (BGE-m3, LLM2Vec), and advanced Large Language Models (Qwen-2, QwQ-32B, DeepSeek-V3) for summarization, relevance scoring, and contextual re-ranking. Specifically, in Task 2, our two-stage retrieval system combined lexical-semantic filtering with contextualized LLM analysis, achieving first place with an F1 score of 0.3195. Additionally, in other tasks--including Legal Case Retrieval, Statute Law Retrieval, Legal Textual Entailment, and Legal Judgment Prediction--we demonstrated robust performance through carefully engineered ensembles and effective prompt-based reasoning strategies. Our findings highlight the potential of hybrid models integrating traditional IR techniques with contemporary generative models, providing a valuable reference for future advancements in legal information processing.
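The two-stage idea described in the abstract (a cheap lexical filter followed by a more expensive contextual re-ranker) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NOWJ system: BM25 is implemented directly with conventional default parameters (`k1 = 1.5`, `b = 0.75`), and the re-ranker is a hypothetical pluggable function `rerank_fn`, with a simple token-overlap scorer standing in for an LLM relevance call.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def two_stage_rank(
    query: str,
    docs: List[str],
    rerank_fn: Callable[[str, str], float],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Stage 1: keep the top_k documents by BM25. Stage 2: re-score them with rerank_fn."""
    first_pass = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: first_pass[i], reverse=True)[:top_k]
    rescored = [(i, rerank_fn(query, docs[i])) for i in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def overlap_scorer(query: str, doc: str) -> float:
    # Hypothetical stand-in for an LLM relevance score: Jaccard token overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


paragraphs = [
    "the applicant failed to establish a well founded fear of persecution",
    "costs are awarded to the respondent",
    "the officer ignored evidence of persecution submitted by the applicant",
]
ranked = two_stage_rank("applicant evidence of persecution", paragraphs, overlap_scorer, top_k=2)
```

The design point is that the second stage only sees the shortlist, so an expensive scorer (such as a prompted LLM) runs on a handful of candidates rather than the whole pool. In practice `rerank_fn` would wrap whatever model call is available.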

Problem

Research questions and friction points this paper is trying to address.

Integrating embedding models and LLMs for legal retrieval

Advancing legal case entailment with multi-stage frameworks

Combining IR techniques with generative models for law

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage framework integrating embedding models

Combined lexical-semantic filtering with LLM analysis

Hybrid models merging traditional IR with generative AI

🔎 Similar Papers

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval