🤖 AI Summary
This work addresses complex multi-hop legal question answering, where the risk of hallucination and the vocabulary gap between user questions and statutory text make accurate statute retrieval both essential and difficult. The authors propose Decompose-and-Refine (DaR), a framework that integrates step-wise question decomposition with query refinement based on parametric knowledge. A complex question is progressively decomposed into atomic sub-questions, and a statute-aligned query is generated for each sub-question so that the single most central statutory provision can be selected for each legal issue. Evaluated on the KoBLEX benchmark with Qwen3-32B and Gemma3-27B, the method consistently improves both statute retrieval accuracy and final answer quality over existing approaches, while supporting transparent, issue-level verification of the reasoning.
📝 Abstract
Large language models (LLMs) have shown strong performance in the legal domain, demonstrating notable potential in Legal Question Answering (LQA). However, unlike general QA, LQA requires answers that are not only accurate but also rigorously grounded in explicit legal authority. In statutory LQA, many questions require multi-hop reasoning across multiple legal issues, substantially increasing the risk of hallucination, thereby making accurate retrieval of supporting statutory provisions a critical prerequisite. Despite recent progress in multi-hop QA, existing approaches often rely on reasoning in natural language or retrieval without explicit query reformulation, leaving the vocabulary gap between user questions and statutory text largely unaddressed. To address this challenge, we propose Decompose-and-Refine (DaR), a statute-grounded LQA framework that tightly integrates step-wise question decomposition with parametric knowledge-based query refinement. DaR progressively decomposes a complex legal question into atomic sub-questions and generates statute-aligned parametric queries for each sub-question, enabling the selection of a single most central statutory provision corresponding to each legal issue. We evaluate DaR on KoBLEX, a Korean multi-hop LQA benchmark grounded in statutory law, using Qwen3-32B and Gemma3-27B. Experimental results demonstrate that DaR consistently improves both retrieval accuracy and final answer quality over existing approaches. Moreover, by explicitly separating sub-questions and their corresponding statutory provisions, DaR facilitates transparent, issue-level verification of complex legal reasoning processes.
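The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `next_sub_question`, `refine`, `token_overlap`, and `IssueResult` are hypothetical, the two LLM steps are replaced by hard-coded stubs, the retriever is a toy token-overlap scorer, and the two "statutes" are invented placeholder sentences rather than real legal text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class IssueResult:
    sub_question: str   # atomic sub-question for one legal issue
    refined_query: str  # query rewritten toward statutory wording
    statute_id: str     # the single provision selected for this issue


def token_overlap(query: str, text: str) -> int:
    """Toy relevance score: count of distinct lowercase tokens shared."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def decompose_and_refine(
    question: str,
    next_sub_question: Callable[[str, List[IssueResult]], Optional[str]],
    refine: Callable[[str], str],
    statutes: Dict[str, str],
    max_steps: int = 8,
) -> List[IssueResult]:
    """Step-wise loop: get next sub-question, refine it, pick one statute."""
    history: List[IssueResult] = []
    for _ in range(max_steps):
        sub_q = next_sub_question(question, history)
        if sub_q is None:  # the decomposer signals that no issues remain
            break
        query = refine(sub_q)
        # Select exactly one provision per issue (highest toy score).
        best_id = max(statutes, key=lambda sid: token_overlap(query, statutes[sid]))
        history.append(IssueResult(sub_q, query, best_id))
    return history


# Invented placeholder provisions (assumption: NOT real statutory text).
STATUTES = {
    "S1": "a lessee must restore the leased property upon termination of the lease",
    "S2": "a person who causes damage by an unlawful act must compensate for damages",
}

# Hard-coded stand-ins for the two LLM calls (assumption: hypothetical stubs).
PLAN = [
    "Does the tenant have to fix the flat when moving out?",
    "Who pays if the tenant broke something on purpose?",
]
REFINED = {
    PLAN[0]: "lessee restore leased property termination lease",
    PLAN[1]: "unlawful act compensate damages",
}


def stub_next_sub_question(question: str, history: List[IssueResult]) -> Optional[str]:
    return PLAN[len(history)] if len(history) < len(PLAN) else None


def stub_refine(sub_question: str) -> str:
    return REFINED[sub_question]


results = decompose_and_refine(
    "Tenant deliberately damaged the flat and moved out; what are the obligations?",
    stub_next_sub_question,
    stub_refine,
    STATUTES,
)
for r in results:
    print(r.statute_id, "<-", r.sub_question)
```

Keeping each sub-question, its refined query, and its selected provision together in one record reflects the issue-level verification the abstract emphasizes: each step can be checked independently of the final answer.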