LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval

2025-04-19
Citations: 0
Influential: 0
AI Summary
To address the low retrieval accuracy and poor interpretability of multilingual legal information retrieval over the Bangladesh Police Gazette (English–Bengali bilingual documents), this paper proposes a rule-augmented hybrid RAG framework. The method uses multilingual BERT for semantic encoding and FAISS for vector-based retrieval, augmented with language identification, bilingual paragraph alignment, and domain-specific keyword enhancement modules. It also introduces a rule-driven cross-lingual paragraph re-ranking mechanism to improve relevance and transparency. Evaluated on a real-world test set, the approach is reported to improve over a baseline RAG pipeline: Recall@5 increases by 23.6% and answer accuracy rises by 18.4%. It is described as particularly effective in low-resource bilingual legal settings, enabling real-time, precise, and auditable legal clause localization and question answering.
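The summary names language identification, keyword enhancement, and rule-driven re-ranking but gives no implementation details. The following is a minimal sketch of how such a rule layer could sit on top of dense retrieval scores. It is not the authors' method: the `Candidate` type, the `rerank` function, the weights, and the script-based language check (Bengali Unicode block, U+0980 to U+09FF) are all assumptions made for illustration.

```python
from dataclasses import dataclass


def detect_language(text: str) -> str:
    """Return "bn" if Bengali-script characters outnumber ASCII letters, else "en".

    Hypothetical heuristic: counts characters in the Bengali Unicode block
    (U+0980 to U+09FF) against ASCII letters.
    """
    bengali = sum(1 for ch in text if "\u0980" <= ch <= "\u09ff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "bn" if bengali > latin else "en"


@dataclass
class Candidate:
    text: str
    dense_score: float  # similarity returned by the vector retriever


def rerank(query, candidates, keywords, keyword_weight=0.1, language_bonus=0.05):
    """Re-order candidates with two explicit rules on top of the dense score.

    Rule 1: add keyword_weight for every domain keyword found in the passage.
    Rule 2: add language_bonus when the passage language matches the query.
    Both weights are illustrative assumptions, not values from the paper.
    """
    query_lang = detect_language(query)
    scored = []
    for cand in candidates:
        lowered = cand.text.lower()
        hits = sum(1 for kw in keywords if kw.lower() in lowered)
        score = cand.dense_score + keyword_weight * hits
        if detect_language(cand.text) == query_lang:
            score += language_bonus
        scored.append((score, cand))
    # Sort on the score only, highest first; ties keep retriever order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]
```

Because each rule contributes an explicit additive term, the final ordering can be traced back to which keywords matched and whether the passage language matched the query. That is one plausible reading of the "auditable" property mentioned in the summary.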

Abstract
Natural Language Processing (NLP) and computational linguistic techniques are increasingly being applied across various domains, yet their use in legal and regulatory tasks remains limited. To address this gap, we develop an efficient bilingual question-answering framework for regulatory documents, specifically the Bangladesh Police Gazettes, which contain both English and Bangla text. Our approach employs modern Retrieval Augmented Generation (RAG) pipelines to enhance information retrieval and response generation. In addition to conventional RAG pipelines, we propose an advanced RAG-based approach that improves retrieval performance, leading to more precise answers. This system enables efficient searching for specific government legal notices, making legal information more accessible. We evaluate both our proposed and conventional RAG systems on a diverse test set on Bangladesh Police Gazettes, demonstrating that our approach consistently outperforms existing methods across all evaluation metrics.
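The abstract does not define its evaluation metrics. As a point of reference, one common definition of Recall@k (the AI summary quotes Recall@5) is the fraction of a query's relevant passages that appear among the top k retrieved passages, averaged over queries. The sketch below assumes that definition; the function names and the passage-id representation are hypothetical, and the paper may use a different variant.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant passage ids found in the top-k retrieved ids.

    retrieved: ranked list of passage ids, best first.
    relevant: collection of passage ids judged relevant for the query.
    """
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one passage id")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

Comparing a baseline and a modified retriever then amounts to calling `mean_recall_at_k` on each system's ranked outputs for the same set of queries and relevance judgments.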
Problem

Research questions and friction points this paper is trying to address.

Develops bilingual QA framework for legal documents
Enhances retrieval performance in multilingual legal texts
Improves accessibility of government legal notices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid RAG system for bilingual legal retrieval
Advanced RAG improves retrieval precision
Efficient searching for government legal notices
Muhammad Rafsan Kabir
Department of Electrical and Computer Engineering, North South University
Machine Learning, Natural Language Processing, Computer Vision
Rafeed Mohammad Sultan
Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Fuad Rahman
Apurba Technologies, Sunnyvale, CA 94085, USA
Mohammad Ruhul Amin
Fordham University, New York, USA
Sifat Momen
Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Nabeel Mohammed
North South University
Natural Language Processing, Computer Vision, Deep Learning
Shafin Rahman
Associate Professor, ECE, North South University, Bangladesh
Computer Vision, Machine Learning