Advantages of Domain Knowledge Injection for Legal Document Summarization: A Case Study on Summarizing Indian Court Judgments in English and Hindi

📅 2026-02-07
🤖 AI Summary
This study addresses the inaccessibility of Indian court judgments, which are written predominantly in complex English that much of the general public cannot readily follow, creating a need for high-quality summaries in both English and Hindi. The work injects legal domain knowledge into both extractive and abstractive summarization models: it enhances extractive neural models with domain-specific pre-trained encoders tailored to legal text, and it adapts generative models, including large language models, through continual pre-training on large legal corpora in English and Hindi. The proposed approaches achieve statistically significant improvements in English-to-English and English-to-Hindi summarization, as measured by standard evaluation metrics, factual consistency metrics, and legal domain-specific metrics. Evaluation by domain experts further validates these improvements.

📝 Abstract
Summarizing Indian court judgments is a complex task, not only because of the intricate language and unstructured nature of legal texts, but also because a large section of the Indian population does not understand the complex English in which legal text is written, thus requiring summaries in Indian languages. In this study, we aim to improve the summarization of Indian legal text to generate summaries in both English and Hindi (the most widely spoken Indian language) by injecting domain knowledge into diverse summarization models. We propose a framework to enhance extractive neural summarization models by incorporating domain-specific pre-trained encoders tailored for legal texts. Further, we explore the injection of legal domain knowledge into generative models (including Large Language Models) through continual pre-training on large legal corpora in English and Hindi. Our proposed approaches achieve statistically significant improvements in both English-to-English and English-to-Hindi Indian legal document summarization, as measured by standard evaluation metrics, factual consistency metrics, and legal domain-specific metrics. Furthermore, these improvements are validated by domain experts, demonstrating the effectiveness of our approaches.
Problem

Research questions and friction points this paper is trying to address.

Legal Document Summarization
Domain Knowledge
Indian Court Judgments
Multilingual Summarization
English-to-Hindi
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain Knowledge Injection
Legal Document Summarization
Multilingual Summarization
Continual Pre-training
Legal Language Models
Debtanu Datta
Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur, India 721302
Rajdeep Mukherjee
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India 721302
Adrijit Goswami
Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur, India 721302
Saptarshi Ghosh
Department of CSE, Indian Institute of Technology Kharagpur, India
Computational Social Science, Legal analytics, Algorithmic bias and fairness