LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

📅 2025-02-09

🤖 AI Summary

This work addresses semantic segmentation of Indian legal judgments through rhetorical role classification. It introduces LegalSeg, described by the authors as the largest annotated dataset for this task, comprising over 7,000 judgments and 1.4 million sentences labeled with seven rhetorical roles. Rather than proposing a single architecture, the paper benchmarks several model families: Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), graph neural networks (GNNs), Role-Aware Transformers, and an exploratory instruction-tuned large language model, RhetoricLLaMA. Models that incorporate broader context, structural relationships, and sequential sentence information outperform those relying on sentence-level features alone. Further experiments measure how surrounding context and the predicted or actual labels of neighboring sentences affect classification accuracy. Remaining challenges include distinguishing closely related roles and handling class imbalance.

📝 Abstract

In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.
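The experiments with surrounding context and neighboring-sentence labels can be pictured with a small sketch. The code below is not the authors' implementation; it is one plausible way to assemble a classifier input from a target sentence, its neighbors, and (optionally) their labels. The function name `build_context_input`, the `[PAD]` and `[SEP]` markers, the `<LABEL>` prefix format, and the role names in the usage lines are all illustrative assumptions, since the abstract does not specify them.

```python
from typing import Optional, Sequence

PAD = "[PAD]"  # hypothetical filler for positions outside the document


def build_context_input(
    sentences: Sequence[str],
    index: int,
    window: int = 1,
    neighbor_labels: Optional[Sequence[str]] = None,
    sep: str = " [SEP] ",
) -> str:
    """Join the target sentence with `window` neighbors on each side.

    If `neighbor_labels` is given (gold or predicted, one per sentence),
    each neighbor is prefixed with its label; the target sentence never is,
    because its label is what the classifier has to predict.
    """
    parts = []
    for j in range(index - window, index + window + 1):
        if j < 0 or j >= len(sentences):
            parts.append(PAD)  # keep a fixed number of slots at document edges
            continue
        text = sentences[j]
        if neighbor_labels is not None and j != index:
            text = f"<{neighbor_labels[j]}> {text}"
        parts.append(text)
    return sep.join(parts)


# Usage with made-up sentences and placeholder role names (assumptions).
doc = [
    "The appellant filed the suit in 1998.",
    "The question is whether the suit is barred by limitation.",
    "We find that the limitation period had expired.",
]
labels = ["FACTS", "ISSUE", "REASONING"]

context_only = build_context_input(doc, 1, window=1)
with_labels = build_context_input(doc, 1, window=1, neighbor_labels=labels)
```

Passing gold labels for the neighbors gives an upper bound on how much label context can help, while passing a model's own predictions reflects what is available at inference time; the abstract reports experiments with both kinds of label.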

Problem

Research questions and friction points this paper is trying to address.

Semantic segmentation of Indian legal judgments

Rhetorical role classification in legal documents

Distinguishing closely related rhetorical roles and handling class imbalance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rhetorical role classification for segmentation

LegalSeg, the largest annotated dataset for rhetorical role classification (over 7,000 documents, 1.4 million sentences, 7 roles)

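Benchmark of state-of-the-art models (Hierarchical BiLSTM-CRF, ToInLegalBERT, GNNs, Role-Aware Transformers) and an exploratory instruction-tuned LLM, RhetoricLLaMA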
