LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

📅 2025-02-09
🤖 AI Summary
This work addresses semantic segmentation of Indian legal judgments through rhetorical role classification. It introduces LegalSeg, which the authors describe as the largest annotated dataset for this task, comprising over 7,000 judgments and 1.4 million sentences labeled with seven rhetorical roles. Rather than proposing a single model, the paper benchmarks several approaches: Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), graph neural networks (GNNs), Role-Aware Transformers, and an exploratory instruction-tuned large language model, RhetoricLLaMA. The experiments show that models using broader context, structural relationships between sentences, and sequential information outperform those relying on sentence-level features alone. The remaining challenges are distinguishing closely related rhetorical roles and handling class imbalance.
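The class imbalance noted above is often handled by weighting the loss per role. The sketch below is one common heuristic (inverse-frequency weights), not the method used in the paper; the function name and the role labels in the usage lines are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """Return a weight per role: total / (num_roles * count).

    Rare roles get weights above 1, frequent roles below 1.
    This is a generic heuristic, assumed here for illustration only.
    """
    counts = Counter(labels)
    total = len(labels)
    num_roles = len(counts)
    return {role: total / (num_roles * c) for role, c in counts.items()}


# Hypothetical label sequence; the role names are placeholders.
example = ["Facts"] * 6 + ["Reasoning"] * 3 + ["Decision"]
weights = inverse_frequency_weights(example)
for role, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{role}: {w:.3f}")
```

Weights of this kind would typically be passed to a weighted cross-entropy loss during training, so that errors on rare roles cost more than errors on frequent ones.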

📝 Abstract
In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.
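The abstract mentions experiments that feed a classifier the surrounding context and the labels of neighboring sentences. A minimal sketch of how such inputs might be assembled follows; the `[TARGET]` and `[SEP]` markers, the function name, the window size, and the role names are all assumptions for illustration, not the paper's actual input format.

```python
from typing import List, Optional


def build_context_inputs(
    sentences: List[str],
    labels: Optional[List[str]] = None,
    window: int = 1,
    sep: str = " [SEP] ",
) -> List[str]:
    """Build one input string per sentence from its neighbors.

    Each target sentence is marked with [TARGET] and joined with up to
    `window` sentences on each side. If `labels` is given (predicted or
    gold), each neighbor is prefixed with its label in brackets.
    """
    if labels is not None and len(labels) != len(sentences):
        raise ValueError("labels and sentences must have the same length")

    n = len(sentences)
    inputs = []
    for i in range(n):
        parts = []
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                parts.append("[TARGET] " + sentences[j])
            elif labels is not None:
                parts.append(f"[{labels[j]}] " + sentences[j])
            else:
                parts.append(sentences[j])
        inputs.append(sep.join(parts))
    return inputs


# Hypothetical three-sentence judgment with placeholder role names.
doc = ["The appellant filed a suit.", "Is the suit maintainable?", "Appeal dismissed."]
for text in build_context_inputs(doc, labels=["Facts", "Issue", "Decision"]):
    print(text)
```

Passing gold labels corresponds to the "actual labels" setting and gives an upper bound; passing a model's own earlier predictions corresponds to the "predicted labels" setting.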
Problem

Research questions and friction points this paper is trying to address.

Semantic segmentation of Indian legal judgments
Rhetorical role classification of sentences in legal documents
Distinguishing closely related rhetorical roles and handling class imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

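Segmentation of judgments through sentence-level rhetorical role classification
LegalSeg, the largest annotated dataset for this task (over 7,000 documents, 1.4 million sentences, 7 rhetorical roles)
Benchmark of Hierarchical BiLSTM-CRF, ToInLegalBERT, GNNs, Role-Aware Transformers, and the exploratory RhetoricLLaMA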