Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments

📅 2025-06-18

🤖 AI Summary

Problem: Korean court judgments must be de-identified in compliance with the law before public release, but existing methods struggle to achieve scalability, high-precision PII recognition, and legal compliance at the same time. Challenges include ambiguous definitions of Korean PII, a lack of annotated data specific to the judicial domain, and the absence of a legally grounded classification framework. Method: We introduce the first Korean PII annotation dataset designed specifically for court judgments; establish a systematic PII taxonomy aligned with the law; and propose an end-to-end deep neural framework integrating named entity recognition (NER), rule-enhanced post-processing, and explicit legal constraint modeling. Contribution/Results: Our approach achieves state-of-the-art performance on Korean judicial de-identification, improving accuracy, processing efficiency, and regulatory adherence. It establishes a scalable, verifiable technical paradigm for privacy-preserving publication of judicial texts.

📝 Abstract

To ensure a balance between open access to justice and personal data protection, the South Korean judiciary mandates the de-identification of court judgments before they can be publicly disclosed. However, the current de-identification process is inadequate for handling court judgments at scale while adhering to strict legal requirements. Additionally, the legal definitions and categorizations of personal identifiers are vague and not well-suited for technical solutions. To tackle these challenges, we propose a de-identification framework called Thunder-DeID, which aligns with relevant laws and practices. Specifically, we (i) construct and release the first Korean legal dataset containing annotated judgments along with corresponding lists of entity mentions, (ii) introduce a systematic categorization of Personally Identifiable Information (PII), and (iii) develop an end-to-end deep neural network (DNN)-based de-identification pipeline. Our experimental results demonstrate that our model achieves state-of-the-art performance in the de-identification of court judgments.

Problem

Research questions and friction points this paper is trying to address.

Balancing open justice with personal data protection in Korea

Inadequate current de-identification for large-scale court judgments

Vague legal definitions of personal identifiers hindering technical solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

First Korean legal dataset with annotated judgments

Systematic categorization of Personally Identifiable Information

End-to-end DNN-based de-identification pipeline
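
The pipeline named above is not specified in detail on this page. As a rough illustration of what span-based de-identification can look like, here is a minimal sketch, not the authors' implementation. It assumes an upstream NER model has already produced character spans, and it adds one illustrative rule for resident registration numbers in the YYMMDD-NNNNNNN form. The names `deidentify`, `rule_spans`, and `RRN_PATTERN`, the `PERSON` and `RRN` labels, and the `[LABEL_n]` placeholder format are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# (start, end, label) with an exclusive end offset; assumed NER output format.
Span = Tuple[int, int, str]

# Illustrative rule only: 6 digits, a hyphen, 7 digits, not embedded in a longer number.
RRN_PATTERN = re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)")


def rule_spans(text: str) -> List[Span]:
    """Return spans found by the hand-written rule (hypothetical post-processing step)."""
    return [(m.start(), m.end(), "RRN") for m in RRN_PATTERN.finditer(text)]


def deidentify(text: str, ner_spans: List[Span]) -> str:
    """Replace each detected mention with a placeholder.

    The same (label, surface form) pair always maps to the same placeholder,
    so a reader can still follow who did what.
    """
    spans = sorted(set(ner_spans) | set(rule_spans(text)))
    placeholders: Dict[Tuple[str, str], str] = {}
    counters: Dict[str, int] = {}
    out: List[str] = []
    cursor = 0
    for start, end, label in spans:
        if start < cursor:
            continue  # skip a span that overlaps one already replaced
        key = (label, text[start:end])
        if key not in placeholders:
            counters[label] = counters.get(label, 0) + 1
            placeholders[key] = f"[{label}_{counters[label]}]"
        out.append(text[cursor:start])
        out.append(placeholders[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Usage: the name list stands in for the output of a real NER model.
text = "Kim Cheolsu (900101-1234567) sued Lee Younghee. Kim Cheolsu appealed."
names = ["Kim Cheolsu", "Lee Younghee"]
ner_spans = [
    (m.start(), m.end(), "PERSON")
    for name in names
    for m in re.finditer(re.escape(name), text)
]
print(deidentify(text, ner_spans))
```

Keeping placeholders consistent per mention is a design choice made for readability of the released text; a real system would also need to reconcile different surface forms of the same entity, which this sketch does not attempt.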
