Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments

📅 2025-06-18
🤖 AI Summary
Korean judicial rulings must undergo compliant de-identification prior to public release; however, existing methods struggle to simultaneously achieve scalability, high-precision PII recognition, and legal compliance. Challenges include ambiguous definitions of Korean PII, a lack of judicial-domain-specific annotated data, and no legally grounded classification framework.

Method: We introduce the first Korean PII annotation dataset specifically designed for judicial judgments; establish a law-aligned, systematic PII taxonomy; and propose an end-to-end deep neural framework integrating named entity recognition (NER), rule-enhanced post-processing, and explicit legal constraint modeling.

Contribution/Results: Our approach achieves state-of-the-art performance on Korean judicial de-identification, significantly improving accuracy, processing efficiency, and regulatory adherence. It establishes a scalable, verifiable technical paradigm for privacy-preserving publication of judicial texts.

📝 Abstract
To ensure a balance between open access to justice and personal data protection, the South Korean judiciary mandates the de-identification of court judgments before they can be publicly disclosed. However, the current de-identification process is inadequate for handling court judgments at scale while adhering to strict legal requirements. Additionally, the legal definitions and categorizations of personal identifiers are vague and not well-suited for technical solutions. To tackle these challenges, we propose a de-identification framework called Thunder-DeID, which aligns with relevant laws and practices. Specifically, we (i) construct and release the first Korean legal dataset containing annotated judgments along with corresponding lists of entity mentions, (ii) introduce a systematic categorization of Personally Identifiable Information (PII), and (iii) develop an end-to-end deep neural network (DNN)-based de-identification pipeline. Our experimental results demonstrate that our model achieves state-of-the-art performance in the de-identification of court judgments.

Problem

Research questions and friction points this paper is trying to address.

Balancing open justice with personal data protection in Korea
Inadequate current de-identification for large-scale court judgments
Vague legal definitions of personal identifiers hindering technical solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

First Korean legal dataset with annotated judgments
Systematic categorization of Personally Identifiable Information
End-to-end DNN-based de-identification pipeline
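
To make the last item concrete, here is a minimal sketch of the replacement step that typically follows entity recognition in a de-identification pipeline. It is not the Thunder-DeID implementation: the `Span` type, the `deidentify` function, the `PERSON` label, and the `[LABEL_n]` placeholder format are all hypothetical, and the spans are assumed to come from some upstream NER model that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PII mention (hypothetical structure, not from the paper)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative PII category, e.g. "PERSON"


def deidentify(text: str, spans: list[Span]) -> str:
    """Replace each span with a placeholder that is stable per (label, surface form)."""
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    out: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # overlapping span: keep the earlier one, skip this one
        surface = text[span.start:span.end]
        key = (span.label, surface)
        if key not in placeholders:
            # First time we see this mention: allocate the next index for its label.
            counters[span.label] = counters.get(span.label, 0) + 1
            placeholders[key] = f"[{span.label}_{counters[span.label]}]"
        out.append(text[cursor:span.start])  # untouched text before the span
        out.append(placeholders[key])        # placeholder instead of the PII
        cursor = span.end
    out.append(text[cursor:])  # remaining text after the last span
    return "".join(out)


example = "Kim sued Lee. Kim won."
example_spans = [Span(0, 3, "PERSON"), Span(9, 12, "PERSON"), Span(14, 17, "PERSON")]
result = deidentify(example, example_spans)
```

Mapping the same surface form to the same placeholder keeps the judgment readable, since a reader can still tell which party did what. A real system would also need to handle aliases and partial mentions of the same entity, which this sketch does not attempt.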
Sungen Hahm
Graduate School of Data Science, Seoul National University
Heejin Kim
Korea University
Organic Chemistry, Flow Chemistry
Gyuseong Lee
LG Electronics
Computer Vision, Machine Learning
Hyunji Park
Graduate School of Data Science, Seoul National University
Jaejin Lee
Dept. of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea