LLM Anonymization Against Agentic Re-Identification

2026-05-29
Citations: 0
Influential: 0

AI Summary
Existing text anonymization methods struggle to preserve utility for downstream analysis while defending against re-identification attacks by intelligent adversaries equipped with web-search capabilities. To address this challenge, this work proposes AURA, a novel framework that decouples privacy preservation from utility retention through a mask-and-reconstruct mechanism, incorporating adaptive privacy scopes and utility-oriented reconstruction strategies. AURA leverages large language models to perform anonymization, adversarial privacy auditing, context-aware utility evaluation via a utility grid, and simulation of intelligent re-identification attacks. Evaluated on real user interview data, AURA significantly advances the privacy-utility frontier: it achieves superior contextual utility retention under fixed privacy guarantees and substantially enhances robustness against intelligent re-identification attacks.
๐Ÿ“ Abstract
Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry much of the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retention. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and the mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.
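The mask-reconstruct flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name below is hypothetical, the LLM-driven steps are replaced by simple string operations, the adversarial privacy check is reduced to a substring test (no web-search agent), and utility is approximated as the fraction of a hand-written list of facts that survive in the candidate.

```python
from typing import List


def mask(text: str, sensitive_terms: List[str]) -> str:
    """Privacy localization (toy): swap each sensitive term for an indexed placeholder."""
    for i, term in enumerate(sensitive_terms):
        text = text.replace(term, f"[MASK_{i}]")
    return text


def reconstruct(masked: str, fill_options: List[List[str]]) -> List[str]:
    """Reconstruction (toy): build one candidate per list of per-mask replacements."""
    candidates = []
    for fills in fill_options:
        out = masked
        for i, fill in enumerate(fills):
            out = out.replace(f"[MASK_{i}]", fill)
        candidates.append(out)
    return candidates


def privacy_ok(candidate: str, sensitive_terms: List[str]) -> bool:
    """Stand-in for an adversarial privacy check: no sensitive term may reappear."""
    lowered = candidate.lower()
    return not any(term.lower() in lowered for term in sensitive_terms)


def utility_score(candidate: str, utility_facts: List[str]) -> float:
    """Stand-in for a utility-retention check: fraction of facts still present."""
    if not utility_facts:
        return 1.0
    lowered = candidate.lower()
    return sum(fact.lower() in lowered for fact in utility_facts) / len(utility_facts)


def anonymize(text: str, sensitive_terms: List[str],
              fill_options: List[List[str]], utility_facts: List[str]) -> str:
    """Mask, reconstruct, then keep the safe candidate with the highest utility."""
    masked = mask(text, sensitive_terms)
    safe = [c for c in reconstruct(masked, fill_options)
            if privacy_ok(c, sensitive_terms)]
    if not safe:
        return masked  # fall back to the fully masked text
    return max(safe, key=lambda c: utility_score(c, utility_facts))


# Fictional example data (assumed for illustration only).
text = ("I am a nurse at Northfield Clinic in Springvale "
        "and use the app for shift scheduling.")
sensitive_terms = ["Northfield Clinic", "Springvale"]
fill_options = [
    ["a clinic", "Springvale"],        # leaks the town, fails the privacy check
    ["a workplace", "a town"],         # safe, but drops the "clinic" fact
    ["a clinic", "a mid-sized town"],  # safe and keeps every utility fact
]
utility_facts = ["nurse", "clinic", "shift scheduling"]

result = anonymize(text, sensitive_terms, fill_options, utility_facts)
print(result)
```

Keeping masking separate from reconstruction means the set of sensitive terms (the privacy scope) can be widened or narrowed without changing how candidates are scored for utility, which is the decoupling the abstract emphasizes.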
Problem

Research questions and friction points this paper is trying to address.

anonymization
re-identification
privacy-utility trade-off
agentic LLMs
text utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM anonymization
agentic re-identification
mask-reconstruct
privacy-utility tradeoff
adversarial privacy