A Large-Scale Per-Speaker Analysis of Re-identification Risk in Speech Anonymization

📅 2026-06-05

🤖 AI Summary

Traditional speaker anonymization evaluation relies on aggregate metrics, which overlook substantial inter-speaker variation in re-identification risk. This study conducts a large-scale, speaker-level privacy analysis of nearly 5,000 speakers under a worst-case scenario, systematically assessing re-identification risk across combinations of anonymization methods, adversary architectures, and utterance lengths using a linkability-based metric. The findings indicate that re-identification difficulty is not determined by inherent speaker characteristics but emerges from the interaction among the anonymization scheme, the adversary's capability, and the available speech duration. Crucially, the composition of the easy-to-identify and hard-to-identify speaker groups shifts markedly with system configuration, which challenges the assumption of static privacy risk and underscores the need to condition privacy evaluations on the specific attack model and anonymization setting.

📝 Abstract

Speech anonymization is commonly evaluated using average-case metrics such as the equal error rate, which can hide large disparities in re-identification risk across individuals. In this paper, we conduct a large-scale per-speaker privacy analysis using a linkability-based metric under a worst-case scenario. Nearly 5,000 speakers are evaluated across multiple anonymization systems, attacker architectures, and conversation lengths. While linkability scores are highly polarized at the speaker level, the sets of easy-to-re-identify and hard-to-re-identify speakers vary substantially across configurations. We show that no single factor explains speaker vulnerability. Instead, the re-identification risk emerges from the interaction between the attacker, the anonymizer, and the amount of available speech. These results challenge the notion of intrinsic speaker-level privacy risks and emphasize the need for evaluation protocols that are explicitly conditioned on the attacker and anonymizer.
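To make the contrast between an aggregate score and a per-speaker score concrete, here is a minimal sketch of a histogram-based linkability estimate computed separately for each speaker. The abstract does not specify the exact metric, so this assumes one common formulation: the local score in each bin is the normalized excess of same-speaker (mated) probability mass over different-speaker (non-mated) mass, clipped at zero, and the global score is its average under the mated distribution. The function names, the `omega` prior ratio, the bin count, and the toy scores are all illustrative assumptions, not the paper's implementation or data.

```python
import numpy as np


def linkability(mated, non_mated, bins=20, omega=1.0):
    """Histogram estimate of a global linkability score in [0, 1].

    mated:     attacker scores for same-speaker trials
    non_mated: attacker scores for different-speaker trials
    omega:     assumed prior ratio p(mated) / p(non-mated)
    """
    mated = np.asarray(mated, dtype=float)
    non_mated = np.asarray(non_mated, dtype=float)

    # Shared bin edges so the two histograms are comparable.
    lo = min(mated.min(), non_mated.min())
    hi = max(mated.max(), non_mated.max())
    edges = np.linspace(lo, hi, bins + 1)

    # Probability mass per bin for each trial type.
    p_m = np.histogram(mated, bins=edges)[0] / mated.size
    p_nm = np.histogram(non_mated, bins=edges)[0] / non_mated.size

    # Local score: 2*omega*LR / (1 + omega*LR) - 1 with LR = p_m / p_nm,
    # rewritten as (omega*p_m - p_nm) / (omega*p_m + p_nm) so that bins
    # with no non-mated mass need no special handling.
    num = omega * p_m - p_nm
    den = omega * p_m + p_nm
    local = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    local = np.clip(local, 0.0, None)  # zero wherever omega*LR <= 1

    # Global score: local score averaged under the mated distribution.
    return float(np.sum(p_m * local))


def per_speaker_linkability(trials, **kwargs):
    """Map each speaker id to the linkability of that speaker's own trials."""
    return {spk: linkability(m, nm, **kwargs) for spk, (m, nm) in trials.items()}


# Made-up toy scores, for illustration only.
trials = {
    "spk_a": ([0.90, 0.95, 0.80], [0.10, 0.20, 0.00]),  # well separated
    "spk_b": ([0.40, 0.50, 0.60], [0.40, 0.50, 0.60]),  # indistinguishable
}
scores = per_speaker_linkability(trials)
mean_score = sum(scores.values()) / len(scores)
print(scores, mean_score)
```

In this toy case one speaker is fully linkable and the other is not linkable at all, yet the mean lands in the middle and describes neither. Repeating the computation for each attacker, anonymizer, and speech duration and comparing the resulting speaker rankings would be one way to probe the configuration dependence the abstract reports.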

Problem

Research questions and friction points this paper is trying to address.

speech anonymization

re-identification risk

per-speaker analysis

privacy evaluation

linkability

Innovation

Methods, ideas, or system contributions that make the work stand out.

per-speaker analysis

re-identification risk

speech anonymization