You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes an overlooked text-based speaker identity leakage risk in speech anonymization systems: even when voiceprints are fully masked, speaker identity can still be inferred solely from transcribed text—challenging the prevailing assumption that “speech privacy equals voiceprint concealment.” We propose the first BERT-based, text-only speaker verification attack, leveraging semantic representation learning and keyword attribution analysis to demonstrate that semantic similarity inherently encodes strong speaker-specific cues. Evaluated on the VoicePrivacy challenge dataset, our method achieves an average equal-error rate (EER) of 35%, with EERs as low as 2% for certain speakers—revealing significant bias in current automatic speaker verification (ASV) evaluations induced by textual content similarity. Our findings question the validity of global EER as a privacy metric and introduce a novel text-aware paradigm for speech privacy assessment.

📝 Abstract
Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system's decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
Problem

Research questions and friction points this paper is trying to address.

Assess linguistic content impact on speaker anonymization attacks
Evaluate BERT-based ASV system performance on VoicePrivacy datasets
Challenge reliance on global EER for unbiased privacy evaluations
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT language model adapted for ASV
Linguistic content similarity impacts privacy
Explainability reveals keyword-based decisions
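The attack described above can be sketched in a few lines: score a pair of utterances by the semantic similarity of their transcripts, then measure the equal error rate (EER) over genuine (same-speaker) and impostor (different-speaker) pairs. The paper fine-tunes BERT for the text embeddings; in this toy sketch a hashed bag-of-words vector stands in for BERT so the code runs without model downloads, and the `embed`/`cosine`/`eer` helpers are illustrative, not the authors' implementation.

```python
# Toy sketch of a text-only speaker-verification attack:
# transcript embedding -> cosine similarity -> EER.
# The hashed bag-of-words embedding is a stand-in for the paper's
# fine-tuned BERT; all names and data here are illustrative.
import math
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy transcript embedding: hashed bag-of-words (BERT stand-in)."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dim] += count
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity score for a trial (pair of transcript embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def eer(genuine: list[float], impostor: list[float]) -> float:
    """Equal error rate: sweep thresholds, return the rate where the
    false-accept rate (FAR) and false-reject rate (FRR) cross."""
    best_gap, best_rate = float("inf"), 1.0
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

# Same-speaker pairs share topical keywords (the bias the paper's
# explainability study attributes to LibriSpeech curation).
genuine_pairs = [
    ("the whaling ship sailed from the harbor",
     "the harbor pilots guided the whaling ship"),
]
impostor_pairs = [
    ("the whaling ship sailed from the harbor",
     "parliament debated the new tax reform bill"),
]
genuine_scores = [cosine(embed(a), embed(b)) for a, b in genuine_pairs]
impostor_scores = [cosine(embed(a), embed(b)) for a, b in impostor_pairs]
print(f"EER: {eer(genuine_scores, impostor_scores):.2f}")
```

With topically distinct speakers the scores separate cleanly and the EER drops toward zero, mirroring the paper's point that content similarity alone can leak speaker identity even when the voiceprint is fully anonymized.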