The Hidden Cost of Pairwise Verification in Synthetic Speech Source Tracing

📅 2026-06-10

🤖 AI Summary

This study investigates whether pairwise verification objectives underperform global anchoring in open-set text-to-speech (TTS) source tracing, and whether the objective itself is responsible for the gap. With matched XLS-R backbones and a fixed data and epoch budget, the authors compare both paradigms on MLAAD (in-domain) and STOPA (out-of-domain), complemented by an embedding-space analysis ($k_{99}$) and a bottleneck experiment. Global anchoring reaches a lower in-domain equal error rate (EER) of 8.61%, compared with 12–15% for the pairwise variants, even when those use rival mining and XLS-R finetuning. The authors argue that, because pairwise objectives optimize similarity directly, they concentrate variance into fewer embedding directions, which reduces resolution among closely related TTS generators. Imposing a similar bottleneck on the globally supervised baseline leaves it competitive, which suggests that the gap is not explained by dimensionality alone but by how the pairwise objective shapes the retained directions.
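
For readers unfamiliar with the metric, the EER figures quoted above are the operating point where the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how such a value can be computed from verification scores; it is not the authors' evaluation code, and the function name `compute_eer` and the "accept when score >= threshold" convention are assumptions made for illustration.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate from two sets of similarity scores.

    Assumption: higher scores mean "same source", and a trial is accepted
    when its score is >= the threshold. This is an illustrative helper,
    not the evaluation script used in the paper.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # False-acceptance rate: non-target trials that are accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False-rejection rate: target trials that are rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # Take the threshold where the two rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    same_source = [0.9, 0.8, 0.7]
    different_source = [0.1, 0.2, 0.3]
    print(compute_eer(same_source, different_source))
```

With finely sampled scores this approximates the crossing point of the two error curves; with few trials the result is coarse, so reported values such as 8.61% would normally come from a much larger trial list.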

📝 Abstract

Open-set source tracing is increasingly framed as a verification problem, motivating the use of pairwise metric-learning objectives from biometrics. We thus compare global anchoring and pairwise verification under matched backbones and a fixed data and epoch budget on MLAAD (in-domain) and STOPA (out-of-domain). In our runs, global anchoring yields lower in-domain error (8.61% EER) than pairwise variants (12–15% EER), even with rival mining and XLS-R finetuning. Because pairwise objectives optimize similarity directly, they concentrate variance into fewer embedding directions, reducing resolution among closely related generators. To test whether this drives the drop, we impose a similar bottleneck on the globally supervised baseline, yet the baseline remains competitive. Together with an embedding-space analysis ($k_{99}$), these results suggest that the gap is not explained by dimensionality alone, but rather by the pairwise objective's shaping of the retained directions.
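
The abstract does not define $k_{99}$. A common reading of such a metric, and the one assumed here, is the number of principal directions needed to capture 99% of the variance in a set of embeddings, so that a smaller value indicates variance concentrated in fewer directions. The sketch below illustrates that reading only; the function name `k_at_variance` and the PCA-via-SVD procedure are assumptions, not the authors' stated method.

```python
import numpy as np


def k_at_variance(embeddings, fraction=0.99):
    """Number of principal directions needed to reach `fraction` of variance.

    Assumption: k_99 is read as the smallest k whose top-k principal
    components explain at least 99% of total embedding variance.
    `embeddings` is an (n_samples, dim) array.
    """
    x = np.asarray(embeddings, dtype=float)
    # Center the data so singular values correspond to PCA variances.
    x = x - x.mean(axis=0, keepdims=True)

    # Squared singular values are proportional to per-direction variance.
    singular_values = np.linalg.svd(x, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()

    # First index whose cumulative share reaches the target, as a count.
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(cumulative))


if __name__ == "__main__":
    # Rank-1 toy data: all variance lies along a single direction.
    collapsed = np.outer(np.arange(10.0), [1.0, 2.0, 3.0])
    # Toy data with equal variance along four orthogonal axes.
    spread = np.vstack([np.eye(4), -np.eye(4)])
    print(k_at_variance(collapsed), k_at_variance(spread))
```

Under this reading, comparing the value for embeddings from a pairwise-trained model against those from a globally anchored model would indicate which one spreads variance over more directions, which is the kind of contrast the abstract describes.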

Problem

Research questions and friction points this paper is trying to address.

synthetic speech source tracing

open-set source tracing

pairwise verification

embedding space

metric learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

pairwise verification

global anchoring

embedding space analysis