The Hidden Cost of Pairwise Verification in Synthetic Speech Source Tracing

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study asks whether pairwise verification underperforms global anchoring in open-set text-to-speech (TTS) source attribution, and whether the objective function explains the difference. With matched XLS-R backbones and a fixed data and epoch budget, the authors compare both paradigms on MLAAD (in-domain) and STOPA (out-of-domain), complemented by an embedding-space analysis ($k_{99}$) and a bottleneck experiment. Global anchoring reaches a lower in-domain equal error rate (EER) of 8.61%, against 12–15% for the pairwise variants, even with rival mining and XLS-R finetuning. The authors attribute this to pairwise objectives concentrating variance into fewer embedding directions, which reduces resolution among closely related TTS generators. When a similar bottleneck is imposed on the globally supervised baseline, it remains competitive, suggesting that the gap is not explained by dimensionality alone but by how the pairwise objective shapes the retained directions.
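Since the headline numbers are equal error rates, a minimal sketch of how an EER can be approximated from verification scores may help in reading them. This is not the authors' evaluation code: the function name, the threshold sweep over observed scores, and the toy scores are all illustrative assumptions, and higher scores are assumed to mean "same source".

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the trial is more likely "same source".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint at the threshold where the two rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one target trial scores low, one non-target scores high.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

On this scale, an EER of 8.61% corresponds to the operating point where about 8.6% of same-source trials are rejected and about 8.6% of different-source trials are accepted.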
📝 Abstract
Open-set source tracing is increasingly framed as a verification problem, motivating the use of pairwise metric-learning objectives from biometrics. We thus compare global anchoring and pairwise verification under matched backbones and a fixed data and epoch budget on MLAAD (in-domain) and STOPA (out-of-domain). In our runs, global anchoring yields lower in-domain error (8.61% EER) than pairwise variants (12–15% EER), even with rival mining and XLS-R finetuning. Because pairwise objectives optimize similarity directly, they concentrate variance into fewer embedding directions, reducing resolution among closely related generators. To test if this drives the drop, we impose a similar bottleneck on the globally supervised baseline, yet the baseline remains competitive. Together with an embedding-space analysis ($k_{99}$), these results suggest that the gap is not explained by dimensionality alone, but rather by the pairwise objective's shaping of the retained directions.
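The abstract does not define $k_{99}$ on this page. One plausible reading, stated here purely as an assumption, is the smallest number of principal directions needed to retain 99% of the embedding variance. Under that reading, a hedged sketch of the measurement could look as follows; the function name `k_at_variance` and the synthetic embeddings are hypothetical and are not taken from the paper.

```python
import numpy as np


def k_at_variance(embeddings, fraction=0.99):
    """Smallest number of principal directions whose variance reaches `fraction`.

    Assumption: this is one possible definition of k_99, not the paper's own.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)  # center each dimension
    # Squared singular values of centered data are proportional to the
    # variance captured along each principal direction.
    variances = np.linalg.svd(x, compute_uv=False) ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(variances))


rng = np.random.default_rng(0)
# Synthetic stand-ins for embeddings: variance spread evenly over 8 directions...
spread = rng.standard_normal((500, 8))
# ...versus variance concentrated almost entirely in one direction.
concentrated = spread * np.array([10.0] + [0.1] * 7)

print(k_at_variance(spread), k_at_variance(concentrated))
```

A lower value on the same embedding size would indicate variance concentrated into fewer directions, which is the effect the abstract attributes to pairwise objectives.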
Problem

Research questions and friction points this paper is trying to address.

synthetic speech source tracing
open-set source tracing
pairwise verification
embedding space
metric learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

pairwise verification
global anchoring
embedding space analysis
synthetic speech tracing
metric learning