Probing Token Spaces under Generator Shift in AI-Generated Music Detection

📅 2026-06-07
🤖 AI Summary
Existing AI-generated music detectors generalize poorly to generators that were not seen during training, and the field lacks evaluation benchmarks that reflect real-world deployment scenarios. This work addresses both gaps. It introduces MoM-open, an openly redistributable reconstruction of the MoM-CLAM benchmark that supports source-restricted evaluation. It also systematically investigates how the choice of audio token representation affects cross-generator generalization, treating codec-based discrete token spaces as a core experimental variable. The study keeps a single fixed, lightweight classifier (CoMoE) and an unchanged training recipe, so heterogeneous token spaces such as X-Codec tokens and MERT-derived tokens can be compared fairly. The experiments show that the token space strongly influences detection performance: X-Codec tokens perform best when the detector is trained on Udio-generated audio alone, whereas MERT-derived tokens perform better when it is trained on Suno-v3.5 outputs alone. These results highlight the critical role of audio representation in how well AI-generated music detectors generalize.
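The fixed-classifier comparison protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not CoMoE itself. The summary does not describe CoMoE's internals, so a hypothetical bag-of-tokens histogram feeding a plain logistic regression stands in for the downstream model. What the sketch shows is the protocol: the featurizer, the classifier, and the training recipe (`EPOCHS`, `LR`, `SEED`, all hypothetical values) stay identical, and only the token sequences and their vocabulary size change from one token space to the next.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical fixed training recipe, shared by every token space being compared.
EPOCHS = 200
LR = 0.5
SEED = 0

Example = Tuple[Sequence[int], int]  # (token IDs, label), label 1 = AI-generated


def histogram(tokens: Sequence[int], vocab_size: int) -> List[float]:
    """Map a variable-length token sequence to a normalized bag-of-tokens vector."""
    counts = [0.0] * vocab_size
    for t in tokens:
        counts[t] += 1.0
    total = max(len(tokens), 1)
    return [c / total for c in counts]


def _logit(w: List[float], b: float, x: List[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X: List[List[float]], y: List[int]) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the logistic loss, using the fixed recipe."""
    rng = random.Random(SEED)
    w = [rng.uniform(-0.01, 0.01) for _ in range(len(X[0]))]
    b = 0.0
    n = len(X)
    for _ in range(EPOCHS):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the logistic loss w.r.t. the logit is (sigmoid(z) - y).
            err = 1.0 / (1.0 + math.exp(-_logit(w, b, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - LR * g / n for wj, g in zip(w, grad_w)]
        b -= LR * grad_b / n
    return w, b


def evaluate_token_space(train: List[Example], test: List[Example], vocab_size: int) -> float:
    """Train the fixed classifier on one token space and return test accuracy."""
    X_train = [histogram(t, vocab_size) for t, _ in train]
    w, b = train_logreg(X_train, [label for _, label in train])
    correct = sum(
        int((_logit(w, b, histogram(t, vocab_size)) > 0) == (label == 1))
        for t, label in test
    )
    return correct / len(test)
```

To compare token spaces, one would call `evaluate_token_space` once per tokenizer on the same split, for example once with X-Codec token IDs and once with MERT-derived token IDs. Because nothing downstream changes, any accuracy gap can be attributed to the representation rather than to the classifier.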
📝 Abstract
AI-generated music detectors can appear robust on standard benchmark splits, yet their deployments require transfer to generator sources absent during training. We study this problem with source-restricted evaluation on MoM-open, an open reconstruction of MoM-CLAM that replaces the non-redistributable real corpus with FMA and MTG-Jamendo while preserving the fake-generator protocol. To isolate the role of representation, we introduce CoMoE, a compact fixed classifier for comparing heterogeneous audio token spaces while keeping the downstream architecture and training recipe unchanged. Experiments show that standard and real-source-restricted splits are nearly saturated, whereas fake-source restriction exposes large differences between token spaces: X-Codec tokens are strongest when training on Udio alone, while MERT-derived tokens are stronger when training on Suno-v3.5 alone. These results suggest that codec-style discrete token spaces should be treated as a primary experimental axis under generator shift in AI-generated music detection. Our code and data are available at https://github.com/MAAP-LAB/CoMoE.
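The source-restricted splits can be sketched as follows. This is a hypothetical illustration: the `Clip` record, the function name, and the choice to split the unrestricted class at random are assumptions, because the abstract does not spell out the exact split procedure. Restricting the fake class keeps only one generator's clips (for example Udio) in training and evaluates on the remaining generators. Restricting the real class does the same with real corpora such as FMA and MTG-Jamendo.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Clip(NamedTuple):
    clip_id: str
    source: str  # real corpus (e.g. "FMA") or generator (e.g. "Udio")
    label: int   # 1 = AI-generated, 0 = real


def source_restricted_split(
    clips: Sequence[Clip],
    restricted_label: int,
    train_source: str,
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[Clip], List[Clip]]:
    """Restrict one class to a single training source and split the other class randomly.

    restricted_label=1 yields a fake-source-restricted split (train on one
    generator, test on the unseen generators); restricted_label=0 yields a
    real-source-restricted split. Random splitting of the other class is an
    assumption made for this sketch.
    """
    restricted = [c for c in clips if c.label == restricted_label]
    others = [c for c in clips if c.label != restricted_label]
    random.Random(seed).shuffle(others)  # shuffles the new list in place
    cut = int(len(others) * train_frac)
    train = [c for c in restricted if c.source == train_source] + others[:cut]
    test = [c for c in restricted if c.source != train_source] + others[cut:]
    return train, test
```

Calling it with `restricted_label=1, train_source="Udio"` mirrors the Udio-only training condition described above. Calling it with `train_source="Suno-v3.5"` mirrors the Suno-v3.5-only condition.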
Problem

Research questions and friction points this paper is trying to address.

AI-generated music detection
generator shift
token spaces
source-restricted evaluation
domain generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

generator shift
audio token spaces
AI-generated music detection
CoMoE
source-restricted evaluation