🤖 AI Summary
This paper addresses drum kit source separation without isolated drum stem supervision, i.e., when no ground-truth drum stems are available for training. We propose an end-to-end, jointly optimized "inverse drum machine" framework that requires only drum transcription annotations (the onset times of each drum class), eliminating the need for clean, time-aligned drum stems. The method couples automatic drum transcription with differentiable one-shot drum sample synthesis in an analysis-by-synthesis loop, so that the two tasks are optimized together. Key technical contributions include: (1) a neural network trained jointly for transcription and synthesis; (2) stem reconstruction by convolving estimated onsets with synthesized one-shot samples; and (3) a fully differentiable one-shot drum sample synthesis module. Evaluated on the StemGMD dataset, the approach achieves separation performance comparable to state-of-the-art supervised methods (SDR improvement up to 2.1 dB) and significantly outperforms unsupervised matrix decomposition baselines such as NMF. To our knowledge, this is the first work to achieve high-fidelity drum source separation using transcription annotations alone.
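The analysis-by-synthesis idea can be illustrated with a toy sketch, assuming NumPy. As a simplification of our own, the onsets are fixed to a known transcription so that only the one-shot samples are learned, and the gradient is derived by hand rather than by autodiff; this is not the joint transcription-plus-synthesis training described above. The helper names `synthesize` and `fit_one_shots` are hypothetical.

```python
import numpy as np

def synthesize(onsets, one_shots):
    """Drum-machine forward model: sum over classes of onsets[k] * one_shots[k] (convolution), truncated to T."""
    T = onsets.shape[1]
    return sum(np.convolve(a, s)[:T] for a, s in zip(onsets, one_shots))

def fit_one_shots(mixture, onsets, M, lr=0.25, steps=100):
    """Gradient descent on 0.5 * ||synthesize(onsets, S) - mixture||^2 with respect to S (K x M).

    Hypothetical helper: onsets are held fixed, only the one-shot samples S are updated.
    """
    K, T = onsets.shape
    S = np.zeros((K, M))
    for _ in range(steps):
        r = synthesize(onsets, S) - mixture  # reconstruction residual
        grad = np.zeros_like(S)
        for k in range(K):
            for m in range(M):
                # dL/dS[k, m] = sum_j onsets[k, j] * r[j + m]
                grad[k, m] = np.dot(onsets[k, :T - m], r[m:])
        S -= lr * grad
    return S

# Toy data (hypothetical): kick on steps 0 and 8, snare on steps 4 and 12.
onsets = np.zeros((2, 16))
onsets[0, [0, 8]] = 1.0
onsets[1, [4, 12]] = 1.0
true_one_shots = np.array([[1.0, 0.5, 0.25, 0.0],
                           [0.8, -0.4, 0.2, -0.1]])
mixture = synthesize(onsets, true_one_shots)

estimated = fit_one_shots(mixture, onsets, M=4)
```

Only the mixture enters the loss, yet the per-class one-shots are recovered in this toy because the onsets tell the model which class produced each hit; in the real setting the onsets must themselves be estimated.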
📝 Abstract
We introduce the Inverse Drum Machine (IDM), a novel approach to drum source separation that combines analysis-by-synthesis with deep learning. Unlike recent supervised methods that rely on isolated stems, IDM requires only transcription annotations. It jointly optimizes automatic drum transcription and one-shot drum sample synthesis in an end-to-end framework. By convolving synthesized one-shot samples with estimated onsets, as a drum machine does, IDM reconstructs individual drum stems and trains a neural network to reconstruct the original mixture. Evaluations on the StemGMD dataset show that IDM achieves separation performance on par with state-of-the-art supervised methods, while substantially outperforming matrix decomposition baselines.
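The drum-machine reconstruction step can be sketched as follows, assuming NumPy, onsets given as a (K, T) array of per-class activations, and one-shots as a (K, M) array of waveforms. The function names and the mean-squared-error loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def synthesize_stems(onsets: np.ndarray, one_shots: np.ndarray) -> np.ndarray:
    """Convolve each class's onset activation with its one-shot sample.

    onsets: (K, T) nonnegative activations, one row per drum class.
    one_shots: (K, M) one-shot waveforms, one row per drum class.
    Returns (K, T) stems, truncated to the mixture length T.
    """
    K, T = onsets.shape
    stems = np.zeros((K, T))
    for k in range(K):
        stems[k] = np.convolve(onsets[k], one_shots[k])[:T]
    return stems

def reconstruction_loss(mixture: np.ndarray, stems: np.ndarray) -> float:
    """Assumed MSE between the summed stems and the observed mixture."""
    return float(np.mean((stems.sum(axis=0) - mixture) ** 2))

# Hypothetical toy pattern: kick on steps 0 and 8, snare on steps 4 and 12.
onsets = np.zeros((2, 16))
onsets[0, [0, 8]] = 1.0
onsets[1, [4, 12]] = 1.0
one_shots = np.array([[1.0, 0.5, 0.25, 0.0],    # "kick"
                      [0.8, -0.4, 0.2, -0.1]])  # "snare"

stems = synthesize_stems(onsets, one_shots)
mixture = stems.sum(axis=0)
loss = reconstruction_loss(mixture, stems)
```

The design point is that the loss is computed against the mixture only, so no isolated stems are needed for training; the per-class stems come out as a by-product of the synthesis.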