The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

📅 2025-05-06
🤖 AI Summary
This paper addresses drum source separation when no isolated drum stems are available for supervision. It proposes the Inverse Drum Machine (IDM), an end-to-end, jointly optimized framework that requires only transcription annotations rather than clean, time-aligned drum stems. The method combines automatic drum transcription with one-shot drum sample synthesis in an analysis-by-synthesis loop, so that the two tasks are optimized together. Key components are: (1) a neural network trained jointly on both tasks; (2) a reconstruction step that convolves synthesized one-shot samples with the estimated onsets; and (3) a drum sample synthesis module trained end to end with the rest of the system. Evaluated on the StemGMD dataset, IDM achieves separation performance on par with state-of-the-art supervised methods and substantially outperforms matrix decomposition baselines.

📝 Abstract
We introduce the Inverse Drum Machine (IDM), a novel approach to drum source separation that combines analysis-by-synthesis with deep learning. Unlike recent supervised methods that rely on isolated stems, IDM requires only transcription annotations. It jointly optimizes automatic drum transcription and one-shot drum sample synthesis in an end-to-end framework. By convolving synthesized one-shot samples with estimated onsets, mimicking a drum machine, IDM reconstructs individual drum stems and trains a neural network to match the original mixture. Evaluations on the StemGMD dataset show that IDM achieves separation performance on par with state-of-the-art supervised methods, while substantially outperforming matrix decomposition baselines.
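The drum-machine reconstruction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-shot samples here are hand-made (a decaying sine and decaying noise), the onset activations are fixed rather than predicted by a network, and the mean-squared-error loss is an assumption chosen for simplicity. The names `render_stems` and `reconstruction_loss` are hypothetical.

```python
import numpy as np


def render_stems(onsets, one_shots):
    """Convolve each instrument's onset activations with its one-shot sample.

    onsets: array of shape (num_instruments, num_samples), nonzero at hits.
    one_shots: list of 1-D arrays, one sample per instrument.
    """
    n = onsets.shape[1]
    # Truncate the full convolution so every stem has the mixture's length.
    return np.stack([np.convolve(a, s)[:n] for a, s in zip(onsets, one_shots)])


def reconstruction_loss(mixture, stems):
    """Mean squared error between the mixture and the sum of rendered stems
    (an assumed loss; the paper's training objective may differ)."""
    return float(np.mean((mixture - stems.sum(axis=0)) ** 2))


sr = 8000                      # assumed sample rate for the toy example
n = sr                         # one second of audio
t = np.arange(400) / sr        # 50 ms one-shot duration
rng = np.random.default_rng(0)

# Hypothetical one-shots standing in for synthesized drum samples.
kick = np.sin(2 * np.pi * 60 * t) * np.exp(-30 * t)
snare = rng.standard_normal(400) * np.exp(-40 * t)

# Onset activations: position marks the hit, amplitude acts as velocity.
onsets = np.zeros((2, n))
onsets[0, [0, 4000]] = 1.0     # kick hits
onsets[1, [2000, 6000]] = 0.8  # snare hits

stems = render_stems(onsets, [kick, snare])
mixture = stems.sum(axis=0)    # toy "observed" mixture
print(reconstruction_loss(mixture, stems))
```

In a learned system, both `onsets` and the one-shot samples would be network outputs, and a loss of this kind would be minimized against the real mixture, so the individual rows of `stems` become the separated sources as a by-product.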
Problem

Research questions and friction points this paper is trying to address.

Combines drum transcription and synthesis for source separation
Uses only transcription annotations, not isolated stems
Matches supervised methods without stem dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines analysis-by-synthesis with deep learning
Uses only transcription annotations, not isolated stems
Jointly optimizes transcription and sample synthesis
Bernardo Torres
Laboratoire de Traitement et Communication de l’Information (LTCI), Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France
Geoffroy Peeters
Télécom Paris (previously IRCAM - STMS)
audio signal processing, machine learning, music information retrieval
G. Richard
Laboratoire de Traitement et Communication de l’Information (LTCI), Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France