SynthCloner: Synthesizer Preset Conversion via Factorized Codec with ADSR Envelope Control

📅 2025-09-29

🤖 AI Summary

Electronic synthesizer preset conversion faces three challenges: timbre and ADSR envelopes are strongly coupled, existing methods cannot explicitly control envelope parameters, and there are no datasets with diverse, annotated ADSR envelopes. To address these, we propose a factorized encoder-decoder architecture that disentangles timbre, ADSR envelope, and musical content. It is trained by jointly optimizing spectrogram reconstruction, contrastive learning, and an ADSR-aware loss, so that envelopes are modeled explicitly and can be edited independently. We also introduce SynthCAT, a dedicated dataset covering diverse synthesizer timbres and a wide range of ADSR configurations. Experiments show improvements over baselines on both objective metrics and subjective listening tests, in both preset conversion fidelity and fine-grained controllability. Our code, models, and audio examples are publicly released.
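For readers new to the term: an ADSR envelope shapes a note's amplitude over time in four stages. Attack rises from 0 to the peak, decay falls from the peak to the sustain level, sustain holds that level while the key is held, and release falls back to 0 after the key is released. The sketch below is a minimal, piecewise-linear illustration of that standard shape. The function name `adsr_envelope`, its parameters, and the linear segments are assumptions for illustration only (many synthesizers use exponential or otherwise curved segments); this is not SynthCloner's envelope representation.

```python
def adsr_envelope(attack, decay, sustain, release, note_len, sr):
    """Piecewise-linear ADSR amplitude envelope (illustrative sketch only).

    attack, decay, release: stage durations in seconds.
    sustain: sustain level in [0, 1].
    note_len: seconds from note-on to note-off (how long the key is held).
    sr: sample rate in Hz.
    Returns a list of gain values, one per sample.
    """
    a = int(round(attack * sr))
    d = int(round(decay * sr))
    r = int(round(release * sr))
    held = int(round(note_len * sr))  # samples while the key is held

    env = []
    for n in range(held):
        if n < a:                      # attack: ramp 0 -> 1
            env.append(n / a)
        elif n < a + d:                # decay: ramp 1 -> sustain level
            env.append(1.0 - (1.0 - sustain) * (n - a) / d)
        else:                          # sustain: hold the level
            env.append(sustain)

    # Release starts from whatever level was reached at note-off,
    # which matters if the key is released before attack/decay finish.
    level = env[-1] if env else 0.0
    for n in range(r):                 # release: ramp level -> 0
        env.append(level * (1.0 - (n + 1) / r))
    return env


env = adsr_envelope(attack=0.01, decay=0.02, sustain=0.5,
                    release=0.05, note_len=0.1, sr=1000)
```

With `sr=1000`, a 10 ms attack, 20 ms decay, 0.5 sustain level, 100 ms hold, and 50 ms release give 150 samples that reach 1.0 at sample 10, settle at 0.5 from sample 30, and end at exactly 0.0.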

📝 Abstract

Electronic synthesizer sounds are controlled by presets: parameter settings that yield complex timbral characteristics and ADSR envelopes, making preset conversion particularly challenging. Recent approaches to timbre transfer often rely on spectral objectives or implicit style matching, offering limited control over envelope shaping. Moreover, public synthesizer datasets rarely provide diverse coverage of timbres and ADSR envelopes. To address these gaps, we present SynthCloner, a factorized codec model that disentangles audio into three attributes: ADSR envelope, timbre, and content. This separation enables expressive synthesizer preset conversion with independent control over these three attributes. Additionally, we introduce SynthCAT, a new synthesizer dataset with a task-specific rendering pipeline covering 250 timbres, 120 ADSR envelopes, and 100 MIDI sequences. Experiments show that SynthCloner outperforms baselines on both objective and subjective metrics while enabling independent attribute control. The code, model checkpoint, and audio examples are available at https://buffett0323.github.io/synthcloner/.
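The three attribute axes listed for SynthCAT multiply: 250 timbres × 120 ADSR envelopes × 100 MIDI sequences give 3,000,000 possible (timbre, envelope, sequence) combinations. Whether the rendering pipeline renders the full product or a subset is not stated here. The sketch below only illustrates the size of that grid and how it can be enumerated lazily; the integer IDs are hypothetical placeholders for real presets, envelopes, and MIDI files.

```python
from itertools import islice, product

# Counts taken from the abstract; the integer IDs are hypothetical placeholders.
N_TIMBRES, N_ENVELOPES, N_SEQUENCES = 250, 120, 100


def attribute_grid():
    """Lazily yield every (timbre_id, adsr_id, midi_id) triple."""
    return product(range(N_TIMBRES), range(N_ENVELOPES), range(N_SEQUENCES))


total = N_TIMBRES * N_ENVELOPES * N_SEQUENCES
print(total)  # 3000000

# Peek at the first few triples without materializing the whole grid;
# product() varies the last axis (MIDI sequence) fastest.
first = list(islice(attribute_grid(), 3))
print(first)  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```

Enumerating lazily matters at this scale: a materialized list of three million tuples is wasteful when a rendering job only needs to stream through (or sample from) the grid.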

Problem

Disentangles audio into ADSR envelope, timbre, and content

Enables expressive synthesizer preset conversion with attribute control

Addresses limited envelope control in existing methods and the scarcity of synthesizer datasets covering diverse timbres and ADSR envelopes
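Conceptually, preset conversion with a factorized representation amounts to recombining attributes: keep the content (what is played) from a source clip and take the timbre and/or ADSR envelope from a reference clip. The toy sketch below shows only that recombination logic. The `FactorizedCode` type, its string-valued fields, and the `convert` function are hypothetical stand-ins for learned latent codes, not SynthCloner's actual interface.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FactorizedCode:
    """Hypothetical disentangled code with one field per attribute.

    In a real codec each field would be a learned latent; strings are
    used here only to make the recombination easy to see.
    """
    content: str  # what is played (notes, timing)
    timbre: str   # spectral character of the preset
    adsr: str     # amplitude envelope shape


def convert(source: FactorizedCode,
            timbre_ref: Optional[FactorizedCode] = None,
            adsr_ref: Optional[FactorizedCode] = None) -> FactorizedCode:
    """Keep the source content; optionally swap in timbre and/or ADSR.

    Passing a single reference changes only that attribute (independent
    control); passing both corresponds to full preset conversion.
    """
    out = source
    if timbre_ref is not None:
        out = replace(out, timbre=timbre_ref.timbre)
    if adsr_ref is not None:
        out = replace(out, adsr=adsr_ref.adsr)
    return out


src = FactorizedCode(content="melody_A", timbre="saw_lead", adsr="pluck")
ref = FactorizedCode(content="melody_B", timbre="warm_pad", adsr="slow_swell")

print(convert(src, timbre_ref=ref, adsr_ref=ref))
# FactorizedCode(content='melody_A', timbre='warm_pad', adsr='slow_swell')
print(convert(src, adsr_ref=ref))
# FactorizedCode(content='melody_A', timbre='saw_lead', adsr='slow_swell')
```

In an actual codec the recombined code would then be passed to the decoder to synthesize audio; that step is omitted here because it depends entirely on the trained model.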

Innovation

Factorized codec model that disentangles audio into ADSR envelope, timbre, and content

Independent ADSR envelope and timbre control

New synthesizer dataset with task-specific rendering pipeline
