AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a central challenge in controllable music editing: making semantic modifications while preserving rhythmic and melodic structure. The authors propose a label-free approach that combines structural anchoring with semantic guidance inside the hidden representations of a diffusion model. A self-supervised reconstruction objective extracts concept vectors without labels or curated data, and these vectors are injected during editing while a plug-and-play structural adaptor enforces consistency. Conditioned and unconditioned injection variants trade off semantic strength against robustness. On ZoME-Bench and in subjective evaluations, the method outperforms both steering-only and anchoring-only baselines, achieving strong semantic transformations with high-fidelity structural preservation.
📝 Abstract
Controllable music editing aims to modify high-level attributes while strictly preserving rhythmic and melodic structures. However, this task is challenged by a semantic-structural entanglement: steering methods often degrade structure to achieve editing performance, while structural adaptors suppress semantic responsiveness. We propose AnchorSteer, a framework that resolves this tension by coupling structural anchoring with self-discovered semantic steering. The proposed approach probes internal representations to extract interpretable, label-free concept vectors via a self-supervised reconstruction objective, isolating attributes without curated data. During editing, these portable, plug-and-play concept vectors are injected into the diffusion model's hidden representations while a structural adaptor enforces consistency. Variants for unconditioned and conditioned injection are provided to balance robustness and semantic strength. Experiments on ZoME-Bench and subjective tests show that the proposed framework outperforms both steering-only and anchoring-only baselines, enabling significant semantic transformations with high-fidelity structural preservation.
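To make the idea of injecting a concept vector into hidden states concrete, here is a minimal sketch of generic additive activation steering. It is not the paper's procedure: the function names `inject_concept` and `projection`, the `strength` parameter, and the toy values are all assumptions for illustration, and the structural adaptor and the conditioned/unconditioned variants are not modeled.

```python
import math

def inject_concept(hidden, concept, strength=1.0):
    """Add a scaled, unit-normalised concept direction to each hidden state.

    hidden:   list of rows, each a list of floats (hypothetical hidden states)
    concept:  list of floats (hypothetical concept vector)
    strength: hypothetical scalar controlling how hard the edit pushes
    """
    norm = math.sqrt(sum(c * c for c in concept))
    if norm == 0.0:
        raise ValueError("concept vector must be non-zero")
    direction = [c / norm for c in concept]
    return [[h + strength * d for h, d in zip(row, direction)] for row in hidden]

def projection(row, concept):
    """Scalar projection of one hidden state onto the concept direction."""
    norm = math.sqrt(sum(c * c for c in concept))
    return sum(h * c for h, c in zip(row, concept)) / norm

# Toy values, chosen only so the arithmetic is easy to follow.
hidden = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
concept = [3.0, 0.0, 4.0]
steered = inject_concept(hidden, concept, strength=2.0)

# Each row's projection onto the concept direction grows by `strength`,
# while components orthogonal to the concept are left unchanged.
shifts = [projection(s, concept) - projection(h, concept)
          for s, h in zip(steered, hidden)]
```

In this sketch the edit moves every hidden state the same distance along one direction, which is why a separate mechanism would be needed to keep structure-bearing components intact.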
Problem

Research questions and friction points this paper is trying to address.

controllable music editing
semantic-structural entanglement
structure preservation
semantic steering
music generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept injection
structure-preserving editing
self-supervised discovery
diffusion models
semantic-structural disentanglement