Similarity-Guided Diffusion for Long-Gap Music Inpainting

📅 2025-09-19
🤖 AI Summary
Music inpainting aims to reconstruct missing segments of audio recordings, yet existing diffusion models struggle to preserve musical coherence over multi-second gaps. This paper proposes Similarity-Guided Diffusion Posterior Sampling (SimDPS), a hybrid approach that uses contextually similar audio segments as guiding signals. Candidate segments are retrieved from a corpus by contextual similarity and incorporated into a modified likelihood that steers diffusion posterior sampling toward reconstructions consistent with the surrounding context. In subjective evaluations of piano music inpainting with 2-second gaps, SimDPS improves perceptual plausibility over unguided diffusion and frequently outperforms similarity search alone when moderately similar candidates are available. The results suggest that combining similarity search with diffusion-based inference can address a key limitation of prior diffusion-based approaches to long-gap inpainting.

📝 Abstract
Music inpainting aims to reconstruct missing segments of a corrupted recording. While diffusion-based generative models improve reconstruction for medium-length gaps, they often struggle to preserve musical plausibility over multi-second gaps. We introduce Similarity-Guided Diffusion Posterior Sampling (SimDPS), a hybrid method that combines diffusion-based inference with similarity search. Candidate segments are first retrieved from a corpus based on contextual similarity, then incorporated into a modified likelihood that guides the diffusion process toward contextually consistent reconstructions. Subjective evaluation on piano music inpainting with 2-s gaps shows that the proposed SimDPS method enhances perceptual plausibility compared to unguided diffusion and frequently outperforms similarity search alone when moderately similar candidates are available. These results demonstrate the potential of a hybrid similarity approach for diffusion-based audio enhancement with long gaps.
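The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the recording and the corpus are plain sequences of per-frame feature values and scores every corpus position by the cosine similarity between its surrounding context and the context around the gap. The function names (`cosine`, `retrieve_candidates`), the feature representation, and the similarity measure are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(
    corpus: Sequence[float],
    left: Sequence[float],
    right: Sequence[float],
    gap_len: int,
    k: int = 3,
) -> List[Tuple[float, int, List[float]]]:
    """Hypothetical context-matching search.

    Slides a window of len(left) + gap_len + len(right) over the corpus,
    compares the frames before and after each candidate fill with the gap's
    left/right context, and returns the k best (score, start_index, fill) triples.
    """
    n_left, n_right = len(left), len(right)
    window = n_left + gap_len + n_right
    query = list(left) + list(right)
    scored: List[Tuple[float, int, List[float]]] = []
    for start in range(len(corpus) - window + 1):
        before = corpus[start:start + n_left]
        fill = corpus[start + n_left:start + n_left + gap_len]
        after = corpus[start + n_left + gap_len:start + window]
        score = cosine(query, list(before) + list(after))
        scored.append((score, start, list(fill)))
    # Highest similarity first; the stable sort keeps earlier positions on ties.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [0.0, 1.0, 0.0, 7.0, 8.0, 0.0, 1.0, 0.0, 0.0]
    best = retrieve_candidates(corpus, left=[1.0, 0.0], right=[0.0, 1.0], gap_len=2, k=1)[0]
    print(best)
```

In the toy corpus, the window starting at index 1 has exactly the query's context around it, so its fill `[7.0, 8.0]` ranks first. A practical system would more likely compare spectral or learned features than raw values, and might exclude the corrupted recording's own gap region from the search; both refinements are omitted here.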
Problem

Research questions and challenges this paper addresses.

Reconstructing missing segments in corrupted music recordings
Preserving musical plausibility over multi-second gaps
Enhancing perceptual plausibility in long-gap music inpainting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines diffusion-based inference with similarity search
Guides the diffusion process toward contextually consistent reconstructions through a modified likelihood built from the retrieved candidates
Retrieves candidate segments based on contextual similarity
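The likelihood-guidance idea listed above can be sketched in the same hedged spirit. The paper's exact modified likelihood is not stated on this page, so the sketch assumes a simple squared-error form that (i) keeps the denoised estimate close to the observed samples outside the gap and (ii) pulls it toward a retrieved candidate inside the gap, weighted by a hypothetical factor `lam`. The trained denoiser is replaced by a toy linear map `x0 = s * x_t`, and the reverse-diffusion step that a diffusion posterior sampler would interleave with each correction is omitted, so the loop reduces to gradient descent on the guidance loss. All names and parameters (`guidance_grad`, `guided_refine`, `lam`, `s`, `zeta`) are illustrative assumptions.

```python
from typing import List, Sequence


def guidance_grad(
    x_t: Sequence[float],
    y: Sequence[float],
    mask: Sequence[int],
    candidate: Sequence[float],
    lam: float,
    s: float,
) -> List[float]:
    """Gradient w.r.t. x_t of an assumed guidance loss

        L(x_t) = ||m * (y - x0)||^2 + lam * ||(1 - m) * (c - x0)||^2,  x0 = s * x_t,

    where m is a binary mask (1 = observed sample, 0 = missing sample), so
    m**2 == m, and the chain rule through the toy denoiser contributes a factor s.
    """
    grad = []
    for xt, yi, mi, ci in zip(x_t, y, mask, candidate):
        x0 = s * xt  # toy "denoised" estimate (stand-in for a trained network)
        residual = mi * (yi - x0) + lam * (1 - mi) * (ci - x0)
        grad.append(-2.0 * s * residual)
    return grad


def guided_refine(
    x_t: Sequence[float],
    y: Sequence[float],
    mask: Sequence[int],
    candidate: Sequence[float],
    lam: float = 0.5,
    s: float = 1.0,
    zeta: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Repeatedly apply x <- x - zeta * grad and return the final estimate x0 = s * x."""
    x = list(x_t)
    for _ in range(steps):
        g = guidance_grad(x, y, mask, candidate, lam, s)
        x = [xi - zeta * gi for xi, gi in zip(x, g)]
    return [s * xi for xi in x]


if __name__ == "__main__":
    y = [1.0, 0.0, 0.0, -1.0]         # observed signal; samples 1 and 2 are missing
    mask = [1, 0, 0, 1]
    candidate = [0.0, 0.5, 0.3, 0.0]  # retrieved fill (only the gap entries matter)
    print(guided_refine([0.0] * 4, y, mask, candidate))
```

With these settings the estimate converges to the observed values outside the gap and to the candidate values inside it. In an actual sampler, the gradient would be computed through the trained network by automatic differentiation, and the step size would typically vary with the diffusion step.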
Sean Turland
Acoustics Lab, Dept. Information and Communications Engineering, Aalto University, Espoo, Finland
Eloi Moliner
Acoustics Lab, Dept. Information and Communications Engineering, Aalto University, Espoo, Finland
Vesa Välimäki
Professor of Audio Signal Processing, Aalto University, Espoo, Finland
Audio Signal Processing, Acoustic Signal Processing, Audio Engineering, Music Technology, Sound and Music Computing