🤖 AI Summary
This work addresses the unsupervised inpainting of missing regions in audio spectrograms. We propose Janssen-TF, the first autoregressive inpainting method explicitly designed for the time-frequency (TF) domain, which adapts the time-domain autoregressive Janssen algorithm to the STFT spectrogram setting. Janssen-TF requires no training, uses few parameters, and relies solely on local autoregressive modeling to interpolate the missing spectrogram regions. Quantitatively, it achieves statistically significant improvements over state-of-the-art deep-prior-based neural networks (e.g., DeepFill, GMCNN) on the objective metrics PSNR, SSIM, and Log-Spectral Distance (LSD). Subjectively, it attains superior perceptual quality in MUSHRA listening tests. The core contribution is the first TF-domain autoregressive inpainting framework, which demonstrates that lightweight, training-free methods can deliver high-fidelity audio restoration with substantially lower computational overhead.
📝 Abstract
The paper focuses on inpainting missing parts of an audio signal spectrogram. The autoregression-based Janssen algorithm, the state of the art in time-domain audio inpainting, is adapted to the time-frequency setting. This novel method, termed Janssen-TF, is compared with the deep-prior neural network approach using both objective metrics and a subjective listening test, and it proves superior in all the considered measures.
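For orientation, the classic time-domain Janssen iteration that the abstract builds on alternates two steps: estimate autoregressive (AR) coefficients from the current signal estimate, then re-fill the missing samples so that the AR prediction error is as small as possible. The following is a minimal NumPy sketch of that time-domain idea only. It is not the Janssen-TF method from the paper, and the function names (`estimate_ar`, `fill_gap`, `janssen_time_domain`), the Yule-Walker estimator, the small ridge term, and the default order and iteration count are all illustrative assumptions.

```python
import numpy as np


def estimate_ar(x, order):
    """Estimate AR coefficients [1, a1, ..., ap] with the autocorrelation
    (Yule-Walker) method. Assumption: any AR estimator could be used here."""
    n = len(x)
    # Biased autocorrelation sums r[0..order] of the finite sequence.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz system R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny ridge for numerical stability (assumption)
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def fill_gap(x, missing, a):
    """Given fixed AR coefficients, choose the missing samples that minimize
    the energy of the prediction error e[n] = sum_k a[k] * x[n - k]."""
    n, p = len(x), len(a) - 1
    # Row i of A computes e[i + p] from samples x[i], ..., x[i + p].
    A = np.zeros((n - p, n))
    for i in range(n - p):
        A[i, i : i + p + 1] = a[::-1]
    known = ~missing
    # Minimize ||A[:, missing] @ x_m + A[:, known] @ x_k||^2 over x_m.
    rhs = -A[:, known] @ x[known]
    x_m, *_ = np.linalg.lstsq(A[:, missing], rhs, rcond=None)
    out = x.copy()
    out[missing] = x_m
    return out


def janssen_time_domain(x, missing, order=8, iterations=10):
    """Alternate AR estimation and gap filling, starting from a zero-filled gap.
    `missing` is a boolean mask marking the samples to reconstruct."""
    est = np.where(missing, 0.0, x)
    for _ in range(iterations):
        a = estimate_ar(est, order)
        est = fill_gap(est, missing, a)
    return est
```

A typical call passes the damaged signal and a boolean mask, for example `janssen_time_domain(signal, mask, order=8, iterations=10)`; the samples outside the mask are returned unchanged. In this sketch the reliable samples are known in the time domain, whereas in the setting of the paper the gap is in the spectrogram, so the data-consistency step would have to be expressed through the STFT rather than through a sample mask.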