🤖 AI Summary
This work addresses reconstruction of head-related transfer functions (HRTFs) from sparse measurements by proposing a framework that combines the wavelet scattering transform (WST) with a neural field. The method incorporates multiscale WST statistical priors into the training loss and applies a two-phase masking strategy to the WST coefficients: the first phase learns a binary mask from a small multi-subject dataset, and the second phase applies that mask to the WST coefficients of an individual HRTF so that informative statistical structures are preserved during reconstruction. Validation on HRTF upsampling compares the approach against baseline methods that also serve as an ablation study of the framework's components, and the results support the effectiveness of mask-guided WST coefficient selection and optimization-driven reconstruction.
📝 Abstract
In this paper, we propose a reconstruction framework that leverages the Wavelet Scattering Transform (WST) as a multi-scale feature extractor to impose statistical priors under sparse observation conditions. The reconstruction problem is formulated as an optimization task and solved using a neural field, with the WST incorporated into the training loss function. As a proof of concept, we validate the proposed method on HRTF upsampling. A masking strategy is applied to the WST coefficients, resulting in a two-phase procedure. The first phase learns a binary mask from a small multi-subject dataset, while the second phase applies the learned mask to the WST coefficients of an individual HRTF to preserve informative statistical structures during reconstruction. Validation against baseline methods, which also serve as an ablation study of the different components of the framework, demonstrates the effectiveness of the proposed approach.
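The objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean-squared-error form of both terms, the `weight` parameter, and the assumption that reference WST coefficients are available are all hypothetical. The WST itself is not computed here; `feat_pred` and `feat_ref` stand in for precomputed scattering coefficients, and `mask` stands in for the binary mask learned in the first phase.

```python
import numpy as np

def masked_feature_loss(feat_pred, feat_ref, mask):
    """Mean squared error over the coefficients selected by a binary mask.

    feat_pred, feat_ref: arrays of the same shape (stand-ins for WST coefficients).
    mask: binary array of the same shape; nonzero entries mark coefficients to keep.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # An empty mask selects nothing, so the prior term contributes zero.
        return 0.0
    diff = np.asarray(feat_pred, dtype=float)[mask] - np.asarray(feat_ref, dtype=float)[mask]
    return float(np.mean(diff ** 2))

def total_loss(pred, obs, obs_idx, feat_pred, feat_ref, mask, weight=1.0):
    """Data fidelity at the sparse observation points plus the masked feature term.

    pred: reconstructed signal (hypothetical output of the neural field).
    obs: measured values at the positions listed in obs_idx.
    weight: hypothetical trade-off between the two terms.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    data_term = float(np.mean((pred[obs_idx] - obs) ** 2))
    return data_term + weight * masked_feature_loss(feat_pred, feat_ref, mask)
```

In a training loop, `total_loss` would be minimized with respect to the neural field's parameters, which requires an automatic-differentiation framework rather than plain NumPy. The sketch only shows how the mask restricts which coefficients contribute to the loss.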