🤖 AI Summary
This work addresses high-quality higher-order Ambisonics (HOA) encoding from sparse, irregular microphone arrays by proposing Flow-HOA, which is presented as the first generative framework to apply conditional flow matching to HOA encoding. The model learns a mapping from a simple prior distribution to the distribution of time-invariant FIR encoding filter coefficients. Training is guided by a composite loss that jointly accounts for time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Although trained only on synthetic data, Flow-HOA generalizes to real-world recordings. It outperforms model-based baselines in objective evaluations on simulated data, with better signal fidelity and spatial accuracy. In subjective listening tests on real recordings, it delivers higher overall sound quality with fewer audio artifacts.
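The summary describes the encoder's output as a bank of time-invariant FIR filter coefficients. The sketch below shows how such a bank would be applied at deployment time. It assumes one FIR filter per (HOA channel, microphone) pair and a full-sphere channel count of (N + 1)^2 for order N. The function names and the array layout are hypothetical illustrations, not taken from Flow-HOA.

```python
import numpy as np


def hoa_channel_count(order: int) -> int:
    """Number of full-sphere Ambisonic channels for HOA order N: (N + 1)^2."""
    return (order + 1) ** 2


def encode_hoa(mic_signals: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply a time-invariant FIR encoding filter bank (hypothetical layout).

    mic_signals: shape (M, T), one row per microphone capsule.
    filters:     shape (C, M, L), one length-L FIR per (HOA channel, mic) pair.
    Returns:     shape (C, T); channel c is the sum over mics m of
                 filters[c, m] convolved with mic_signals[m], truncated to T samples.
    """
    num_mics, num_samples = mic_signals.shape
    num_channels, filt_mics, _ = filters.shape
    if filt_mics != num_mics:
        raise ValueError("filter bank and microphone count disagree")
    out = np.zeros((num_channels, num_samples))
    for c in range(num_channels):
        for m in range(num_mics):
            # Full linear convolution, then keep the first T samples (causal filtering).
            out[c] += np.convolve(mic_signals[m], filters[c, m])[:num_samples]
    return out


# Usage with random placeholder data: a 4-capsule array and a first-order bank.
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 480))
fir = 0.01 * rng.standard_normal((hoa_channel_count(1), 4, 64))
hoa = encode_hoa(mics, fir)
print(hoa.shape)  # (4, 480)
```

Because the filters are fixed once generated, encoding reduces to a static multichannel convolution. This is what makes a time-invariant filter bank straightforward to deploy.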
📝 Abstract
Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional objective encompassing time-domain, spectral, and spatial fidelity while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations on simulated data show improved performance over strong model-based baselines on both signal fidelity and spatial accuracy metrics. Subjective listening tests on real microphone array recordings further confirm that Flow-HOA yields higher overall sound quality with fewer artifacts, demonstrating generalization from synthetic training data to real-world capture conditions.
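To make the conditional flow matching step concrete, the sketch below uses the common linear interpolation path with no minimum noise level. It shows how a training pair is formed, the basic mean-squared velocity loss, and forward-Euler sampling from the prior. These are assumptions: the abstract does not state which probability path, sampler, or conditioning inputs Flow-HOA uses, nor how the composite loss is combined with the flow matching objective. The velocity network and its conditioning are omitted, and all names are hypothetical.

```python
import numpy as np


def cfm_pair(x0: np.ndarray, x1: np.ndarray, t: float):
    """Point on the straight path from prior sample x0 to target x1, plus its velocity.

    x0: sample from the simple prior (e.g. Gaussian noise), same shape as x1.
    x1: target sample, here a flattened vector of FIR filter coefficients.
    """
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0  # constant velocity along the linear path
    return x_t, u_t


def cfm_loss(v_pred: np.ndarray, u_t: np.ndarray) -> float:
    """Mean-squared error between predicted and target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))


def euler_sample(velocity, x0: np.ndarray, n_steps: int = 16) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x


# Usage: for a single known target, the exact velocity field of this path is
# (target - x) / (1 - t), so Euler integration lands on the target.
rng = np.random.default_rng(0)
target = rng.standard_normal(256)  # stand-in for a flattened FIR bank
oracle = lambda x, t: (target - x) / (1.0 - t)
sample = euler_sample(oracle, rng.standard_normal(256), n_steps=8)
print(np.allclose(sample, target))  # True
```

In a trained model, the oracle above would be replaced by a learned network that predicts the velocity from x_t, t, and whatever conditioning the method supplies. The sampled vector would then be reshaped into the FIR filter bank.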