🤖 AI Summary
Real-time, physically accurate simulation of acoustic wave propagation is computationally prohibitive, while existing precomputation-based approaches incur excessive memory costs in large scenes. To address this, the work proposes Reciprocal Latent Fields (RLF), a framework that encodes precomputed acoustic parameters with trainable voxel-wise latent embeddings and a symmetric neural decoder, which enforces acoustic reciprocity by construction. Incorporating Riemannian metric learning into the decoder further improves the reproduction of acoustic phenomena in complex scenes. The method achieves high-fidelity acoustic field reconstruction with orders-of-magnitude lower memory consumption than prior approaches, and subjective listening tests confirm that the rendered audio is perceptually indistinguishable from ground-truth simulations, demonstrating RLF’s effectiveness in balancing efficiency, physical plausibility, and auditory realism.
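To make the reciprocity constraint concrete, here is a minimal sketch (not the authors' code) of a decoder that is symmetric by construction: it only sees order-invariant combinations of the source and receiver latents, so swapping them cannot change the output. All names and dimensions are illustrative assumptions.

```python
# Hypothetical sketch: a decoder symmetric in (source, receiver) latents,
# mirroring the acoustic reciprocity constraint described in RLF.
import torch
import torch.nn as nn

class SymmetricDecoder(nn.Module):
    def __init__(self, latent_dim: int, hidden: int = 128, n_params: int = 4):
        super().__init__()
        # The MLP receives only order-invariant features, so the network
        # as a whole satisfies f(z_s, z_r) == f(z_r, z_s) exactly.
        self.mlp = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_params),  # scalar acoustic parameters
        )

    def forward(self, z_src: torch.Tensor, z_rcv: torch.Tensor) -> torch.Tensor:
        # Sum and elementwise product are both symmetric in their arguments.
        feats = torch.cat([z_src + z_rcv, z_src * z_rcv], dim=-1)
        return self.mlp(feats)

dec = SymmetricDecoder(latent_dim=32)
z_s, z_r = torch.randn(1, 32), torch.randn(1, 32)
assert torch.equal(dec(z_s, z_r), dec(z_r, z_s))  # reciprocity holds
```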
📝 Abstract
Realistic sound propagation is essential for immersion in a virtual scene, yet physically accurate wave-based simulations remain computationally prohibitive for real-time applications. Wave coding methods address this limitation by precomputing and compressing impulse responses of a given scene into a set of scalar acoustic parameters; in large environments with many source-receiver pairs, however, this precomputed data can grow to unmanageable sizes. We introduce Reciprocal Latent Fields (RLF), a memory-efficient framework for encoding and predicting these acoustic parameters. The RLF framework employs a volumetric grid of trainable latent embeddings decoded with a symmetric function, ensuring acoustic reciprocity. We study a variety of decoders and show that leveraging Riemannian metric learning leads to better reproduction of acoustic phenomena in complex scenes. Experimental validation demonstrates that RLF maintains replication quality while reducing the memory footprint by several orders of magnitude. Furthermore, a MUSHRA-like subjective listening test indicates that sound rendered via RLF is perceptually indistinguishable from ground-truth simulations.
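As a rough illustration of how metric learning can yield a symmetric decoder, the sketch below scores a source-receiver pair by a Mahalanobis-style distance between their latents under a trainable metric M = LLᵀ; the distance is symmetric by definition, so reciprocity again holds by construction. This is an assumption for illustration only; the paper's actual Riemannian formulation may differ.

```python
# Hypothetical sketch of a metric-based decoder: acoustic parameters are
# predicted from a learned distance between latents. `L` and `head` are
# illustrative names, not the paper's API.
import torch
import torch.nn as nn

class MetricDecoder(nn.Module):
    def __init__(self, latent_dim: int, n_params: int = 4):
        super().__init__()
        self.L = nn.Parameter(torch.eye(latent_dim))  # metric factor, M = L @ L.T
        self.head = nn.Linear(1, n_params)            # distance -> acoustic parameters

    def forward(self, z_src: torch.Tensor, z_rcv: torch.Tensor) -> torch.Tensor:
        diff = (z_src - z_rcv) @ self.L
        # ||L^T (z_s - z_r)|| is unchanged when z_src and z_rcv are swapped.
        d = diff.norm(dim=-1, keepdim=True)
        return self.head(d)

dec = MetricDecoder(latent_dim=32)
z_s, z_r = torch.randn(1, 32), torch.randn(1, 32)
assert torch.equal(dec(z_s, z_r), dec(z_r, z_s))  # symmetric distance => reciprocity
```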