NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

📅 2024-05-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of modeling audio-visual spatial consistency and achieving high-fidelity novel-view synthesis and acoustic rendering from sparse observations, this paper proposes the first implicit audio-visual representation framework that jointly optimizes a neural radiance field (NeRF) and an acoustic field. Methodologically, it conditions the acoustic field on geometric and appearance priors drawn from the radiance field, coupling the two modalities and enabling synchronized generation of novel views and spatialized room impulse responses (RIRs). The framework supports spatially decoupled rendering across modalities, employs a parametric RIR representation, and leverages cross-modal conditional modeling. Evaluated on the SoundSpaces and RAF datasets, NeRAF significantly improves RIR fidelity and novel-view synthesis quality under sparse-view settings, is markedly more data-efficient than prior methods, and integrates with Nerfstudio as a modular component.

📝 Abstract
Sound plays a major role in human perception. Along with vision, it provides essential information for understanding our surroundings. Despite advances in neural implicit representations, learning acoustics that align with visual scenes remains a challenge. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF synthesizes both novel views and spatialized room impulse responses (RIR) at new positions by conditioning the acoustic field on 3D scene geometric and appearance priors from the radiance field. The generated RIR can be applied to auralize any audio signal. Each modality can be rendered independently and at spatially distinct positions, offering greater versatility. We demonstrate that NeRAF generates high-quality audio on SoundSpaces and RAF datasets, achieving significant performance improvements over prior methods while being more data-efficient. Additionally, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning. NeRAF is designed as a Nerfstudio module, providing convenient access to realistic audio-visual generation.
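The abstract notes that the generated RIR "can be applied to auralize any audio signal." In practice, auralization amounts to convolving a dry (anechoic) signal with the room impulse response. The sketch below is a minimal, model-independent illustration of that step using NumPy; the function name and the toy decaying-echo RIR are hypothetical and not part of NeRAF's codebase.

```python
import numpy as np

def auralize(dry_audio: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Auralize a dry signal by convolving it with a room impulse response.

    The full convolution has length len(dry_audio) + len(rir) - 1.
    The result is peak-normalized to avoid clipping.
    """
    wet = np.convolve(dry_audio, rir, mode="full")
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

# Toy example: 1 s of noise at 16 kHz through a simple decaying-echo RIR.
# A learned RIR (e.g. from an acoustic field) would replace `rir` here.
sr = 16000
dry = np.random.default_rng(0).standard_normal(sr).astype(np.float32)
rir = np.zeros(sr // 2, dtype=np.float32)
rir[0] = 1.0       # direct path
rir[2000] = 0.5    # first reflection (~125 ms)
rir[6000] = 0.25   # later reflection (~375 ms)
wet = auralize(dry, rir)
```

For long signals, an FFT-based convolution (e.g. `scipy.signal.fftconvolve`) is the usual drop-in replacement for `np.convolve`.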
Problem

Research questions and friction points this paper is trying to address.

Audio-Visual Synchronization
Neural Networks
Spatial Sound Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

NeRAF
3D scene reconstruction
audio-visual synchronization
Amandine Brunetto
Mines Paris, PSL Research University
Sascha Hornauer
Mines Paris, PSL Research University
Fabien Moutarde
Mines Paris, PSL Research University
Computer Vision and Pattern Recognition
Statistical Machine Learning and Deep Learning
Self-Driving Cars
Mobile and/or collab