Generative Data Augmentation Challenge: Synthesis of Room Acoustics for Speaker Distance Estimation

📅 2025-01-22
🤖 AI Summary
To address the scarcity and acquisition difficulty of real-world room impulse responses (RIRs) for speaker distance estimation, this paper introduces, for the first time, a generative data augmentation paradigm and establishes an end-to-end physically plausible RIR synthesis framework. Methodologically, it integrates deep generative models (e.g., GANs, VAEs, diffusion models) with acoustic prior constraints and differentiable acoustic rendering to produce spatially consistent and physically faithful RIRs. Key contributions include: (1) releasing the first standardized benchmark dataset and evaluation protocol dedicated to RIR generation; (2) open-sourcing the complete implementation pipeline; and (3) proposing a dual-axis evaluation metric that jointly assesses both RIR generation fidelity and downstream distance estimation performance. Experiments demonstrate that the synthesized RIRs substantially improve the generalization capability and robustness of distance estimation models.

📝 Abstract
This paper describes the room acoustics synthesis challenge, held as part of the Generative Data Augmentation workshop at ICASSP 2025. The challenge defines a generative task designed to increase the quantity and diversity of room impulse response (RIR) datasets so that they can support a spatially sensitive downstream task: speaker distance estimation. It identifies the technical difficulty of precisely measuring or simulating the acoustic characteristics of many rooms, and it proposes generative data augmentation as an alternative that could also benefit other downstream tasks. The challenge website, dataset, and evaluation code are available at https://sites.google.com/view/genda2025.
Problem

Research questions and friction points this paper is trying to address.

Data Augmentation
Room Acoustics
Speaker Localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

GENDA
Synthetic Room Acoustics
Speaker Distance Estimation
Authors
Jackie Lin
University of Illinois at Urbana-Champaign
Georg Gotz
Treble Technologies
H. Llopis
Treble Technologies
Haukur Hafsteinsson
Treble Technologies
Steinar Guðjónsson
Treble Technologies
Daniel Gert Nielsen
Ph.D student Acoustics
Optimization, Numerical Modelling, Vibro-acoustics
Finnur Pind
Treble Technologies
Paris Smaragdis
Professor, Massachusetts Institute of Technology
Audio Signal Processing, Computational Audition, Machine Learning, Machine Listening
Dinesh Manocha
Distinguished University Professor, University of Maryland at College Park
computer graphics, geometric modeling, motion planning, virtual reality, robotics
John Hershey
Google (formerly MERL, IBM, MSR, UCSD)
machine learning, sound separation, speech recognition, audio-visual perception
T. Kristjansson
Amazon Lab126, Reykjavik University
Minje Kim
University of Illinois at Urbana-Champaign, Amazon Lab126