The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of high-quality audio-location paired data that has hindered research in audio geo-localization. To this end, we introduce AGL1K, the first audio geo-localization benchmark for audio language models (ALMs), spanning 72 countries and territories, and propose a novel metric, "Audio Localizability", to curate 1,444 high-quality audio clips for evaluation. Using this benchmark, we systematically analyze 16 ALMs, examining their regional biases, reasoning pathways, and reliance on linguistic cues. Our findings show that closed-source models substantially outperform open-source counterparts and that predictions are often dominated by linguistic cues rather than acoustic or environmental features. AGL1K thus provides a foundation for evaluating and advancing the geospatial reasoning capabilities of ALMs.

📝 Abstract
Geo-localization aims to infer the geographic origin of a given signal. In computer vision, geo-localization has served as a demanding benchmark for compositional reasoning and is relevant to public safety. In contrast, progress on audio geo-localization has been constrained by the lack of high-quality audio-location pairs. To address this gap, we introduce AGL1K, the first audio geo-localization benchmark for audio language models (ALMs), spanning 72 countries and territories. To extract reliably localizable samples from a crowd-sourced platform, we propose the Audio Localizability metric, which quantifies the informativeness of each recording, yielding 1,444 curated audio clips. Evaluations on 16 ALMs show that audio geo-localization capability has emerged in ALMs. We find that closed-source models substantially outperform open-source models, and that linguistic clues often dominate as a scaffold for prediction. We further analyze ALMs' reasoning traces, regional bias, error causes, and the interpretability of the localizability metric. Overall, AGL1K establishes a benchmark for audio geo-localization and may help advance ALMs toward better geospatial reasoning.
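The abstract does not state how a predicted location is scored against the true one. As a minimal sketch, assuming each model answer has already been parsed into a latitude/longitude pair and scored the way image geo-localization benchmarks commonly are (great-circle error, then accuracy within street/city/region/country/continent radii of 1, 25, 200, 750, and 2500 km), evaluation might look like the following. The thresholds, the helper names `haversine_km` and `threshold_accuracy`, and the sample coordinates are hypothetical illustrations, not AGL1K's published protocol.

```python
from __future__ import annotations

import math
from typing import Sequence

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0

# Street / city / region / country / continent radii borrowed from
# image geo-localization practice; an assumption, not AGL1K's stated protocol.
THRESHOLDS_KM = (1.0, 25.0, 200.0, 750.0, 2500.0)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def threshold_accuracy(
    preds: Sequence[tuple[float, float]],
    truths: Sequence[tuple[float, float]],
    thresholds: Sequence[float] = THRESHOLDS_KM,
) -> dict[float, float]:
    """Fraction of predictions whose error falls within each distance threshold."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and of equal length")
    errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {th: sum(e <= th for e in errors) / len(errors) for th in thresholds}


# Hypothetical example: two predictions already parsed from model answers.
preds = [(48.8566, 2.3522), (35.6762, 139.6503)]   # Paris, Tokyo
truths = [(48.8566, 2.3522), (34.6937, 135.5023)]  # Paris, Osaka
print(threshold_accuracy(preds, truths))
```

In this toy run, the Tokyo guess for an Osaka clip (roughly 390 km off) counts only at the 750 km and 2500 km radii, which is why distance-threshold accuracy is usually reported as a curve rather than a single number.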
Problem

Research questions and friction points this paper is trying to address.

audio geo-localization
audio-language models
geospatial reasoning
benchmark dataset
localizability
Innovation

Methods, ideas, or system contributions that make the work stand out.

audio geo-localization
audio-language models
benchmark dataset
Audio Localizability metric
geospatial reasoning
Authors

Ruixing Zhang
The State Key Laboratory of Complex and Critical Software Environment, Beihang University

Zihan Liu
Shanghai Jiao Tong University
Architecture, Compiler

Leilei Sun
Beihang University
Data Mining, Machine Learning, Graph Learning

Tongyu Zhu
The State Key Laboratory of Complex and Critical Software Environment, Beihang University; The Key Laboratory of Data Science and Intelligent Computing, International Innovation Institute, Beihang University

Weifeng Lv
The State Key Laboratory of Complex and Critical Software Environment, Beihang University