Teaching Physical Awareness to LLMs through Sounds

📅 2025-06-10
🤖 AI Summary
Large language models (LLMs) lack fundamental perceptual grounding in the physical world, particularly in reasoning about acoustic phenomena governed by underlying physical principles—e.g., Doppler effect, multipath propagation, and spatial geometric constraints. To address this, we propose ACORN: a physics-informed framework that introduces AQA-PHY, the first audio question-answering dataset generated via a physically grounded sound simulator. ACORN jointly models both magnitude and phase components of audio signals, incorporates a phase-sensitive audio encoder, and integrates physics-aware priors into a multimodal LLM architecture with explicit audio–text alignment. Evaluated on line-of-sight detection, Doppler shift estimation, and direction-of-arrival estimation, ACORN significantly outperforms existing baselines. Our results demonstrate that audition serves as an effective modality for endowing LLMs with foundational physical awareness. This work pioneers the new research direction of *physics-perceptive audio-language modeling*.

📝 Abstract
Large Language Models (LLMs) have shown remarkable capabilities in text and multimodal processing, yet they fundamentally lack physical awareness, that is, an understanding of real-world physical phenomena. In this work, we present ACORN, a framework that teaches LLMs physical awareness through sound, focusing on fundamental physical phenomena such as the Doppler effect, the multipath effect, and spatial relationships. To overcome data scarcity, ACORN introduces a physics-based simulator that combines real-world sound sources with controlled physical channels to generate diverse training data. Using this simulator, we build AQA-PHY, a comprehensive Audio Question-Answer dataset, and propose an audio encoder that processes both magnitude and phase information. By connecting our audio encoder to state-of-the-art LLMs, we demonstrate reasonable results in both simulated and real-world tasks, such as line-of-sight detection, Doppler effect estimation, and Direction-of-Arrival estimation, paving the way for enabling LLMs to understand the physical world.
Problem

Research questions and friction points this paper is trying to address.

Teaching LLMs physical awareness through sound processing
Overcoming data scarcity with physics-based sound simulation
Enhancing LLMs' understanding of real-world physical phenomena
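
As a point of reference for the Doppler effect named in the abstract, the textbook relation for a source moving directly toward or away from a stationary listener is f_obs = f_src * c / (c - v), with v positive when the source approaches. The sketch below only illustrates that relation; it is not ACORN's simulator. The function names and the 343 m/s speed of sound are assumptions made for this example.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def doppler_observed_freq(f_src, v_radial, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener (hypothetical helper).

    v_radial > 0: source approaches the listener.
    v_radial < 0: source moves away from the listener.
    """
    if abs(v_radial) >= c:
        raise ValueError("source speed must stay below the speed of sound")
    return f_src * c / (c - v_radial)


def radial_velocity_from_shift(f_src, f_obs, c=SPEED_OF_SOUND):
    """Invert the relation above to recover the radial velocity."""
    return c * (1.0 - f_src / f_obs)


if __name__ == "__main__":
    heard = doppler_observed_freq(1000.0, 20.0)
    print(f"approaching at 20 m/s: {heard:.1f} Hz")
    print(f"recovered speed: {radial_velocity_from_shift(1000.0, heard):.2f} m/s")
```

A Doppler estimation question of the kind the abstract mentions amounts to recovering the shift, or the velocity behind it, from the audio alone.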
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-based simulator for diverse sound data
Audio encoder processing magnitude and phase
Connecting audio encoder to advanced LLMs
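
One way to see why an encoder might need phase as well as magnitude: under a far-field, two-microphone model (an assumption of this sketch, not a detail taken from the paper), a tone arriving from different directions has the same magnitude at both microphones, and only the inter-microphone phase difference encodes the angle. The helper names, the 0.1 m spacing, and the 343 m/s speed of sound below are all hypothetical.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def tone_bin(f, fs, n, delay):
    """DFT coefficient at frequency f of a cosine tone delayed by `delay` seconds."""
    return sum(
        math.cos(2 * math.pi * f * (k / fs - delay))
        * cmath.exp(-2j * math.pi * f * k / fs)
        for k in range(n)
    ) / n


def doa_from_phase(f, fs, n, mic_spacing, true_angle_deg):
    """Simulate two microphones and estimate the arrival angle from phase.

    Returns (magnitude at mic 0, magnitude at mic 1, estimated angle in degrees).
    The angle is measured from broadside; the spacing must be below half a
    wavelength, otherwise the phase difference wraps and the estimate is ambiguous.
    """
    tau = mic_spacing * math.sin(math.radians(true_angle_deg)) / C  # extra travel time
    x0 = tone_bin(f, fs, n, 0.0)
    x1 = tone_bin(f, fs, n, tau)
    dphi = cmath.phase(x0 * x1.conjugate())  # equals 2*pi*f*tau when unwrapped
    s = C * dphi / (2 * math.pi * f * mic_spacing)
    s = max(-1.0, min(1.0, s))  # guard against rounding just outside [-1, 1]
    return abs(x0), abs(x1), math.degrees(math.asin(s))


if __name__ == "__main__":
    m0, m1, est = doa_from_phase(f=1000.0, fs=16000.0, n=1600,
                                 mic_spacing=0.1, true_angle_deg=30.0)
    print(f"magnitudes: {m0:.3f}, {m1:.3f}; estimated angle: {est:.2f} degrees")
```

In this toy setting a magnitude-only representation cannot separate arrival angles, which is consistent with the motivation for an encoder that processes both components.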
Authors
Weiguo Wang
NIO
Multi-modal Sensing, Internet of Things, Mobile Computing
Andy Nie
NIO, Peking University
Wenrui Zhou
NIO, Peking University
Yi Kai
NIO, Peking University
Chengchen Hu
NIO, Peking University