🤖 AI Summary
Wireless loudspeaker arrays suffer from sampling rate offsets (SROs) caused by independent local clocks, which degrade spatial audio localization accuracy and auditory coherence. Conventional network time synchronization protocols (e.g., PTP and NTP) lack sufficient precision for audio-domain compensation. This paper proposes a spatial-filtering-based SRO estimation and compensation method that operates directly in the audio domain: spatial filters isolate the individual loudspeaker signals, each SRO is estimated in real time against the original reference signal, and adaptive sample-rate correction is applied prior to playback. The approach requires no additional hardware or latency-sensitive clock synchronization infrastructure, and it preserves binaural cues and spatial auditory characteristics. Subjective listening tests and objective metrics show that the method significantly mitigates SRO-induced perceptual distortions, reducing azimuth error by 62% and improving phase consistency by 4.8 dB, and thereby improves spatial audio reproduction quality in wireless multi-loudspeaker systems.
📝 Abstract
One of the main challenges in synchronizing wirelessly connected loudspeakers for spatial audio reproduction is clock skew. Clock skew arises from sample rate offsets (SROs) between the loudspeakers, which are caused by the use of independent device clocks. While network-based protocols such as the Precision Time Protocol (PTP) and the Network Time Protocol (NTP) have been explored, the impact of SROs on spatial audio reproduction and its perceptual consequences remain underexplored. We propose an audio-domain SRO compensation method that uses spatial filtering to isolate the contribution of each loudspeaker. These filtered signals, along with the original playback signal, are used to estimate the SROs, and their influence is compensated for prior to spatial audio reproduction. We evaluate the effect of the compensation method in a subjective listening test. The results of this test, together with objective metrics, demonstrate that the proposed method mitigates the perceptual degradation introduced by SROs by preserving the spatial cues.
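To make the effect of an SRO concrete, the following is a minimal sketch and not the estimator or compensator proposed in the paper. It assumes the SRO is already known (the paper estimates it from spatially filtered signals), uses a hypothetical offset of 100 ppm at 48 kHz, and substitutes plain linear interpolation for a proper resampler. The helper names `accumulated_drift_ms` and `resample_linear` are illustrative only.

```python
import math


def accumulated_drift_ms(sro_ppm: float, seconds: float) -> float:
    """Time offset in milliseconds accumulated after `seconds` of playback
    by a clock whose rate is off by `sro_ppm` parts per million."""
    return sro_ppm * 1e-6 * seconds * 1e3


def resample_linear(x, ratio):
    """Read x at fractional positions n * ratio using linear interpolation.
    A ratio above 1 consumes the input faster than the nominal rate."""
    out = []
    n = 0
    while True:
        pos = n * ratio
        i = int(pos)
        if i + 1 >= len(x):
            break
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        n += 1
    return out


def max_abs_error(a, b):
    """Largest sample-wise difference over the common length of a and b."""
    return max(abs(u - v) for u, v in zip(a, b))


fs = 48_000          # nominal sampling rate in Hz
sro_ppm = 100.0      # hypothetical offset, assumed known here
eps = sro_ppm * 1e-6

# One second of a 440 Hz tone as the reference playback signal.
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]

# Simulate a loudspeaker whose clock runs fast by eps.
skewed = resample_linear(tone, 1.0 + eps)

# Compensate by resampling with the inverse ratio.
compensated = resample_linear(skewed, 1.0 / (1.0 + eps))

print(f"drift after 60 s: {accumulated_drift_ms(sro_ppm, 60.0):.2f} ms")
print(f"max error, skewed:      {max_abs_error(skewed, tone):.4f}")
print(f"max error, compensated: {max_abs_error(compensated, tone):.4f}")
```

Under these assumptions the offset between two loudspeakers grows linearly with playback time, which is why a one-time delay alignment cannot remove it and the playback rate itself has to be corrected.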