Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of real-time, robust localization and directional capture of multiple moving acoustic sources in dynamic environments, this paper proposes an edge-embedded acoustic tracking system. Methodologically, it integrates monocular depth estimation with binocular stereo vision for 3D source localization, couples this with a lightweight deep learning-based object tracker, and drives a planar concentric circular MEMS microphone array to perform joint azimuth–elevation 2D adaptive beamforming. The key contribution is the first synergistic integration of edge-deployable deep learning, multimodal visual localization, and compact circular-array beam control, which significantly reduces computational overhead. Experiments demonstrate a signal-to-interference-ratio improvement of up to 12.6 dB under challenging reverberant and interfering conditions, with millisecond-level system response latency. The system effectively supports real-time audio interaction applications, including remote conferencing, smart speakers, and hearing-assistive devices.

📝 Abstract
Advances in object tracking and acoustic beamforming are driving new capabilities in surveillance, human-computer interaction, and robotics. This work presents an embedded system that integrates deep learning-based tracking with beamforming to achieve precise sound source localization and directional audio capture in dynamic environments. The approach combines single-camera depth estimation and stereo vision to enable accurate 3D localization of moving objects. A planar concentric circular microphone array constructed with MEMS microphones provides a compact, energy-efficient platform supporting 2D beam steering across azimuth and elevation. Real-time tracking outputs continuously adapt the array's focus, synchronizing the acoustic response with the target's position. By uniting learned spatial awareness with dynamic steering, the system maintains robust performance in the presence of multiple or moving sources. Experimental evaluation demonstrates significant gains in signal-to-interference ratio, making the design well-suited for teleconferencing, smart home devices, and assistive technologies.
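The abstract's pipeline from camera output to beam steering can be sketched in two steps: back-project the tracked pixel to a 3D point using a depth estimate, then convert that point to the azimuth/elevation pair the array steers to. The inverse-variance fusion of stereo and monocular depth below is a hypothetical illustration (the paper does not specify its fusion rule), and all parameter values (`fx`, `sigma_stereo`, etc.) are placeholders:

```python
import numpy as np

def fused_depth(z_stereo, z_mono, sigma_stereo=0.05, sigma_mono=0.20):
    """Hypothetical inverse-variance fusion of stereo and monocular
    depth estimates (meters); the less noisy estimate dominates."""
    w_s, w_m = 1.0 / sigma_stereo**2, 1.0 / sigma_mono**2
    return (w_s * z_stereo + w_m * z_mono) / (w_s + w_m)

def pixel_to_direction(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth estimate through a pinhole
    camera model, then convert the 3D point (camera frame, z forward,
    y down) to the azimuth/elevation used for beam steering."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    azimuth = np.arctan2(x, z)                   # left/right of optical axis
    elevation = np.arctan2(-y, np.hypot(x, z))   # up is negative image y
    return np.array([x, y, z]), azimuth, elevation

# A target on the optical axis steers the beam straight ahead.
z = fused_depth(z_stereo=2.0, z_mono=2.4)
point, az, el = pixel_to_direction(320, 240, z, fx=600, fy=600, cx=320, cy=240)
```

In this sketch, each new tracker output simply re-runs `pixel_to_direction`, so the steering direction updates at the tracker's frame rate.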
Problem

Research questions and friction points this paper is trying to address.

Achieving precise sound source localization in dynamic acoustic environments
Enabling real-time adaptive beamforming for moving audio targets
Maintaining robust performance with multiple or moving sound sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-device deep learning enables real-time object tracking
Combines single-camera depth estimation with stereo vision
Planar concentric microphone array supports 2D beam steering
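To make the third bullet concrete, the narrowband far-field case of 2D steering with a planar concentric circular array can be sketched as classic delay-and-sum: phase-align the microphones toward a chosen azimuth/elevation and sum. This is a minimal illustration, not the paper's beamformer; the ring radii, microphone counts, and frequency are assumed values:

```python
import numpy as np

def concentric_circular_array(radii, mics_per_ring):
    """Microphone positions (meters) on concentric rings in the z = 0 plane."""
    pos = []
    for r, n in zip(radii, mics_per_ring):
        ang = 2 * np.pi * np.arange(n) / n
        pos.append(np.stack([r * np.cos(ang), r * np.sin(ang), np.zeros(n)], axis=1))
    return np.concatenate(pos)

def steering_vector(pos, azimuth, elevation, freq, c=343.0):
    """Far-field narrowband steering vector toward (azimuth, elevation)."""
    u = np.array([np.cos(elevation) * np.cos(azimuth),
                  np.cos(elevation) * np.sin(azimuth),
                  np.sin(elevation)])
    return np.exp(2j * np.pi * freq * (pos @ u) / c)

def das_response(pos, look, probe, freq):
    """Delay-and-sum beam response: weights matched to the look direction,
    evaluated on a plane wave arriving from the probe direction."""
    a_look = steering_vector(pos, *look, freq)
    a_probe = steering_vector(pos, *probe, freq)
    return np.vdot(a_look, a_probe) / len(pos)

# Two-ring array; steer to 45° azimuth, 20° elevation at 2 kHz.
pos = concentric_circular_array([0.03, 0.06], [6, 12])
look = (np.deg2rad(45), np.deg2rad(20))
on_axis = abs(das_response(pos, look, look, 2000.0))      # unit gain toward target
off_axis = abs(das_response(pos, look, (np.deg2rad(-90), 0.0), 2000.0))
```

In the system described above, the tracker's azimuth/elevation output would simply replace `look` on every frame, re-pointing the beam without any physical motion of the array.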