🤖 AI Summary
Problem: Real-time, low-latency, and robust detection of emergency vehicle sirens in noisy urban environments remains challenging for resource-constrained edge devices.
Method: We propose a lightweight acoustic event detection system tailored for embedded edge platforms. First, we construct AudioSet-EV, a curated, structured dataset for emergency vehicle siren detection. Second, we design E2PANNs, a hardware-efficient convolutional neural network optimized for low-power inference. Third, we develop a multithreaded inference engine that integrates adaptive frame sizing, probability smoothing, and a finite-state-machine-based decision module.
Contribution/Results: On a Raspberry Pi 5, the system achieves an average end-to-end latency below 300 ms and an event-level F1-score of 92.7%, and reduces the false trigger rate by 68%. It supports WebSocket-based remote monitoring and scalable deployment in distributed acoustic sensor networks. The experimental results demonstrate that low-cost edge devices can be used collaboratively for real-time emergency vehicle tracking in smart cities.
📝 Abstract
We present a full-stack emergency vehicle (EV) siren detection system designed for real-time deployment on embedded hardware. The proposed approach is based on E2PANNs, a fine-tuned convolutional neural network derived from EPANNs and optimized for binary sound event detection (SED) under urban acoustic conditions. A key contribution is the creation of three curated and semantically structured datasets (AudioSet-EV, AudioSet-EV Augmented, and Unified-EV), developed using a custom AudioSet-Tools framework to overcome the low reliability of standard AudioSet annotations. The system is deployed on a Raspberry Pi 5 equipped with a high-fidelity DAC+microphone board and implements a multithreaded inference engine with adaptive frame sizing, probability smoothing, and a decision-state machine that limits false positive activations. A remote WebSocket interface provides real-time monitoring and supports live demonstrations. Performance is evaluated using both framewise and event-based metrics across multiple configurations. Results show that the system achieves low-latency detection with improved robustness under realistic audio conditions. This work demonstrates the feasibility of deploying IoS-compatible SED solutions on low-cost edge devices that, connected over WebSocket, can form distributed acoustic monitoring networks and enable collaborative emergency vehicle tracking across smart city infrastructures.
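To illustrate how probability smoothing and a decision-state machine can work together to suppress false positive activations, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the smoothing method or the state-machine design, so a simple moving average and a two-state hysteresis rule are assumed here. The class name `SirenDecider`, the window length, and both thresholds are hypothetical values chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SirenDecider:
    """Moving-average smoothing followed by a two-state hysteresis machine.

    All defaults are illustrative assumptions, not values from the paper.
    """
    window: int = 5             # number of recent frame probabilities averaged
    on_threshold: float = 0.7   # smoothed value needed to count toward activation
    off_threshold: float = 0.3  # smoothed value at or below which the event ends
    min_on_frames: int = 3      # consecutive qualifying frames required to activate
    active: bool = False
    _history: deque = field(default_factory=deque)
    _streak: int = 0

    def update(self, prob: float) -> bool:
        """Consume one framewise siren probability and return the current state."""
        # Probability smoothing: mean over the most recent `window` frames.
        self._history.append(prob)
        if len(self._history) > self.window:
            self._history.popleft()
        smoothed = sum(self._history) / len(self._history)

        if not self.active:
            # Idle state: require a sustained run above the upper threshold.
            self._streak = self._streak + 1 if smoothed >= self.on_threshold else 0
            if self._streak >= self.min_on_frames:
                self.active = True
        elif smoothed <= self.off_threshold:
            # Active state: release only once the smoothed value drops low enough.
            self.active = False
            self._streak = 0
        return self.active


if __name__ == "__main__":
    decider = SirenDecider(window=3, min_on_frames=2)
    frame_probs = [0.1, 0.95, 0.1, 0.9, 0.9, 0.9, 0.2, 0.0, 0.0]
    print([decider.update(p) for p in frame_probs])
```

In this sketch, an isolated high-probability frame is averaged away before it can trigger a detection, while the gap between the two thresholds keeps an active detection from flickering off during a brief dip. Each stretch of active frames corresponds to one detected event, which is the unit that event-based metrics score, in contrast to framewise metrics that score every frame independently.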