🤖 AI Summary
This work addresses the challenge of ultra-low-bandwidth (≈0.2% of audio bandwidth) and noise-sensitive video transmission in Blinkies—a sound-to-light conversion system operating at 30 fps. To enable robust edge deployment, we propose a lightweight, unsupervised autoencoder architecture specifically designed for resource-constrained devices such as Raspberry Pi. Our method enhances encoder resilience to channel distortions via latent-space noise injection during pretraining. The autoencoder compresses raw audio into compact, discriminative representations, which are then visually encoded via LED flickering for optical transmission. Evaluated on the ESC-50 dataset under a stringent 15 Hz bandwidth constraint in simulation, our approach achieves a significantly higher macro-F1 score than conventional sound-to-light methods. Results demonstrate substantial improvement in audio event classification accuracy under low-bandwidth, high-noise conditions—enabling practical, real-time acoustic sensing over severely bandwidth-limited optical links.
📝 Abstract
In the acoustic event classification (AEC) framework that employs Blinkies, audio signals are converted into LED light emissions and subsequently captured by a single video camera. However, the 30 fps optical transmission channel conveys only about 0.2% of the normal audio bandwidth and is highly susceptible to noise. We propose a novel sound-to-light conversion method that leverages the encoder of a pre-trained autoencoder (AE) to distill compact, discriminative features from the recorded audio. To pre-train the AE, we adopt a noise-robust learning strategy in which artificial noise is injected into the encoder's latent representations during training, thereby enhancing the model's robustness against channel noise. The encoder architecture is specifically designed for the memory footprint of contemporary edge devices such as the Raspberry Pi 4. In a simulation experiment on the ESC-50 dataset under a stringent 15 Hz bandwidth constraint, the proposed method achieved higher macro-F1 scores than conventional sound-to-light conversion approaches.