Pre-training Autoencoder for Acoustic Event Classification via Blinky

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of ultra-low-bandwidth (≈0.2% of audio bandwidth) and noise-sensitive video transmission in Blinkies—a sound-to-light conversion system operating at 30 fps. To enable robust edge deployment, we propose a lightweight, unsupervised autoencoder architecture specifically designed for resource-constrained devices such as Raspberry Pi. Our method enhances encoder resilience to channel distortions via latent-space noise injection during pretraining. The autoencoder compresses raw audio into compact, discriminative representations, which are then visually encoded via LED flickering for optical transmission. Evaluated on the ESC-50 dataset under a stringent 15 Hz bandwidth constraint in simulation, our approach achieves a significantly higher macro-F1 score than conventional sound-to-light methods. Results demonstrate substantial improvement in audio event classification accuracy under low-bandwidth, high-noise conditions—enabling practical, real-time acoustic sensing over severely bandwidth-limited optical links.

Technology Category

Application Category

📝 Abstract

In the acoustic event classification (AEC) framework that employs Blinkies, audio signals are converted into LED light emissions and subsequently captured by a single video camera. However, the 30 fps optical transmission channel conveys only about 0.2% of the normal audio bandwidth and is highly susceptible to noise. We propose a novel sound-to-light conversion method that leverages the encoder of a pre-trained autoencoder (AE) to distill compact, discriminative features from the recorded audio. To pre-train the AE, we adopt a noise-robust learning strategy in which artificial noise is injected into the encoder's latent representations during training, thereby enhancing the model's robustness against channel noise. The encoder architecture is specifically designed for the memory footprint of contemporary edge devices such as the Raspberry Pi 4. In a simulation experiment on the ESC-50 dataset under a stringent 15 Hz bandwidth constraint, the proposed method achieved higher macro-F1 scores than conventional sound-to-light conversion approaches.

Problem

Research questions and friction points this paper is trying to address.

Addresses low-bandwidth audio transmission via LED

Enhances noise robustness in optical communication

Optimizes for edge device memory constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained autoencoder for feature distillation

Noise-robust learning with latent noise injection

Memory-optimized encoder for edge devices

🔎 Similar Papers

No similar papers found.

Authors to Follow