🤖 AI Summary
Event cameras suffer from low spatial resolution, which hinders fine-grained visual perception. To address this, we propose an ultra-lightweight spiking neural network (SNN) framework for real-time event-stream super-resolution. Methodologically, we introduce a Dual-Forward Polarity-Split Event Encoding scheme that decouples positive- and negative-event processing into separate forward passes through a shared network; further, we design a learnable spatio-temporal polarity-aware loss with an uncertainty-based weighting mechanism to jointly optimize temporal consistency, spatial fidelity, and polarity alignment. The model contains fewer than 50K parameters and runs with inference latency under 2 ms, achieving competitive performance across multiple benchmark datasets. Compared to existing approaches, our method reduces model size by 87% and accelerates inference by 3.2×, enabling efficient embedded deployment. As a compact front-end module, it effectively enhances downstream vision tasks.
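As a rough illustration of the dual-forward idea, the sketch below bins positive and negative events into separate frames and pushes each through the same shared network in two forward passes. The module name `PolaritySplitSR`, the layer sizes, and the use of a plain ReLU in place of spiking (LIF) dynamics are placeholder assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PolaritySplitSR(nn.Module):
    """Illustrative sketch (hypothetical): one shared network, two forward
    passes (positive events, then negative events), not the published model."""
    def __init__(self, scale=2, channels=16):
        super().__init__()
        # Shared weights are reused for both polarity streams.
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),
            nn.ReLU(),  # stand-in for a spiking nonlinearity
            nn.Conv2d(channels, scale * scale, 3, padding=1),
        )
        self.up = nn.PixelShuffle(scale)

    def forward(self, pos_events, neg_events):
        # pos_events / neg_events: (B, 1, H, W) binned event counts per polarity
        sr_pos = self.up(self.body(pos_events))  # first forward pass
        sr_neg = self.up(self.body(neg_events))  # second pass, same weights
        return sr_pos, sr_neg  # merged downstream into a super-resolved event stream
```

Reusing one set of weights for both polarity passes is what keeps the parameter count low in this scheme.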
📝 Abstract
Event cameras offer unparalleled advantages such as high temporal resolution, low latency, and high dynamic range. However, their limited spatial resolution poses challenges for fine-grained perception tasks. In this work, we propose an ultra-lightweight, stream-based event-to-event super-resolution method based on Spiking Neural Networks (SNNs), designed for real-time deployment on resource-constrained devices. To further reduce model size, we introduce a novel Dual-Forward Polarity-Split Event Encoding strategy that decouples positive and negative events into separate forward paths through a shared SNN. Furthermore, we propose a Learnable Spatio-temporal Polarity-aware Loss (LearnSTPLoss) that adaptively balances temporal, spatial, and polarity consistency using learnable uncertainty-based weights. Experimental results demonstrate that our method achieves competitive super-resolution performance on multiple datasets while significantly reducing model size and inference time. The lightweight design enables embedding the module into event cameras or using it as an efficient front-end preprocessing stage for downstream vision tasks.
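The learnable uncertainty-based weighting in LearnSTPLoss can be sketched with the standard learnable log-variance formulation over the three consistency terms; the class name and the exact combination rule below are assumptions for illustration, not the published loss.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Sketch (hypothetical) of uncertainty-based weighting over three loss
    terms (temporal, spatial, polarity), not the exact LearnSTPLoss."""
    def __init__(self):
        super().__init__()
        # One learnable log-variance per term, trained jointly with the network.
        self.log_vars = nn.Parameter(torch.zeros(3))

    def forward(self, l_temporal, l_spatial, l_polarity):
        losses = torch.stack([l_temporal, l_spatial, l_polarity])
        # Weight each term by exp(-log_var) and regularize with +log_var,
        # so no single term's weight collapses to zero.
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()
```

In use, the three per-term losses are computed from the network output and passed to this module; backpropagating the returned scalar updates the network weights and the three log-variances together, letting the balance among temporal, spatial, and polarity consistency adapt during training.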