🤖 AI Summary
Low-light video deblurring faces dual degradation challenges—insufficient illumination and motion blur—particularly in nighttime surveillance and autonomous driving. Existing two-stage fusion approaches lack the modeling capacity to jointly address these coupled degradations. To address this, this paper proposes the first end-to-end framework for fusing event-camera data with RGB video. The core innovation is a spatiotemporal fusion mechanism driven by complex-valued neural networks, integrating complex convolution, complex spatiotemporal alignment GRUs, and a complex-domain spatial-frequency joint learning module. This enables continuous, unified alignment and deep fusion of event streams and RGB frames within a single complex-valued representation space. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, with average improvements of 1.82 dB in PSNR and 0.023 in SSIM over prior methods. The code is publicly released, supporting reproducibility and practical applicability.
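The complex convolution mentioned above is the basic building block. A common way to realize it (a minimal numpy sketch, not the paper's implementation — function names here are illustrative) is to expand the product of complex inputs and complex weights into four real convolutions, following (a+ib)(c+id) = (ac−bd) + i(ad+bc):

```python
import numpy as np

def real_conv2d(x, w):
    """Naive valid-mode 2D cross-correlation of a real map with a real kernel."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def complex_conv2d(x, w):
    """Complex convolution via four real convolutions:
    (a+ib)*(c+id) = (ac - bd) + i(ad + bc),
    where a = x.real, b = x.imag, c = w.real, d = w.imag."""
    rr = real_conv2d(x.real, w.real)   # ac
    ii = real_conv2d(x.imag, w.imag)   # bd
    ri = real_conv2d(x.real, w.imag)   # ad
    ir = real_conv2d(x.imag, w.real)   # bc
    return (rr - ii) + 1j * (ri + ir)
```

In a fusion setting, one modality (e.g. the RGB frame) can populate the real channel and the other (e.g. the event representation) the imaginary channel, so a single complex kernel mixes both modalities in every operation.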
📝 Abstract
Low-light video deblurring poses significant challenges in applications like nighttime surveillance and autonomous driving due to dim lighting and long exposures. While event cameras offer potential solutions with superior low-light sensitivity and high temporal resolution, existing fusion methods typically employ staged strategies, limiting their effectiveness against combined low-light and motion blur degradations. To overcome this, we propose CompEvent, a complex-valued neural network framework enabling holistic full-process fusion of event data and RGB frames for enhanced joint restoration. CompEvent features two core components: 1) a Complex Temporal Alignment GRU, which uses complex-valued convolutions and iteratively processes video and event streams via a GRU to achieve temporal alignment and continuous fusion; and 2) a Complex Space-Frequency Learning module, which performs unified complex-valued signal processing in both spatial and frequency domains, facilitating deep fusion through spatial structures and system-level characteristics. By leveraging the holistic representation capability of complex-valued neural networks, CompEvent achieves full-process spatiotemporal fusion, maximizes complementary learning between modalities, and significantly strengthens low-light video deblurring. Extensive experiments demonstrate that CompEvent outperforms state-of-the-art methods on this challenging task. The code is available at https://github.com/YuXie1/CompEvent.
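The second component, joint space-frequency learning, can be illustrated with a minimal sketch (assumed structure, not the paper's actual module — `space_frequency_fuse` and its parameters are hypothetical names): a spatial branch applies an elementwise complex scaling as a stand-in for a complex convolution, while a frequency branch transforms the feature map with a 2D FFT, applies a learned complex mask, and transforms back; the branches are summed into one complex representation:

```python
import numpy as np

def space_frequency_fuse(feat, spatial_weight, freq_mask):
    """Toy joint spatial-frequency processing of a complex feature map.
    spatial branch: elementwise complex scaling (placeholder for a complex conv)
    frequency branch: FFT -> learned complex mask -> inverse FFT
    The two complex-valued branches are summed."""
    spatial = feat * spatial_weight
    freq = np.fft.ifft2(np.fft.fft2(feat) * freq_mask)
    return spatial + freq
```

Because multiplication in the frequency domain corresponds to convolution in the spatial domain, the frequency branch captures global, system-level characteristics (e.g. the blur kernel's frequency response) with a single elementwise operation, complementing the local spatial branch.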