🤖 AI Summary
To address the challenge of continuous high-quality video capture under high-speed motion, where conventional RGB cameras suffer from bandwidth and dynamic range limitations, this paper proposes ContinuityCam, a video reconstruction framework driven jointly by a single static RGB image and an event stream. Methodologically, it combines continuous long-range motion modeling with an event-conditioned neural synthesis model, enabling frame prediction at arbitrary times within the event stream; aligned RGB and event acquisition is achieved with a custom single-lens beamsplitter setup. Key contributions include: (i) the Event Extreme Decompression Dataset (E2D2), a challenging benchmark covering diverse lighting and motion profiles; (ii) state-of-the-art reconstruction quality, with a 3.61 dB PSNR gain and a 33% LPIPS reduction over baseline methods; and (iii) superior results on two downstream tasks.
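The reported gains use standard full-reference image metrics. For reference, here is a minimal sketch of how PSNR and LPIPS are typically computed with PyTorch and the `lpips` package; this is generic evaluation code, not the paper's actual pipeline, and the function names are illustrative:

```python
# Hedged sketch: standard PSNR/LPIPS computation, not the paper's exact
# evaluation code. Assumes float image tensors in [0, 1] of shape (N, 3, H, W).
import torch
import lpips  # pip install lpips

def psnr(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for images in [0, 1]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(1.0 / mse)

# LPIPS expects inputs scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="alex")

def perceptual_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean LPIPS distance over a batch (lower is better)."""
    return lpips_fn(pred * 2 - 1, target * 2 - 1).mean()
```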
📝 Abstract
We present ContinuityCam, a novel approach that generates a continuous video from a single static RGB image and an event camera stream. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are ideal sensors to solve this problem because they encode compressed change information at high temporal resolution. In this work, we tackle the problem of event-based continuous color video decompression, pairing a single static color frame with event data to reconstruct temporally continuous videos. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the event stream. Our method requires only an initial image, thus increasing robustness to sudden motions and light changes, minimizing prediction latency, and decreasing bandwidth usage. We also introduce a novel single-lens beamsplitter setup that acquires aligned images and events, and a novel and challenging Event Extreme Decompression Dataset (E2D2) that tests the method under various lighting and motion profiles. We thoroughly evaluate our method by benchmarking color frame reconstruction, outperforming baseline methods by 3.61 dB in PSNR and a 33% decrease in LPIPS, as well as showing superior results on two downstream tasks.
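To make the keyframe-plus-events decoding interface concrete, below is a minimal sketch of what arbitrary-time reconstruction could look like. Everything here is a hypothetical illustration under assumptions: `ContinuousDecoder`, the event dictionary layout, and the `motion_model`/`synthesis_net` callables are placeholders, not the paper's actual API; only the backward warp via `grid_sample` is standard practice.

```python
# Hedged sketch of a "one keyframe + events -> frame at any time t" interface.
# All class, key, and parameter names are hypothetical, not from the paper.
import torch
import torch.nn.functional as F

def backward_warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp img (1, C, H, W) by a per-pixel flow (1, 2, H, W) via grid_sample."""
    _, _, h, w = img.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), dim=0).float() + flow[0]  # (2, H, W), (x, y)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    grid[0] = 2.0 * grid[0] / (w - 1) - 1.0
    grid[1] = 2.0 * grid[1] / (h - 1) - 1.0
    return F.grid_sample(img, grid.permute(1, 2, 0).unsqueeze(0),
                         align_corners=True)

class ContinuousDecoder:
    """Hypothetical interface: one RGB keyframe plus events -> frame at any t."""
    def __init__(self, motion_model, synthesis_net):
        self.motion_model = motion_model    # (keyframe, events, t) -> flow field
        self.synthesis_net = synthesis_net  # (warped, events, t) -> RGB frame

    def reconstruct(self, keyframe, events, t_query):
        # Condition only on events observed up to the query time.
        mask = events["t"] <= t_query
        past = {k: v[mask] for k, v in events.items()}
        # Continuous motion field maps keyframe pixels to time t_query.
        flow = self.motion_model(keyframe, past, t_query)
        warped = backward_warp(keyframe, flow)
        # Synthesis refines occlusions and lighting changes from the events.
        return self.synthesis_net(warped, past, t_query)

# Toy usage with placeholder callables (zero motion, pass-through synthesis).
H, W = 64, 64
keyframe = torch.rand(1, 3, H, W)
events = {"x": torch.randint(0, W, (1000,)),
          "y": torch.randint(0, H, (1000,)),
          "t": torch.rand(1000).sort().values,
          "p": torch.randint(0, 2, (1000,)) * 2 - 1}
decoder = ContinuousDecoder(
    motion_model=lambda kf, ev, t: torch.zeros(1, 2, H, W),
    synthesis_net=lambda warped, ev, t: warped)
frame = decoder.reconstruct(keyframe, events, t_query=0.5)  # (1, 3, H, W)
```

The point of the sketch is the decoding contract: because the motion representation is continuous in time, `t_query` can be any timestamp covered by the event stream, rather than a fixed frame grid.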