🤖 AI Summary
Event-driven video frame interpolation suffers severe performance degradation on blurry videos and generalizes poorly from synthetic to real-world data. To address these issues, this paper proposes the first end-to-end joint framework for frame interpolation and ad-hoc deblurring. Its core contributions are: (1) a unified formulation of frame interpolation and dynamic deblurring as a single learning task; (2) a temporally aware bidirectional recurrent network enabling adaptive spatiotemporal fusion of events and RGB frames; (3) HighREV, the first high-resolution real-world paired event-video dataset; and (4) a self-supervised domain adaptation mechanism that mitigates synthetic-domain bias. Extensive experiments demonstrate state-of-the-art performance on joint interpolation-deblurring, single-image deblurring, and frame interpolation, with significantly improved cross-domain generalization. The code and the HighREV dataset are publicly released.
📝 Abstract
Effective video frame interpolation hinges on adept handling of motion in the input scene. Prior work leverages asynchronous event information for this purpose, but often overlooks whether motion induces blur in the video, limiting its scope to sharp frame interpolation. We instead propose a unified framework for event-based frame interpolation that performs ad-hoc deblurring and thus works on both sharp and blurry input videos. Our model consists of a bidirectional recurrent network that incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. To bridge the gap between synthetic training data and real event cameras, we integrate a self-supervised learning framework with the proposed model, improving generalization to real-world data in the wild. At the dataset level, we introduce HighREV, a novel real-world high-resolution dataset with events and color videos, which provides a challenging evaluation setting for the examined task. Extensive experiments show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single-image deblurring, and the joint task of both. Domain-transfer experiments reveal that self-supervised training effectively mitigates the performance degradation observed when moving from synthetic to real-world data. Code and datasets are available at https://github.com/AHupuJR/REFID.
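The idea of fusing events and frames adaptively based on temporal proximity can be illustrated with a toy weighting scheme. This is a minimal sketch under stated assumptions: the `fusion_weights` helper, the exponential decay, and the `tau` constant are illustrative inventions, not the paper's actual gating mechanism (REFID learns its fusion weights inside the recurrent network).

```python
import math

def fusion_weights(event_times, target_time, tau=0.1):
    """Toy gating: weight each event slice by its temporal proximity
    to the interpolation timestamp, so slices recorded closer to the
    target frame contribute more to the fused feature.

    `tau` is a hypothetical decay constant chosen for illustration.
    """
    raw = [math.exp(-abs(t - target_time) / tau) for t in event_times]
    total = sum(raw)
    # Normalize so the weights form a convex combination.
    return [w / total for w in raw]

# Event slices sampled at five timestamps; interpolate a frame at t = 0.5.
weights = fusion_weights([0.0, 0.25, 0.5, 0.75, 1.0], 0.5)
# The slice nearest the target timestamp receives the largest weight.
print(weights)
```

In the actual model this proximity-dependent weighting is learned and applied to feature maps inside the bidirectional recurrence, rather than computed by a fixed closed-form kernel as above.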