🤖 AI Summary
Existing image deblurring datasets suffer from limited scale and scene diversity, failing to capture realistic, complex blur patterns and thereby constraining model generalization. To address this, we introduce the largest large-scale, real-world deblurring dataset, built from smartphone-captured high-speed video (240 fps): long-exposure blur is synthesized by averaging consecutive frames, with the temporally central frame serving as the sharp ground truth. The dataset spans diverse indoor and outdoor scenes and motion patterns, reaching a scale ten times larger and scene diversity eight times greater than prevailing benchmarks, and it supports end-to-end deep learning training and evaluation. Extensive experiments on multiple state-of-the-art models demonstrate its significantly higher difficulty, with average performance dropping by over 25%, revealing critical limitations of current methods in realistic settings. This work establishes a new benchmark and standardized testbed for robust, generalizable image deblurring research.
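The synthesis procedure is simple: consecutive high-speed frames are averaged to emulate a long exposure, and the central frame of the averaging window is kept as the temporally aligned sharp reference. Below is a minimal sketch of this idea in Python, assuming a hypothetical window length and file path and pre-decoded RGB frames; the paper's exact window size and any linear-light/gamma handling are not specified in this summary.

```python
import numpy as np
import cv2

def synthesize_blur_pair(frames, window=15):
    """Build one blur-sharp pair from consecutive 240 fps frames.

    frames: list of uint8 HxWx3 arrays decoded from a slow-motion clip.
    window: number of consecutive frames averaged into one blurry image
            (hypothetical value; must be odd so a central frame exists).
    """
    assert len(frames) >= window and window % 2 == 1
    clip = np.stack(frames[:window]).astype(np.float64)
    # Temporal average approximates a long exposure of window/240 seconds.
    blurry = clip.mean(axis=0).round().astype(np.uint8)
    # The temporally central frame serves as the sharp ground truth.
    sharp = frames[window // 2]
    return blurry, sharp

# Usage: decode one second of a 240 fps clip with OpenCV.
cap = cv2.VideoCapture("slowmo_clip.mp4")  # hypothetical path
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

blurry, sharp = synthesize_blur_pair(frames, window=15)
```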
📝 Abstract
We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging consecutive frames, with the temporally central frame serving as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times as many distinct scenes spanning indoor and outdoor environments with varying object and camera motion. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting the complexity and diversity of our benchmark. Our dataset thus serves as a challenging new benchmark for developing robust and generalizable deblurring models.
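Deblurring benchmarks of this kind typically score each model's restored output against the sharp reference with metrics such as PSNR (and SSIM). The paper's exact evaluation protocol is not given here, so the following is only a minimal PSNR sketch with a hypothetical model call in the usage comment.

```python
import numpy as np

def psnr(restored: np.ndarray, sharp: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a model output and its ground truth."""
    mse = np.mean((restored.astype(np.float64) - sharp.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: score one blur-sharp pair for a given deblurring model.
# restored = model(blurry)   # hypothetical model call
# print(psnr(restored, sharp))
```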