🤖 AI Summary
This work addresses the challenge of high-resolution motion trajectory estimation from a single motion-blurred image. We propose the first diffusion-based trajectory recovery framework, featuring a multi-scale conditional diffusion architecture that jointly incorporates cross-scale spatial priors and pixel-level connectivity constraints, along with a novel progressive training strategy to ensure consistent modeling of trajectory shape, position, and topology. Unlike conventional low-dimensional motion representations (e.g., uniform linear motion assumptions) or coarse parametric approaches, our method directly generates dense trajectory fields at sub-pixel accuracy. Evaluated on blind deblurring and coded exposure reconstruction tasks, it achieves state-of-the-art performance: trajectory estimation error is reduced by 23.6%, and reconstruction PSNR improves by 1.8 dB over prior methods—demonstrating superior capability in modeling complex non-rigid motion.
📝 Abstract
Accurate estimation of motion information is crucial in diverse computational imaging and computer vision applications. Researchers have investigated various methods to extract motion information from a single blurred image, including blur kernels and optical flow. However, existing motion representations are often of low quality, i.e., coarse-grained and inaccurate. In this paper, we propose the first high-resolution (HR) Motion Trajectory estimation framework using Diffusion models (MoTDiff). Different from existing motion representations, we aim to estimate an HR motion trajectory with high-quality from a single motion-blurred image. The proposed MoTDiff consists of two key components: 1) a new conditional diffusion framework that uses multi-scale feature maps extracted from a single blurred image as a condition, and 2) a new training method that can promote precise identification of a fine-grained motion trajectory, consistent estimation of overall shape and position of a motion path, and pixel connectivity along a motion trajectory. Our experiments demonstrate that the proposed MoTDiff can outperform state-of-the-art methods in both blind image deblurring and coded exposure photography applications.