🤖 AI Summary
This work addresses the emerging task of single-frame optical flow estimation, tackling two key limitations of prior approaches: conventional two-frame methods rely on consecutive frames, while existing single-frame methods depend on scarce ground-truth annotations and produce only deterministic outputs. The authors propose ProbDiffFlow, a training-free, probabilistic framework for single-image optical flow estimation built on an "estimation-by-synthesis" paradigm: a diffusion-based model first synthesizes diverse plausible future frames, and a pre-trained optical flow model (e.g., RAFT) then estimates motion between the input image and each synthesized frame, implicitly modeling the underlying motion uncertainty. The resulting flow distribution is characterized efficiently via Monte Carlo sampling and aggregation. ProbDiffFlow achieves state-of-the-art performance on both synthetic and real-world benchmarks, surpassing single-frame and two-frame baselines in accuracy, output diversity, and inference efficiency, and establishes a new paradigm for training-free, uncertainty-aware visual motion understanding.
📝 Abstract
This paper studies optical flow estimation, a critical task in motion analysis with applications in autonomous navigation, action recognition, and film production. Traditional optical flow methods require consecutive frames, which are often unavailable due to limitations in data acquisition or real-world scene disruptions. Thus, single-frame optical flow estimation is emerging in the literature. However, existing single-frame approaches suffer from two major limitations: (1) they rely on labeled training data, making them task-specific, and (2) they produce deterministic predictions, failing to capture motion uncertainty. To overcome these challenges, we propose ProbDiffFlow, a training-free framework that estimates optical flow distributions from a single image. Instead of directly predicting motion, ProbDiffFlow follows an estimation-by-synthesis paradigm: it first generates diverse plausible future frames using a diffusion-based model, then estimates motion from these synthesized samples using a pre-trained optical flow model, and finally aggregates the results into a probabilistic flow distribution. This design eliminates the need for task-specific training while capturing multiple plausible motions. Experiments on both synthetic and real-world datasets demonstrate that ProbDiffFlow achieves superior accuracy, diversity, and efficiency, outperforming existing single-image and two-frame baselines.
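The estimation-by-synthesis pipeline described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration only: the real system uses a diffusion model to synthesize future frames and a pre-trained two-frame network such as RAFT to estimate flow, whereas here both components are replaced by toy NumPy stand-ins (`synthesize_future_frame`, `estimate_flow` are invented names) so the Monte Carlo control flow is runnable.

```python
import numpy as np

def synthesize_future_frame(image, rng):
    # Stand-in for the diffusion model: perturb the input image to mimic
    # one plausible next frame (illustration only, not the paper's model).
    return image + rng.normal(scale=0.05, size=image.shape)

def estimate_flow(frame0, frame1):
    # Stand-in for a pre-trained two-frame flow model (e.g., RAFT):
    # returns a dense (H, W, 2) flow field. Here it is a toy constant
    # field derived from the mean intensity change.
    diff = float((frame1 - frame0).mean())
    return np.full(frame0.shape + (2,), diff)

def probabilistic_flow(image, n_samples=8, seed=0):
    """Aggregate flows from many synthesized futures into a distribution.

    Mirrors the three steps in the abstract: (1) synthesize diverse
    future frames, (2) estimate flow for each input/synthesized pair,
    (3) aggregate the Monte Carlo samples into summary statistics.
    """
    rng = np.random.default_rng(seed)
    flows = np.stack([
        estimate_flow(image, synthesize_future_frame(image, rng))
        for _ in range(n_samples)
    ])
    # Summarize the sampled flow distribution by per-pixel mean and variance.
    return flows.mean(axis=0), flows.var(axis=0)

mean_flow, var_flow = probabilistic_flow(np.zeros((4, 4)))
```

Because no network weights are trained or fine-tuned anywhere in this loop, the sketch also shows why the approach is training-free: all learning lives in the frozen, pre-trained components.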