🤖 AI Summary
In dense image matching, the single-correspondence assumption commonly fails under challenging scenarios such as depth discontinuities and large scale variations. To address this, this paper proposes the first multi-hypothesis modeling framework tailored for coarse-to-fine matching. The method generates multiple correspondence hypotheses per source pixel at each scale, employs a beam-search mechanism to propagate and prune high-confidence hypotheses across scales, and introduces a cross-attention module that fuses the hypotheses, enabling information exchange between them. Built on a Transformer-based multi-scale architecture, the approach significantly improves matching robustness and accuracy in complex scenes. Extensive experiments show state-of-the-art performance on standard benchmarks, with the largest gains under severe depth discontinuities and strong scale changes.
📝 Abstract
Dense image matching aims to find a correspondent for every pixel of a source image in a partially overlapping target image. State-of-the-art methods typically rely on a coarse-to-fine mechanism where a single correspondent hypothesis is produced per source location at each scale. In challenging cases -- such as at depth discontinuities or when the target image is a strong zoom-in of the source image -- the correspondents of neighboring source locations are often widely spread, and predicting a single correspondent hypothesis per source location at each scale may lead to erroneous matches. In this paper, we investigate the idea of predicting multiple correspondent hypotheses per source location at each scale instead. We consider a beam search strategy to propagate multiple hypotheses across scales and propose integrating these multiple hypotheses into cross-attention layers, resulting in a novel dense matching architecture called BEAMER. BEAMER learns to preserve and propagate multiple hypotheses across scales, making it significantly more robust than state-of-the-art methods, especially at depth discontinuities or when the target image is a strong zoom-in of the source image.
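The beam-search idea described above can be illustrated with a toy sketch: for one source pixel, keep the top-k correspondent hypotheses at the coarsest scale, expand each into finer-scale candidates, and prune back to the k most confident ones at every scale. Everything below (function names, the 2x upsampling scheme, the distance-based scorer) is our own illustrative assumption, not BEAMER's actual learned scoring or architecture.

```python
def beam_search_matches(init_hyps, refine, score, beam_width=2, num_scales=3):
    """Toy beam search over correspondence hypotheses for one source pixel.

    init_hyps  : candidate target positions at the coarsest scale (scale 0).
    refine(h)  : proposes finer-scale hypotheses derived from hypothesis h.
    score(h, s): matching confidence of hypothesis h at scale s (higher wins).
    """
    beam = sorted(init_hyps, key=lambda h: score(h, 0), reverse=True)[:beam_width]
    for s in range(1, num_scales):
        # Expand every surviving hypothesis, then prune to the beam width.
        candidates = [c for h in beam for c in refine(h)]
        beam = sorted(candidates, key=lambda h: score(h, s), reverse=True)[:beam_width]
    return beam

# Illustrative setup (a stand-in for a learned confidence, not the paper's):
TARGET = (13, 7)  # ground-truth correspondent at the finest scale

def score(h, s):
    # Compare against the target downsampled to scale s (3 scales, factor 2).
    tx, ty = TARGET[0] >> (2 - s), TARGET[1] >> (2 - s)
    return -abs(h[0] - tx) - abs(h[1] - ty)

def refine(h):
    # Upsample a hypothesis by 2x and propose the four child positions.
    x, y = 2 * h[0], 2 * h[1]
    return [(x + dx, y + dy) for dx in (0, 1) for dy in (0, 1)]

best = beam_search_matches([(3, 1), (0, 0)], refine, score)[0]
print(best)  # the finest-scale hypothesis closest to TARGET
```

Keeping several hypotheses alive is what distinguishes this from the usual coarse-to-fine scheme: a single-hypothesis tracker that committed to the wrong coarse match could never recover, whereas the beam can retain a lower-ranked coarse hypothesis whose children later score best.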