Handling Multiple Hypotheses in Coarse-to-Fine Dense Image Matching

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dense image matching, the single-correspondence assumption often fails in challenging cases such as depth discontinuities and strong zoom-ins. To address this, the paper investigates predicting multiple correspondence hypotheses per source location at each scale of a coarse-to-fine matcher. It uses a beam search strategy to propagate multiple hypotheses across scales and integrates them into cross-attention layers, yielding a dense matching architecture called BEAMER. According to the authors, BEAMER is significantly more robust than state-of-the-art methods, especially at depth discontinuities or when the target image is a strong zoom-in of the source image.

📝 Abstract
Dense image matching aims to find a correspondent for every pixel of a source image in a partially overlapping target image. State-of-the-art methods typically rely on a coarse-to-fine mechanism where a single correspondent hypothesis is produced per source location at each scale. In challenging cases -- such as at depth discontinuities or when the target image is a strong zoom-in of the source image -- the correspondents of neighboring source locations are often widely spread, and predicting a single correspondent hypothesis per source location at each scale may lead to erroneous matches. In this paper, we investigate the idea of predicting multiple correspondent hypotheses per source location at each scale instead. We consider a beam search strategy to propagate multiple hypotheses at each scale and propose integrating these multiple hypotheses into cross-attention layers, resulting in a novel dense matching architecture called BEAMER. BEAMER learns to preserve and propagate multiple hypotheses across scales, making it significantly more robust than state-of-the-art methods, especially at depth discontinuities or when the target image is a strong zoom-in of the source image.
Problem

Research questions and friction points this paper is trying to address.

Handling multiple hypotheses in dense image matching
Improving robustness at depth discontinuities and zoom-ins
Propagating hypotheses across scales with beam search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Beam search propagates multiple hypotheses per scale
Integrates multiple hypotheses into cross-attention layers
Learns to preserve hypotheses across coarse-to-fine scales
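
The beam-search idea in the points above can be illustrated with a small toy example. This is only a sketch under strong assumptions, not BEAMER's implementation: the paper learns its hypotheses inside a network with cross-attention, whereas the code below runs a generic beam search over hand-made 1-D matching scores for a single source pixel. The function name `beam_search_coarse_to_fine`, the parameters `beam_width` and `factor`, and the score values are all hypothetical.

```python
import numpy as np

def beam_search_coarse_to_fine(score_maps, beam_width=2, factor=2):
    """Toy beam search over a pyramid of 1-D matching scores for ONE source pixel.

    score_maps: list of 1-D arrays, coarsest first; each finer map has
    `factor` times as many cells as the previous one (assumed layout).
    Returns a list of (cumulative_score, cell_index_at_finest_scale), best first.
    """
    coarse = np.asarray(score_maps[0], dtype=float)
    # Initialise the beam with the beam_width best cells at the coarsest scale.
    top = np.argsort(coarse)[::-1][:beam_width]
    beam = [(float(coarse[i]), int(i)) for i in top]
    for scores in score_maps[1:]:
        candidates = []
        for cum, idx in beam:
            # Expand each surviving hypothesis into its children at the finer scale.
            for child in range(idx * factor, (idx + 1) * factor):
                candidates.append((cum + float(scores[child]), child))
        # Prune: keep only the beam_width candidates with the best cumulative score.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam

# Hand-made scores (hypothetical): the coarse scale slightly prefers cell 0,
# but the strongest fine-scale evidence lies under coarse cell 1.
coarse_scores = np.array([0.6, 0.5])
fine_scores = np.array([0.1, 0.2, 0.9, 0.1])

# One hypothesis per scale: commits to coarse cell 0, so only fine cells 0-1 are reachable.
single = beam_search_coarse_to_fine([coarse_scores, fine_scores], beam_width=1)[0][1]
# Two hypotheses per scale: coarse cell 1 survives, so fine cell 2 can still be found.
multi = beam_search_coarse_to_fine([coarse_scores, fine_scores], beam_width=2)[0][1]
print(single, multi)
```

With `beam_width=1` the search behaves like a single-hypothesis coarse-to-fine matcher and cannot recover from a wrong coarse choice. A wider beam keeps the alternative alive until finer evidence is available, which is the failure mode the abstract describes at depth discontinuities and strong zoom-ins.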
Matthieu Vilain
Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, F-33400 Talence, France
Rémi Giraud
Associate Professor - Bordeaux INP / Univ. Bordeaux
Image Processing
Yannick Berthoumieu
Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, F-33400 Talence, France
Guillaume Bourmaud
Bordeaux INP, IMS Laboratory CNRS UMR 5218
Computer Vision, Deep Learning, 3D reconstruction, 3D localization, Supervised/Unsupervised Learning