SAMatcher: Co-Visibility Modeling with Segment Anything for Robust Feature Matching

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work targets a gap in image correspondence estimation: under large viewpoint and scale variations, existing methods do not explicitly model the regions that are visible in both views. It proposes a structured feature matching method grounded in co-visibility modeling, extending the Segment Anything Model (SAM) to multi-view correspondence inference for the first time. Predicted cross-view co-visible masks and bounding boxes serve as structured priors, and a symmetric cross-view interaction mechanism enables bidirectional feature exchange and semantic alignment. By combining mask-box consistency constraints with a unified supervision strategy, the approach shifts matching from the pixel level to the region level. The method reports substantial gains over existing techniques on multiple challenging benchmarks, particularly in robustness under extreme viewpoint and scale changes.
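To make the idea of a region-level prior concrete, here is a minimal sketch of one way co-visible masks could be used to prune candidate matches. It is an illustration only: the summary does not say how SAMatcher consumes its masks downstream, and the function name `filter_matches_by_covisibility`, the array layouts, and the rounding rule are all assumptions.

```python
import numpy as np


def filter_matches_by_covisibility(kpts_a, kpts_b, mask_a, mask_b):
    """Keep matches whose two endpoints both lie inside the co-visible masks.

    Hypothetical helper, not the paper's pipeline.
    kpts_a, kpts_b: (N, 2) arrays of (x, y) pixel coordinates; row i of each
                    array forms one candidate match.
    mask_a, mask_b: (H, W) boolean co-visibility masks for the two images.
    Returns the kept keypoints from each image and the boolean keep vector.
    """
    kpts_a = np.asarray(kpts_a, dtype=float)
    kpts_b = np.asarray(kpts_b, dtype=float)

    def inside(kpts, mask):
        h, w = mask.shape
        # Assumption: keypoints are snapped to the nearest pixel.
        x = np.round(kpts[:, 0]).astype(int)
        y = np.round(kpts[:, 1]).astype(int)
        in_bounds = (x >= 0) & (x < w) & (y >= 0) & (y < h)
        hit = np.zeros(len(kpts), dtype=bool)
        # Out-of-bounds keypoints stay False; the rest read the mask value.
        hit[in_bounds] = mask[y[in_bounds], x[in_bounds]]
        return hit

    keep = inside(kpts_a, mask_a) & inside(kpts_b, mask_b)
    return kpts_a[keep], kpts_b[keep], keep
```

In this sketch a match survives only if both of its endpoints fall in regions predicted to be visible from the other view, which is one simple reading of "region-level" filtering on top of a pixel-level matcher.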

📝 Abstract

Reliable correspondence estimation is a fundamental problem in image processing, underpinning applications such as Structure from Motion, visual localization, and image registration. Existing learning-based methods have significantly improved local feature representations, yet most still operate at the pixel or patch level and lack explicit modeling of regions that are jointly visible across views. We propose SAMatcher, a feature matching framework that formulates correspondence estimation through co-visibility modeling. Instead of directly matching local features, SAMatcher first predicts co-visible region masks and bounding boxes as structured priors for correspondence estimation. Built upon the Segment Anything Model (SAM), it introduces a symmetric cross-view interaction mechanism that enables bidirectional feature exchange and cross-view semantic alignment. We further develop a unified supervision scheme that jointly optimizes mask prediction and box localization through mask learning, box regression, and mask-box consistency constraints. Extensive experiments on challenging benchmarks demonstrate substantial improvements over existing matching pipelines, particularly under large viewpoint and scale variations. Our results show that foundation models originally designed for monocular segmentation can be effectively extended to multi-view correspondence reasoning through explicit co-visibility modeling, offering a new perspective on structured representation learning for image matching. Code and project page: https://xupan.top/Projects/samatcher
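The abstract mentions mask-box consistency constraints without giving their form. As a hedged sketch of what such a term could look like, the snippet below derives the tight bounding box of a predicted mask and penalizes its disagreement with a predicted box using 1 minus IoU. The function names, the (x0, y0, x1, y1) box convention with exclusive upper bounds, and the choice of IoU are assumptions, not the authors' formulation.

```python
import numpy as np


def mask_to_box(mask):
    """Tight box (x0, y0, x1, y1) around the True pixels, or None if empty.

    Upper bounds are exclusive, so width is x1 - x0.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def box_iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0


def mask_box_consistency(mask, box):
    """Illustrative consistency penalty in [0, 1]: 0 when the box fits the mask.

    Assumption: an empty mask is treated as maximally inconsistent.
    """
    tight = mask_to_box(mask)
    if tight is None:
        return 1.0
    return 1.0 - box_iou(tight, box)
```

A real training loss would need a differentiable version of this (for example a soft box derived from mask probabilities); the NumPy form here is only meant to show the quantity being compared.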

Problem

Research questions and friction points this paper is trying to address.

feature matching

co-visibility modeling

correspondence estimation

multi-view

structured representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

co-visibility modeling

feature matching

Segment Anything Model