🤖 AI Summary
This work addresses the significant performance degradation that deep learning-based matching models, originally trained on ground-level imagery, suffer when applied to satellite images, primarily because satellite line-scan imaging geometry violates the pinhole camera assumption. To overcome this limitation, the authors propose EpiMask, a semi-dense matching network tailored for line-scan satellite imagery that constrains cross-view attention using epipolar distance-guided attention masks derived from the satellite imaging geometry. The method additionally incorporates local affine camera modeling and fine-tunes a pretrained vision encoder to better capture remote-sensing characteristics. Evaluated on the SatDepth dataset, EpiMask achieves up to a 30% improvement in matching accuracy over ground-image models retrained from scratch, substantially advancing the state of the art in satellite image matching.
📝 Abstract
Deep learning-based image matching networks can now handle large variations in viewpoint and illumination while providing matched pixel pairs with sub-pixel precision. These networks have been trained on ground-based image datasets, so their performance is implicitly optimized for pinhole camera geometry. Consequently, such networks perform suboptimally when used to match satellite images, which are formed one line at a time as a moving line-scan camera sweeps over points on the ground. In this paper, we present EpiMask, a semi-dense image matching network for satellite images that (1) incorporates patch-wise affine approximations to the camera imaging geometry; (2) uses an epipolar distance-based attention mask to restrict cross-attention to geometrically plausible regions; and (3) fine-tunes a foundational pretrained image encoder for robust feature extraction. Experiments on the SatDepth dataset demonstrate up to 30% improvement in matching accuracy compared to re-trained ground-based models.
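The epipolar distance-based attention mask described in the abstract can be illustrated with a minimal sketch: given a fundamental matrix relating two views (valid locally under the paper's affine approximation), candidate points in the second image that lie far from a query point's epipolar line are excluded from cross-attention before the softmax. The function names, the pixel threshold `tau`, and the use of a fundamental matrix here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def epipolar_attention_mask(F, pts_a, pts_b, tau=2.0):
    """For each query point in image A, keep only candidates in image B
    within tau pixels of the epipolar line l = F @ x_a (sketch only)."""
    # Lift 2D pixel coordinates to homogeneous coordinates.
    xa = np.hstack([pts_a, np.ones((len(pts_a), 1))])  # (Na, 3)
    xb = np.hstack([pts_b, np.ones((len(pts_b), 1))])  # (Nb, 3)
    lines = xa @ F.T                                   # (Na, 3) lines in image B
    # Point-to-line distance: |l . x| / sqrt(l1^2 + l2^2)
    num = np.abs(lines @ xb.T)                         # (Na, Nb)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = num / np.maximum(den, 1e-8)
    return dist <= tau  # True = geometrically plausible pair

def masked_cross_attention(logits, mask):
    """Send implausible pairs to -inf so the softmax assigns them zero weight."""
    masked = np.where(mask, logits, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)
```

For example, under a pure horizontal-translation geometry (`F = [[0,0,0],[0,0,-1],[0,1,0]]`, epipolar lines are horizontal), a query at row 5 in image A admits a candidate at row 5 in image B but masks one at row 100, so all attention mass goes to the in-line candidate.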