🤖 AI Summary
This work addresses the significant performance degradation that deep learning-based matching models, originally trained on ground-level imagery, suffer when applied to satellite images, primarily because satellite line-scan imaging geometry violates the pinhole camera assumption. To overcome this limitation, the authors propose EpiMask, a semi-dense matching network tailored for line-scan satellite imagery that constrains cross-view attention using epipolar distance-guided attention masks derived from the satellite imaging geometry. The method additionally incorporates local affine camera modeling and fine-tunes a pretrained vision encoder to better capture remote-sensing characteristics. Evaluated on the SatDepth dataset, EpiMask achieves up to a 30% improvement in matching accuracy over ground-image models retrained from scratch, substantially advancing the state of the art in satellite image matching.
📝 Abstract
Deep learning-based image matching networks can now handle large variations in viewpoint and illumination while providing matched pixel pairs with sub-pixel precision. These networks have been trained on ground-based image datasets, so their performance is implicitly optimized for pinhole camera geometry. Consequently, such networks perform suboptimally when used to match satellite images, which are formed one line at a time as a moving line-scan camera sweeps over points on the ground. In this paper, we present EpiMask, a semi-dense image matching network for satellite images that (1) incorporates patch-wise affine approximations to the camera imaging geometry; (2) uses an epipolar distance-based attention mask to restrict cross-attention to geometrically plausible regions; and (3) fine-tunes a foundational pretrained image encoder for robust feature extraction. Experiments on the SatDepth dataset demonstrate up to 30% improvement in matching accuracy compared to re-trained ground-based models.
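The epipolar distance-based attention mask described in the abstract can be illustrated with a minimal sketch: given a fundamental matrix relating two views (valid locally under the paper's affine approximation), candidate points in the second image that lie far from a query point's epipolar line are excluded from cross-attention before the softmax. The function names, the pixel threshold `tau`, and the use of a fundamental matrix here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def epipolar_attention_mask(F, pts_a, pts_b, tau=2.0):
    """For each query point in image A, keep only candidates in image B
    within tau pixels of the epipolar line l = F @ x_a (sketch only)."""
    # Lift 2D pixel coordinates to homogeneous coordinates.
    xa = np.hstack([pts_a, np.ones((len(pts_a), 1))])  # (Na, 3)
    xb = np.hstack([pts_b, np.ones((len(pts_b), 1))])  # (Nb, 3)
    lines = xa @ F.T                                   # (Na, 3) lines in image B
    # Point-to-line distance: |l . x| / sqrt(l1^2 + l2^2)
    num = np.abs(lines @ xb.T)                         # (Na, Nb)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = num / np.maximum(den, 1e-8)
    return dist <= tau  # True = geometrically plausible pair

def masked_cross_attention(logits, mask):
    """Send implausible pairs to -inf so the softmax assigns them zero weight."""
    masked = np.where(mask, logits, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)
```

For example, under a pure horizontal-translation geometry (`F = [[0,0,0],[0,0,-1],[0,1,0]]`, epipolar lines are horizontal), a query at row 5 in image A admits a candidate at row 5 in image B but masks one at row 100, so all attention mass goes to the in-line candidate.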