🤖 AI Summary
Traditional interest point detection and matching rely on explicit descriptors, incurring substantial memory overhead and computational cost. This paper proposes an end-to-end descriptor-free keypoint detection framework that implicitly models cross-image keypoint correspondences during detection via a feature pyramid and consistency constraints, thereby eliminating descriptor computation, storage, and explicit matching entirely. Built upon the SuperPoint architecture, the method introduces a self-supervised implicit matching strategy to jointly optimize detection and matching. Evaluated on standard benchmarks including HPatches, the approach achieves matching accuracy close to that of descriptor-based methods (e.g., SuperPoint+SuperGlue) while reducing memory consumption by approximately 40–60%, improving both runtime performance and deployment feasibility for visual localization systems.
📝 Abstract
The extraction and matching of interest points are fundamental to many geometric computer vision tasks. Traditionally, matching is performed by assigning descriptors to interest points and identifying correspondences based on descriptor similarity. This work introduces a technique in which interest points are inherently associated during detection, so descriptors never need to be computed, stored, transmitted, or matched. Although the matching accuracy is marginally lower than that of conventional approaches, forgoing descriptors entirely yields a drastic reduction in memory usage for localization systems. We assess the method's effectiveness by comparing it against both classical handcrafted methods and modern learned approaches.
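One way to realize "inherent association during detection" (the paper does not spell out its mechanism here, so this is an illustrative assumption, not the authors' exact design) is a detector that emits K score channels per image: channel i's maximum in each image is keypoint i, so keypoints with the same channel index correspond by construction, and no descriptors are computed or matched. A minimal NumPy sketch under that assumption (all function names are hypothetical):

```python
import numpy as np

def detect_implicit_keypoints(score_maps):
    """Take one keypoint per channel: the (row, col) of that channel's maximum.

    score_maps: (K, H, W) array of per-channel detection scores,
    e.g. the output of a K-channel detector head.
    Returns a (K, 2) integer array of keypoint locations.
    """
    K, H, W = score_maps.shape
    flat_idx = score_maps.reshape(K, -1).argmax(axis=1)
    return np.stack([flat_idx // W, flat_idx % W], axis=1)

def implicit_matches(score_maps_a, score_maps_b):
    """Correspondences come for free: channel i in image A pairs
    with channel i in image B, with no descriptor comparison."""
    kps_a = detect_implicit_keypoints(score_maps_a)
    kps_b = detect_implicit_keypoints(score_maps_b)
    return list(zip(map(tuple, kps_a), map(tuple, kps_b)))
```

In such a scheme, memory per keypoint drops from a descriptor vector (e.g. 256 floats for SuperPoint) to just its 2-D coordinates, which is the source of the memory savings the paper reports.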