RDD: Robust Feature Detector and Descriptor using Deformable Transformer

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of existing methods in robust feature detection and description under challenging conditions, such as large viewpoint variations and cross-altitude aerial imaging, this paper proposes the first end-to-end keypoint detection and description framework based on deformable Transformers. It employs deformable self-attention to jointly localize keypoints and generate descriptors, implicitly encoding geometric invariance. The paper also introduces the first real-world air-to-ground aerial photogrammetry dataset and designs two new benchmarking protocols. By combining multi-scale feature fusion with joint training on the air-to-ground and MegaDepth datasets, the method achieves state-of-the-art performance on sparse matching tasks, supports semi-dense matching, and significantly outperforms prior approaches on the newly established benchmarks involving large viewpoint/scale changes and cross-altitude 3D reconstruction.

📝 Abstract
As a core step in structure-from-motion and SLAM, robust feature detection and description under challenging scenarios such as significant viewpoint changes remain unresolved despite their ubiquity. While recent works have identified the importance of local features in modeling geometric transformations, these methods fail to learn the visual cues present in long-range relationships. We present Robust Deformable Detector (RDD), a novel and robust keypoint detector/descriptor leveraging the deformable transformer, which captures global context and geometric invariance through deformable self-attention mechanisms. Specifically, we observed that deformable attention focuses on key locations, effectively reducing the search space complexity and modeling the geometric invariance. Furthermore, we collected an Air-to-Ground dataset for training in addition to the standard MegaDepth dataset. Our proposed method outperforms all state-of-the-art keypoint detection/description methods in sparse matching tasks and is also capable of semi-dense matching. To ensure comprehensive evaluation, we introduce two challenging benchmarks: one emphasizing large viewpoint and scale variations, and the other being an Air-to-Ground benchmark -- an evaluation setting that has recently been gaining popularity for 3D reconstruction across different altitudes.
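The core mechanism the abstract describes, deformable attention that samples a handful of learned key locations instead of attending densely, can be sketched for a single query and single head. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the projection matrices `w_off` and `w_att` and all shapes are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a feature map of shape (H, W, C) at fractional (y, x)."""
    H, W, _ = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_attention(query, feat, ref_point, w_off, w_att):
    """One query attends to K learned sample points around its reference location.

    query:     (C,) query feature
    feat:      (H, W, C) feature map
    ref_point: (y, x) reference location of the query
    w_off:     (C, 2K) hypothetical projection predicting K (dy, dx) offsets
    w_att:     (K, C) hypothetical projection predicting K attention logits
    """
    K = w_att.shape[0]
    offsets = (query @ w_off).reshape(K, 2)   # learned (dy, dx) per sample point
    logits = w_att @ query                    # (K,) unnormalized attention weights
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax over only K locations, not H*W
    samples = np.stack([bilinear_sample(feat, ref_point[0] + dy, ref_point[1] + dx)
                        for dy, dx in offsets])
    return weights @ samples                  # (C,) aggregated feature
```

Because both the softmax weights and bilinear interpolation form convex combinations, the output stays within the range of the sampled features; the geometric flexibility comes from the offsets being predicted from the query itself.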
Problem

Research questions and friction points this paper is trying to address.

Robust feature detection under significant viewpoint changes
Learning visual cues from long-range relationships
Improving geometric invariance in keypoint detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deformable transformer captures global context
Deformable attention reduces search complexity
Air-to-Ground dataset enhances training diversity
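The search-complexity claim above can be made concrete with a back-of-the-envelope count: standard dense attention compares each query against every one of the H*W keys, while deformable attention samples only K learned locations per query. The function name and numbers below are illustrative, not from the paper.

```python
def attention_comparisons(num_queries, H, W, K=None):
    """Rough count of query-key interactions per layer.

    K=None models dense attention (every query attends to all H*W keys);
    otherwise each query attends to only K sampled locations.
    """
    keys_per_query = H * W if K is None else K
    return num_queries * keys_per_query

dense = attention_comparisons(1000, 64, 64)         # 1000 * 4096 = 4_096_000
deformable = attention_comparisons(1000, 64, 64, K=8)  # 1000 * 8 = 8_000
```

For a 64x64 feature map this is a 512x reduction per layer, which is what makes attention affordable at the high resolutions keypoint detection needs.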