Stable Single-Pixel Contrastive Learning for Semantic and Geometric Tasks

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of jointly modeling semantic and geometric information in pixel-level representation learning, with the goal of precise cross-image point correspondence without momentum-based mechanisms. We propose a family of stable contrastive losses that removes the need for the conventional momentum teacher-student architecture and enables, for the first time, end-to-end joint semantic-geometric representation learning at the level of single pixels. Our method combines overcomplete feature maps with pixel-wise contrastive learning and is trained in a self-supervised manner in synthetic 2D and 3D environments. Experimental results show that the learned representations are both semantically discriminative and geometrically faithful, which leads to improved cross-view point matching accuracy. The approach offers a new route to unsupervised pixel-level alignment that does not rely on momentum-based consistency or hand-crafted geometric priors.
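
The summary does not give the exact form of the stable losses. Purely as a point of reference, the sketch below implements a standard pixel-wise InfoNCE loss, a common contrastive baseline and not the authors' stable loss: descriptors of the same scene point in two views form a positive pair, and every other sampled pixel in the second view acts as a negative. The function names, the plain-list descriptor layout, and the temperature of 0.1 are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a descriptor to unit length (zero vectors are left as zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pixel_info_nce(desc_a: List[Vector], desc_b: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over N matched pixels (baseline sketch, not the paper's loss).

    desc_a[i] and desc_b[i] describe the same scene point seen in view A and view B
    (a positive pair); every desc_b[j] with j != i is a negative for desc_a[i].
    """
    if len(desc_a) != len(desc_b) or not desc_a:
        raise ValueError("need the same non-zero number of descriptors in both views")
    a = [l2_normalize(v) for v in desc_a]
    b = [l2_normalize(v) for v in desc_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarities to all candidates in view B, scaled by the temperature.
        logits = [dot(anchor, candidate) / temperature for candidate in b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pixel i as the correct "class".
        total += log_denominator - logits[i]
    return total / len(a)
```

With two orthogonal descriptors matched to themselves, the loss is close to zero; swapping the second view's descriptors so that each anchor faces the wrong partner pushes it to roughly 10. This baseline only makes the pixel-wise contrastive setup concrete; how the paper's losses differ from it is not specified in this summary.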

📝 Abstract

We pilot a family of stable contrastive losses for learning pixel-level representations that jointly capture semantic and geometric information. Our approach maps each pixel of an image to an overcomplete descriptor that is both view-invariant and semantically meaningful. This enables precise point correspondence across images without requiring momentum-based teacher-student training. Two experiments in synthetic 2D and 3D environments demonstrate the properties of our loss and the resulting overcomplete representations.
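
The abstract states that per-pixel descriptors enable point correspondence across images but does not describe the matching procedure. A common choice, assumed here rather than taken from the paper, is nearest-neighbor search under cosine similarity with a mutual check: a pixel in image A is matched to its most similar pixel in image B, and the match is kept only if that pixel's own nearest neighbor in A is the original pixel. The helper names and the nested-list layout `map[row][col]` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Descriptor = Sequence[float]
DescriptorMap = List[List[Descriptor]]  # hypothetical layout: map[row][col] -> descriptor
Pixel = Tuple[int, int]


def cosine(a: Descriptor, b: Descriptor) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_pixel(query: Descriptor, target: DescriptorMap) -> Pixel:
    """Brute-force search for the target pixel whose descriptor is most similar to query."""
    best: Pixel = (0, 0)
    best_sim = -math.inf
    for r, row in enumerate(target):
        for c, desc in enumerate(row):
            sim = cosine(query, desc)
            if sim > best_sim:  # strict '>' keeps the first pixel on ties
                best, best_sim = (r, c), sim
    return best


def mutual_match(map_a: DescriptorMap, map_b: DescriptorMap, pixel: Pixel) -> Optional[Pixel]:
    """Match a pixel of A into B; return None unless B's match maps back to the same pixel."""
    r, c = pixel
    rb, cb = nearest_pixel(map_a[r][c], map_b)
    back = nearest_pixel(map_b[rb][cb], map_a)
    return (rb, cb) if back == pixel else None
```

For instance, if image B is image A with its two columns swapped, pixel (0, 0) in A is matched to pixel (0, 1) in B. The brute-force search is quadratic in the number of pixels; full-resolution descriptor maps would typically call for a vectorized or approximate nearest-neighbor search instead.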

Problem

Research questions and friction points this paper is trying to address.

Develop stable contrastive losses for pixel-level representation learning

Create view-invariant and semantically meaningful overcomplete descriptors

Enable precise point correspondence without momentum-based teacher-student training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stable contrastive losses for pixel-level representations

Overcomplete descriptors that are both view-invariant and semantically meaningful

Point correspondence without momentum-based teacher-student training
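
For context, the momentum-based teacher-student training that this work avoids usually keeps a second "teacher" network whose weights are an exponential moving average (EMA) of the student's weights. The minimal sketch below shows only that update rule, with the parameters flattened into a list of floats for illustration; the function name and the default momentum of 0.99 are assumptions. According to the summary, the stable losses remove the need for this extra network and its update.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], momentum: float = 0.99) -> List[float]:
    """Return new teacher parameters: momentum * teacher + (1 - momentum) * student.

    In the usual setup the teacher receives no gradients; it only trails the
    student through this update after each training step.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]
```

With momentum 0.9, a teacher of [1.0, 0.0] and a student of [0.0, 1.0] yield a new teacher of about [0.9, 0.1]; a momentum of 1.0 freezes the teacher entirely.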
