Towards Sharper Object Boundaries in Self-Supervised Depth Estimation

📅 2025-09-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing self-supervised monocular depth estimation methods suffer from blurred depth discontinuities at object boundaries, leading to spurious 3D points. To address this, we propose Mixture Depth Distribution (MDD) modeling, representing per-pixel depth as a multimodal distribution—explicitly encoding uncertainty in mixture weights rather than in regression values. We further introduce a variance-aware loss and an uncertainty propagation mechanism that jointly enforce precise modeling of depth discontinuities at boundaries, all without additional annotations. Our method seamlessly integrates into standard self-supervised training pipelines. Evaluated on KITTI and VKITTIv2, MDD improves boundary sharpness by up to 35% over prior work, while significantly enhancing point cloud geometric fidelity compared to state-of-the-art methods. Notably, it is the first approach to achieve end-to-end optimizable, sharp depth discontinuity estimation without requiring fine-grained boundary annotations.

Technology Category

Application Category

📝 Abstract
Accurate monocular depth estimation is crucial for 3D scene understanding, but existing methods often blur depth at object boundaries, introducing spurious intermediate 3D points. While achieving sharp edges usually requires very fine-grained supervision, our method produces crisp depth discontinuities using only self-supervision. Specifically, we model per-pixel depth as a mixture distribution, capturing multiple plausible depths and shifting uncertainty from direct regression to the mixture weights. This formulation integrates seamlessly into existing pipelines via variance-aware loss functions and uncertainty propagation. Extensive evaluations on KITTI and VKITTIv2 show that our method achieves up to 35% higher boundary sharpness and improves point cloud quality compared to state-of-the-art baselines.
Problem

Research questions and friction points this paper is trying to address.

Achieving sharp depth boundaries in monocular estimation
Reducing blurred depth at object edges without supervision
Modeling pixel depth as mixture distribution for uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture distribution modeling per-pixel depth
Variance-aware loss functions integration
Uncertainty propagation in existing pipelines
🔎 Similar Papers
No similar papers found.
A
Aurélien Cécille
Visual Behavior, Lyon, France; LIRIS, INSA Lyon, CNRS, École Centrale de Lyon, Université Lumière Lyon 2, Universite Claude Bernard Lyon 1, Villeurbanne, France
Stefan Duffner
Stefan Duffner
Professor in Computer Science, INSA Lyon, LIRIS, France
Machine LearningNeural NetworksComputer Vision
Franck Davoine
Franck Davoine
CNRS, Lyon, France
Perceptioncomputer visionmachine learningmulti-sensor data fusionartificial intelligence
R
Rémi Agier
Visual Behavior, Lyon, France
T
Thibault Neveu
Visual Behavior, Lyon, France