🤖 AI Summary
This work addresses monocular relative depth estimation in soccer scenes, where the difficulty of the task is compounded by having only thousands of training samples. It transfers the zero-shot metric depth capabilities of models pretrained on large-scale datasets to this domain-specific task: the model learns metric depth for the scene, which in turn supports relative depth prediction and mitigates the limits of the small training set. On the SoccerNet 2025 Monocular Depth Estimation Challenge, the method achieves a score of 2.68×10⁻³ on the challenge set, suggesting that zero-shot transfer is a feasible and effective strategy for depth estimation in specialized visual domains.
📝 Abstract
We present our solution to the 2025 SoccerNet Monocular Depth Estimation Challenge. Predicting relative depth in football scenes is challenging, especially with only thousands of training samples available. To address this, our method leverages the strong zero-shot capabilities of models pretrained on large-scale datasets to learn metric depth, which in turn supports effective relative depth prediction. It achieves a score of $2.68 \times 10^{-3}$ on the challenge set.
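
To make the link between metric and relative depth concrete, the sketch below shows one way a metric depth map could be turned into a relative one and then scored. It is an illustration only: the abstract does not say how metric predictions are mapped to relative depth or which error measure produces the reported score. The per-image min-max normalization, the RMSE measure, and the function names `metric_to_relative` and `rmse` are all assumptions introduced here.

```python
import numpy as np


def metric_to_relative(depth_m: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map a metric depth map (e.g. in meters) to relative depth in [0, 1].

    Assumption: per-image min-max normalization. The source does not state
    which normalization, if any, the authors use.
    """
    d_min = depth_m.min()
    d_max = depth_m.max()
    # eps guards against division by zero on a constant depth map.
    return (depth_m - d_min) / (d_max - d_min + eps)


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error between two depth maps of the same shape.

    Assumption: RMSE is used purely as an example error measure here.
    """
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Toy 2x2 "metric" depth map with made-up values, for illustration only.
depth_m = np.array([[10.0, 20.0], [30.0, 50.0]])
relative = metric_to_relative(depth_m)

# The nearest pixel maps to 0 and the farthest to (almost exactly) 1;
# the depth ordering of all pixels is preserved.
error_against_itself = rmse(relative, relative)
```

Because the normalization is monotonic, the near-to-far ordering of pixels in the metric prediction carries over unchanged to the relative map, which is the property a relative depth task depends on.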