Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data

📅 2024-09-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To achieve real-time, robust 360° omnidirectional depth estimation that generalizes across scenes on edge devices for autonomous driving and robotics, this paper proposes a lightweight solution built on spherical geometry. It contributes: (1) Rt-OmniMVS, a lightweight network that uses the Combined Spherical Sweeping method for real-time multi-view depth estimation; (2) a teacher-student learning strategy in which a high-precision stereo matching model generates pseudo-labels for unlabeled real-world data, combined with data and model augmentation when training the student; and (3) HexaMODE, a six-camera fisheye depth sensing system with an edge computation device. The approach runs at 15 FPS on edge hardware and achieves results comparable to state-of-the-art methods on public datasets while using significantly fewer resources. It also shows high accuracy in complex indoor and outdoor real-world scenarios.

📝 Abstract

Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-time performance and robust cross-scene generalization remains a significant challenge for existing algorithms. In this paper, we propose Rt-OmniMVS, a real-time omnidirectional depth estimation method that introduces the Combined Spherical Sweeping method and uses a lightweight network structure to achieve real-time performance on edge computing platforms. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy. We use a high-precision stereo matching method as the teacher model to predict pseudo labels for unlabeled real-world data, and apply data and model augmentation techniques during training to improve the performance of the student model, Rt-OmniMVS. We also propose HexaMODE, an omnidirectional depth sensing system based on multi-view fisheye cameras and an edge computation device. A large-scale hybrid dataset containing both unlabeled real-world data and synthetic data is collected for model training. Experiments on public datasets demonstrate that the proposed method achieves results comparable to state-of-the-art approaches while consuming significantly fewer resources. The proposed system and algorithm also demonstrate high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 frames per second on edge computing platforms.
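
The abstract does not spell out how Combined Spherical Sweeping works, so the following is only a minimal sketch of generic spherical sweeping, the family of techniques the name points to. For each pixel of a reference panorama, a set of candidate depths is placed along the pixel's ray and each candidate 3D point is projected into a fisheye camera to find where to sample image features. Everything here is an assumption for illustration: the function names, the equirectangular panorama layout, the inverse-depth spacing, the equidistant fisheye model (r = focal * theta), and the camera being axis-aligned with the rig.

```python
import math

def inverse_depth_hypotheses(d_min, d_max, n):
    """Return n depths whose inverses are evenly spaced, ordered far to near."""
    inv_near, inv_far = 1.0 / d_min, 1.0 / d_max
    step = (inv_near - inv_far) / (n - 1)
    return [1.0 / (inv_far + i * step) for i in range(n)]

def equirect_ray(u, v, width, height):
    """Unit ray for equirectangular pixel (u, v); x right, y down, z forward."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = (v + 0.5) / height * math.pi - math.pi / 2.0
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def project_equidistant(point, cam_center, focal, cx, cy):
    """Project a 3D point into a fisheye image with the equidistant model.

    Assumes (hypothetically) an identity rotation: the camera looks along +z.
    """
    x, y, z = (p - c for p, c in zip(point, cam_center))
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical axis
    phi = math.atan2(y, x)                   # direction in the image plane
    r = focal * theta
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))

def sweep_pixel(u, v, width, height, depths, cam_center, focal, cx, cy):
    """Fisheye sampling locations for one panorama pixel, one per depth."""
    ray = equirect_ray(u, v, width, height)
    return [
        project_equidistant(tuple(d * r for r in ray), cam_center, focal, cx, cy)
        for d in depths
    ]

depths = inverse_depth_hypotheses(1.0, 4.0, 3)
samples = sweep_pixel(3.5, 1.5, 8, 4, depths,
                      cam_center=(0.1, 0.0, 0.0), focal=300.0, cx=320.0, cy=240.0)
```

Spacing the hypotheses evenly in inverse depth puts more candidates near the rig, where a small depth change moves the projected point the most. In a full pipeline the sampled features from all cameras would be stacked into a cost volume over the depth hypotheses.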

Problem

Research questions and friction points this paper is trying to address.

Achieving real-time omnidirectional depth estimation on edge computing platforms

Enhancing cross-scene generalization for real-world applications like autonomous driving

Overcoming limited labeled data through teacher-student learning with unlabeled data
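
As a rough illustration of the third point, pseudo-label supervision can be sketched as a loss that compares the student's depth map against the teacher's prediction instead of against ground truth. This is a minimal sketch, not the paper's training objective: the L1 form, the function name, and the confidence mask with its threshold are all assumptions added here, since the abstract only says that the teacher predicts pseudo labels for unlabeled data.

```python
def masked_l1_loss(student_depth, teacher_depth, teacher_conf, conf_threshold=0.5):
    """Mean absolute error against teacher pseudo labels, over confident pixels.

    All inputs are 2D lists of equal shape. The confidence mask is a
    hypothetical filter for unreliable pseudo labels, not from the source.
    """
    total, count = 0.0, 0
    for s_row, t_row, c_row in zip(student_depth, teacher_depth, teacher_conf):
        for s, t, c in zip(s_row, t_row, c_row):
            if c >= conf_threshold:
                total += abs(s - t)
                count += 1
    # With no confident pixels there is nothing to supervise, so return zero.
    return total / count if count else 0.0

student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.5, 2.0], [5.0, 4.0]]       # pseudo labels from the teacher
confidence = [[0.9, 0.9], [0.1, 0.9]]    # the bottom-left pixel is masked out
loss = masked_l1_loss(student, teacher, confidence)
```

Here the large disagreement at the low-confidence pixel does not contribute, so the loss averages only the three remaining pixels.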

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combined Spherical Sweeping method for real-time performance

Teacher-student learning with pseudo labels from unlabeled data

Lightweight network structure for edge computing platforms

🔎 Similar Papers

A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts