🤖 AI Summary
To address the challenges of high-dimensional sparsity and scarce annotations in 3D deep learning, as well as the coarse geometric representations and poor semantic compatibility of traditional SfM/SLAM in unstructured scenes, this dissertation proposes a 3D representation framework that tightly integrates geometric priors with deep learning. Methodologically, it incorporates explicit geometric priors (depth, surface normals, and equivariance constraints) into equivariant neural networks and jointly optimizes geometrically consistent loss functions under self-supervised or weakly supervised paradigms, drawing on structured-light, SfM, and SLAM observations. The contributions include: (i) significantly improved accuracy and robustness of camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction under challenging conditions such as textureless regions and dynamic occlusions; and (ii) dense 3D representations that combine geometric fidelity with semantic interpretability, effectively supporting downstream applications including digital cultural heritage preservation and VR/AR.
📝 Abstract
Modern deep learning creates new opportunities for 3D mapping, scene reconstruction pipelines, and virtual reality. Despite these advances, training deep learning models directly on 3D data remains difficult due to the high dimensionality inherent in 3D data and the scarcity of labeled datasets. Structure-from-Motion (SfM) and Simultaneous Localization and Mapping (SLAM) perform robustly in structured indoor environments but often struggle with ambiguous features in unstructured ones, and they rarely produce geometric representations detailed enough for downstream tasks such as rendering and semantic analysis. These limitations call for 3D representation methods that combine traditional geometric techniques with deep learning to yield robust, geometry-aware models.
The dissertation addresses fundamental challenges in 3D vision by developing geometric deep learning methods tailored to essential tasks: camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction. Integrating geometric priors and constraints, such as depth information, surface normals, and equivariance, into deep learning models enhances both the accuracy and robustness of the resulting geometric representations. Each of these components is investigated systematically, and the methods are shown to be effective across real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.
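The equivariance prior mentioned above has a precise meaning: rotating the input point cloud should rotate the learned features correspondingly, i.e. f(P R^T) = f(P) R^T. The toy NumPy sketch below illustrates the property with a deliberately simple feature (centroid-centered coordinates, which commute with rotation); it is an illustration of the constraint, not the dissertation's actual network.

```python
import numpy as np

def equivariant_feature(points):
    """Toy rotation-equivariant feature: coordinates centered at the
    centroid. Centering commutes with rotation, so
    equivariant_feature(P @ R.T) == equivariant_feature(P) @ R.T."""
    return points - points.mean(axis=0, keepdims=True)

def random_rotation(rng):
    """Random 3D rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))    # fix the sign convention of the QR factors
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1           # force det = +1 (rotation, not reflection)
    return q

rng = np.random.default_rng(0)
P = rng.standard_normal((32, 3))    # a random point cloud
R = random_rotation(rng)
# The equivariance constraint holds exactly for this feature:
assert np.allclose(equivariant_feature(P @ R.T),
                   equivariant_feature(P) @ R.T)
```

In an equivariant network this identity is enforced by construction for every layer, so pose estimation and registration do not have to relearn the same geometry in every orientation.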