🤖 AI Summary
To address the low resolution and sparsity of depth maps from lightweight Time-of-Flight (ToF) sensors, this paper proposes a self-supervised RGB-ToF fusion framework that requires no ground-truth depth labels. Methodologically, it introduces a self-supervised consistency loss tailored to ToF sparsity and a scale-recovery module; the enhanced variant, SelfToF*, incorporates submanifold sparse convolution and guided multimodal feature fusion to achieve robust depth enhancement across varying sparsity levels. Key innovations include: (i) the first application of submanifold convolution to ToF depth enhancement, (ii) a scale-aware loss, and (iii) an RGB-depth feature alignment mechanism. Evaluated on NYU Depth V2 and ScanNet, the method reduces RMSE by up to 32% over state-of-the-art self-supervised and supervised baselines while maintaining real-time inference speed. The code will be publicly released.
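To make the scale-aware loss concrete, here is a minimal PyTorch sketch of what a scale-recovery step plus a sparsity-masked consistency loss could look like. The function names (`recover_scale`, `tof_consistency_loss`) and the median-ratio scale estimate are illustrative assumptions, not the paper's verified formulation.

```python
import torch

def recover_scale(pred_depth, sparse_tof, valid_mask):
    # Hypothetical scale recovery: align the up-to-scale prediction to the
    # metric ToF signal via the median ratio over pixels with measurements.
    ratio = sparse_tof[valid_mask] / pred_depth[valid_mask].clamp(min=1e-6)
    return ratio.median()

def tof_consistency_loss(pred_depth, sparse_tof):
    # Hypothetical sparsity-aware consistency loss: penalize the scaled
    # prediction only where the lightweight ToF sensor returned a signal
    # (zeros mark missing measurements).
    valid_mask = sparse_tof > 0
    scale = recover_scale(pred_depth, sparse_tof, valid_mask)
    return torch.abs(scale * pred_depth - sparse_tof)[valid_mask].mean()

# Usage: pred_depth from the self-supervised network; sparse_tof resized to
# the prediction's resolution, zero-filled at missing measurements.
pred_depth = torch.rand(1, 1, 240, 320) + 0.1
sparse_tof = torch.rand(1, 1, 240, 320) * (torch.rand(1, 1, 240, 320) > 0.95)
loss = tof_consistency_loss(pred_depth, sparse_tof)
```

Masking the loss to valid ToF pixels is what lets supervision come from the sensor itself rather than from ground-truth labels, which is the core of the self-supervised setup described above.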
📝 Abstract
Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires ground-truth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed and scale-aware depth maps. Starting from an image-based self-supervised depth estimation pipeline, we add low-resolution depth as input, design a new depth consistency loss, propose a scale-recovery module, and obtain a large performance boost. Furthermore, since ToF signal sparsity varies in real-world applications, we upgrade SelfToF to SelfToF* with submanifold convolution and guided feature fusion. Consequently, SelfToF* maintains robust performance across varying sparsity levels in ToF data. Overall, our proposed method is both efficient and effective, as verified by extensive experiments on the NYU and ScanNet datasets. The code will be made public.
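The abstract does not detail the guided feature fusion operator. One plausible reading is a gating block in which dense RGB features modulate the (possibly sparse) depth features before merging; the sketch below, including the `GuidedFusion` module and its channel choices, is an assumption for illustration only, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    """Hypothetical guided feature fusion: RGB features predict a per-pixel
    gate that modulates ToF depth features, then both streams are merged.
    Illustrative sketch only, under assumed channel sizes."""

    def __init__(self, rgb_ch, depth_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(rgb_ch, depth_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(rgb_ch + depth_ch, depth_ch, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # Gate suppresses depth features in regions the RGB guidance deems
        # unreliable (e.g., where the ToF signal is empty or noisy).
        gated = depth_feat * self.gate(rgb_feat)
        return self.merge(torch.cat([rgb_feat, gated], dim=1))

# Usage with assumed feature-map shapes at one decoder scale.
fusion = GuidedFusion(rgb_ch=64, depth_ch=32)
out = fusion(torch.rand(1, 64, 60, 80), torch.rand(1, 32, 60, 80))
```

A gating design of this kind would pair naturally with the submanifold convolutions mentioned above, which restrict computation to occupied ToF sites and thus keep the depth branch robust as sparsity varies.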